  1. Dec 22, 2016
  2. Dec 20, 2016
    • [PM] Provide an initial, minimal port of the inliner to the new pass manager. · 1d963114
      Chandler Carruth authored
      This doesn't implement *every* feature of the existing inliner, but
      tries to implement the most important ones for building a functional
      optimization pipeline and beginning to sort out bugs, regressions, and
      other problems.
      
      Notable, but intentional omissions:
      - No alloca merging support. Why? Because it isn't clear we want to do
        this at all. Active discussion and investigation is going on to remove
        it, so for simplicity I omitted it.
      - No support for trying to iterate on "internally" devirtualized calls.
        Why? Because it adds what I suspect is inappropriate coupling for
        little or no benefit. We will have an outer iteration system that
        tracks devirtualization including that from function passes and
        iterates already. We should improve that rather than approximate it
        here.
      - Optimization remarks. Why? Purely to make the patch smaller, no other
        reason at all.
      
      The last one I'll probably work on almost immediately. But I wanted to
      skip it in the initial patch to try to focus the change as much as
      possible as there is already a lot of code moving around and both of
      these *could* be skipped without really disrupting the core logic.
      
      A summary of the different things happening here:
      
      1) Adding the usual new PM class and rigging.
      
      2) Fixing minor underlying assumptions in the inline cost analysis or
         inline logic that don't generally hold in the new PM world.
      
      3) Adding the core pass logic which is in essence a loop over the calls
         in the nodes in the call graph. This is a bit duplicated from the old
         inliner, but only a handful of lines could realistically be shared.
         (I tried at first, and it really didn't help anything.) All told,
         this is only about 100 lines of code, and most of that is the
         mechanics of wiring up analyses from the new PM world.
      
      4) Updating the LazyCallGraph (in the new PM) based on the *newly
         inlined* calls and references. This is very minimal because we cannot
         form cycles.
      
      5) When inlining removes the last use of a function, eagerly nuking the
         body of the function so that any "one use remaining" inline cost
         heuristics are immediately refined, and queuing these functions to be
         completely deleted once inlining is complete and the call graph
         updated to reflect that they have become dead.
      
      6) After all the inlining for a particular function, updating the
         LazyCallGraph and the CGSCC pass manager to reflect the
         function-local simplifications that are done immediately and
         internally by the inline utilities. These are the exact same
         fundamental set of CG updates done by arbitrary function passes.
      
      7) Adding a bunch of test cases to specifically target CGSCC and other
         subtle aspects in the new PM world.
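
Step 5 above can be pictured with a small stand-alone model. This is only a sketch under stated assumptions: `FuncSketch` and `DeadFunctionQueueSketch` are hypothetical names invented here, not LLVM types, and "use count" is reduced to a plain integer. It shows the two-phase idea from the message: drop the body as soon as the last use disappears, and defer the actual deletion until inlining is finished.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a function; not an LLVM type.
struct FuncSketch {
  std::string Name;
  bool IsLocal;  // assumption: only local functions can be proven dead by use count
  int NumUses;   // remaining call sites
  bool HasBody;  // cleared once the body is eagerly dropped
  FuncSketch(std::string N, bool Local, int Uses)
      : Name(std::move(N)), IsLocal(Local), NumUses(Uses), HasBody(true) {}
};

// Hypothetical queue of functions that became dead during inlining.
class DeadFunctionQueueSketch {
  std::vector<FuncSketch *> Dead;

public:
  // Call after one call to Callee has been inlined and its call site removed.
  void noteCallInlined(FuncSketch &Callee) {
    assert(Callee.NumUses > 0 && "inlined a call that was not counted");
    if (--Callee.NumUses == 0 && Callee.IsLocal) {
      Callee.HasBody = false;   // drop the body right away
      Dead.push_back(&Callee);  // full deletion is deferred
    }
  }

  // Call once inlining is finished; returns the names that would be deleted.
  std::vector<std::string> takeDeadNames() {
    std::vector<std::string> Names;
    for (const FuncSketch *F : Dead)
      Names.push_back(F->Name);
    Dead.clear();
    return Names;
  }
};
```

Dropping the body early matters in this model because any heuristic that looks at the remaining uses or size of the callee sees the updated state immediately, while the deferred queue keeps the bookkeeping structure stable during the walk.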
      
      Many thanks to the careful review from Easwaran and Sanjoy and others!
      
      Differential Revision: https://reviews.llvm.org/D24226
      
      llvm-svn: 290161
      1d963114
  3. Dec 19, 2016
  4. Dec 15, 2016
    • Remove the AssumptionCache · 3ca4a6bc
      Hal Finkel authored
      After r289755, the AssumptionCache is no longer needed. Variables affected by
      assumptions are now found by using the new operand-bundle-based scheme. This
      new scheme is more computationally efficient, and also we need much less
      code...
      
      llvm-svn: 289756
      3ca4a6bc
  5. Nov 20, 2016
  6. Nov 04, 2016
  7. Oct 08, 2016
  8. Sep 28, 2016
  9. Sep 27, 2016
    • [Inliner] Fold the analysis remark into the missed remark · 1142147e
      Adam Nemet authored
      There is really no reason for these to be separate.
      
      The vectorizer started this pretty bad tradition that the text of the
      missed remarks is pretty meaningless, i.e. vectorization failed.  There,
      you have to query analysis to get the full picture.
      
      I think we should just explain the reason for missing the optimization
      in the missed remark when possible.  Analysis remarks should provide
      information that the pass gathers regardless of whether the optimization
      is passing or not.
      
      llvm-svn: 282542
      1142147e
    • Output optimization remarks in YAML · a62b7e1a
      Adam Nemet authored
      (Re-committed after moving the template specialization under the yaml
      namespace.  GCC was complaining about this.)
      
      This allows various presentations of this data using an external tool.
      This was first recommended here[1].
      
      As an example, consider this module:
      
        1 int foo();
        2 int bar();
        3
        4 int baz() {
        5   return foo() + bar();
        6 }
      
      The inliner generates these missed-optimization remarks today (the
      hotness information is pulled from PGO):
      
        remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
        remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)
      
      Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:
      
        --- !Missed
        Pass:            inline
        Name:            NotInlined
        DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 10 }
        Function:        baz
        Hotness:         30
        Args:
          - Callee: foo
          - String:  will not be inlined into
          - Caller: baz
        ...
        --- !Missed
        Pass:            inline
        Name:            NotInlined
        DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 18 }
        Function:        baz
        Hotness:         30
        Args:
          - Callee: bar
          - String:  will not be inlined into
          - Caller: baz
        ...
      
      This is a summary of the high-level decisions:
      
      * There is a new streaming interface to emit optimization remarks.
      E.g. for the inliner remark above:
      
         ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
                      DEBUG_TYPE, "NotInlined", &I)
                  << NV("Callee", Callee) << " will not be inlined into "
                  << NV("Caller", CS.getCaller()) << setIsVerbose());
      
      NV stands for named value and allows the YAML client to process a remark
      using its name (NotInlined) and the named arguments (Callee and Caller)
      without parsing the text of the message.
      
      Subsequent patches will update ORE users to use the new streaming API.
      
      * I am using YAML I/O for writing the YAML file.  YAML I/O requires you
      to specify reading and writing at once but reading is highly non-trivial
      for some of the more complex LLVM types.  Since it's not clear that we
      (ever) want to use LLVM to parse this YAML file, the code supports and
      asserts that we're writing only.
      
      On the other hand, I did verify experimentally that the class hierarchy
      starting at DiagnosticInfoOptimizationBase can be mapped back from the
      YAML generated here (see D24479).
      
      * The YAML stream is stored in the LLVM context.
      
      * In the example, we can probably further specify the IR value used,
      i.e. print "Function" rather than "Value".
      
      * As before, hotness is computed in the analysis pass instead of
      DiagnosticInfo.  This avoids the layering problem since BFI is in
      Analysis while DiagnosticInfo is in IR.
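
The streaming interface described in the first bullet can be illustrated with a small self-contained model. This is a sketch, not the LLVM implementation: `MissedRemark` is a hypothetical class invented here, and `NV` is simplified to hold two strings rather than an IR value. It only aims to show why named values let a consumer read a remark without parsing the message text.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a named value: a key plus its printed form.
struct NV {
  std::string Key, Val;
  NV(std::string K, std::string V) : Key(std::move(K)), Val(std::move(V)) {
    assert(!Key.empty() && "a named value needs a name");
  }
};

// Hypothetical remark builder; not the LLVM DiagnosticInfo hierarchy.
class MissedRemark {
  std::string Pass, Name;
  // Each argument is a named value or a plain string (stored under "String").
  std::vector<std::pair<std::string, std::string>> Args;

public:
  MissedRemark(std::string P, std::string N)
      : Pass(std::move(P)), Name(std::move(N)) {}

  MissedRemark &operator<<(const NV &V) {
    Args.emplace_back(V.Key, V.Val);
    return *this;
  }
  MissedRemark &operator<<(const std::string &S) {
    Args.emplace_back("String", S);
    return *this;
  }

  // Human-readable message: the argument values concatenated in order.
  std::string message() const {
    std::string Msg;
    for (const auto &A : Args)
      Msg += A.second;
    return Msg;
  }

  // Structured form, loosely following the YAML layout shown above.
  // No quoting or escaping is attempted in this sketch.
  std::string yaml() const {
    std::ostringstream OS;
    OS << "--- !Missed\n"
       << "Pass: " << Pass << "\n"
       << "Name: " << Name << "\n"
       << "Args:\n";
    for (const auto &A : Args)
      OS << "  - " << A.first << ": " << A.second << "\n";
    OS << "...\n";
    return OS.str();
  }
};
```

In this model, streaming `NV("Callee", "foo")`, the literal text, and `NV("Caller", "baz")` into a `MissedRemark("inline", "NotInlined")` yields both the familiar one-line message and a record in which `Callee` and `Caller` are addressable by name.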
      
      [1] https://reviews.llvm.org/D19678#419445
      
      Differential Revision: https://reviews.llvm.org/D24587
      
      llvm-svn: 282539
      a62b7e1a
    • Revert "Output optimization remarks in YAML" · cc2a3fa8
      Adam Nemet authored
      This reverts commit r282499.
      
      The GCC bots are failing
      
      llvm-svn: 282503
      cc2a3fa8
    • Output optimization remarks in YAML · 92e928c1
      Adam Nemet authored
      This allows various presentations of this data using an external tool.
      This was first recommended here[1].
      
      As an example, consider this module:
      
        1 int foo();
        2 int bar();
        3
        4 int baz() {
        5   return foo() + bar();
        6 }
      
      The inliner generates these missed-optimization remarks today (the
      hotness information is pulled from PGO):
      
        remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
        remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)
      
      Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:
      
        --- !Missed
        Pass:            inline
        Name:            NotInlined
        DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 10 }
        Function:        baz
        Hotness:         30
        Args:
          - Callee: foo
          - String:  will not be inlined into
          - Caller: baz
        ...
        --- !Missed
        Pass:            inline
        Name:            NotInlined
        DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 18 }
        Function:        baz
        Hotness:         30
        Args:
          - Callee: bar
          - String:  will not be inlined into
          - Caller: baz
        ...
      
      This is a summary of the high-level decisions:
      
      * There is a new streaming interface to emit optimization remarks.
      E.g. for the inliner remark above:
      
         ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
                      DEBUG_TYPE, "NotInlined", &I)
                  << NV("Callee", Callee) << " will not be inlined into "
                  << NV("Caller", CS.getCaller()) << setIsVerbose());
      
      NV stands for named value and allows the YAML client to process a remark
      using its name (NotInlined) and the named arguments (Callee and Caller)
      without parsing the text of the message.
      
      Subsequent patches will update ORE users to use the new streaming API.
      
      * I am using YAML I/O for writing the YAML file.  YAML I/O requires you
      to specify reading and writing at once but reading is highly non-trivial
      for some of the more complex LLVM types.  Since it's not clear that we
      (ever) want to use LLVM to parse this YAML file, the code supports and
      asserts that we're writing only.
      
      On the other hand, I did verify experimentally that the class hierarchy
      starting at DiagnosticInfoOptimizationBase can be mapped back from the
      YAML generated here (see D24479).
      
      * The YAML stream is stored in the LLVM context.
      
      * In the example, we can probably further specify the IR value used,
      i.e. print "Function" rather than "Value".
      
      * As before, hotness is computed in the analysis pass instead of
      DiagnosticInfo.  This avoids the layering problem since BFI is in
      Analysis while DiagnosticInfo is in IR.
      
      [1] https://reviews.llvm.org/D19678#419445
      
      Differential Revision: https://reviews.llvm.org/D24587
      
      llvm-svn: 282499
      92e928c1
  10. Aug 26, 2016
    • [Inliner] Report when inlining fails because callee's def is unavailable · cef33141
      Adam Nemet authored
      Summary:
      This is obviously an interesting case because it may motivate code
      restructuring or LTO.
      
      Reporting this requires instantiation of ORE in the loop where the call
      sites are first gathered.  I've checked compile-time
      overhead *with* -Rpass-with-hotness and the worst slow-down was 6% in
      mcf and quickly tailing off.  As before without -Rpass-with-hotness
      there is no overhead.
      
      Because this could be a pretty noisy diagnostic, it is currently
      qualified as 'verbose'.  As of this patch, 'verbose' diagnostics are
      only emitted with -Rpass-with-hotness, i.e. when the output is expected
      to be filtered.
      
      Reviewers: eraman, chandlerc, davidxl, hfinkel
      
      Subscribers: tejohnson, Prazek, davide, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D23415
      
      llvm-svn: 279860
      cef33141
  11. Aug 17, 2016
    • [Inliner] Add a flag to disable manual alloca merging in the Inliner. · f702d8ec
      Chandler Carruth authored
      This is off for now while testing can take place to make sure that in
      fact we do sufficient stack coloring to fully obviate the manual alloca
      array merging.
      
      Some context on why we should be using stack coloring rather than
      merging allocas in this way:
      
      LLVM relies very heavily on analyzing pointers as coming from different
      allocas in order to make aliasing decisions. These are some of the most
      powerful aliasing signals available in LLVM. So merging allocas is an
      extremely destructive operation on the LLVM IR -- it takes away highly
      valuable and hard to reconstruct information.
      
      As a consequence, inlined functions which happen to have array allocas
      that this pattern matches will fail to be properly interleaved unless
      SROA manages to hoist everything to an SSA register. Instead, the
      inliner will have added an unnecessary dependence that one inlined
      function execute after the other because they will have been rewritten
      to refer to the same memory.
      
      All that said, folks will reasonably want some time to experiment here
      and make sure there are no significant regressions. A flag should give
      us an easy knob to test.
      
      For more context, see the thread here:
      http://lists.llvm.org/pipermail/llvm-dev/2016-July/103277.html
      http://lists.llvm.org/pipermail/llvm-dev/2016-August/103285.html
      
      Differential Revision: https://reviews.llvm.org/D23052
      
      llvm-svn: 278892
      f702d8ec
  12. Aug 10, 2016
    • Changed sign of LastCallToStaticBonus · d89875ca
      Piotr Padlewski authored
      Summary:
      I think it is much better this way.
      When I first saw the line:
        Cost += InlineConstants::LastCallToStaticBonus;
      I thought that it was a bug, because everywhere else the cost is reduced
      using -=.
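
A minimal sketch of the convention this change moves to, assuming the bonus is now stored as a positive value and subtracted like the other bonuses. The namespace `InlineConstantsSketch`, the helper `applyLastCallBonus`, and the magnitude 15000 are illustrative assumptions made here, not quotations from the patch.

```cpp
#include <cassert>

namespace InlineConstantsSketch {
// Illustrative magnitude only; the point is that the stored value is positive.
const int LastCallToStaticBonus = 15000;
} // namespace InlineConstantsSketch

// Hypothetical helper: applies the bonus by subtraction, so the call site
// reads the same way as every other place that reduces the cost.
int applyLastCallBonus(int Cost, bool IsOnlyCallToLocalFunction) {
  assert(InlineConstantsSketch::LastCallToStaticBonus > 0 &&
         "bonuses are stored as positive values in this sketch");
  if (IsOnlyCallToLocalFunction)
    Cost -= InlineConstantsSketch::LastCallToStaticBonus;
  return Cost;
}
```

With a negative constant and `+=`, the arithmetic is identical, but a reader has to look up the constant's sign to know whether the cost goes up or down.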
      
      Reviewers: eraman, tejohnson, mehdi_amini
      
      Subscribers: llvm-commits, mehdi_amini
      
      Differential Revision: https://reviews.llvm.org/D23222
      
      llvm-svn: 278290
      d89875ca
    • [Inliner,OptDiag] Add hotness attribute to opt diagnostics · 896c09bd
      Adam Nemet authored
      Summary:
      The inliner not being a function pass requires the work-around of
      generating the OptimizationRemarkEmitter and in turn BFI on demand.
      This will go away after the new PM is ready.
      
      BFI is only computed inside ORE if the user has requested hotness
      information for optimization diagnostics (-pass-remark-with-hotness at
      the 'opt' level).  Thus there is no additional overhead without the
      flag.
      
      Reviewers: hfinkel, davidxl, eraman
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D22694
      
      llvm-svn: 278185
      896c09bd
  13. Aug 06, 2016
  14. Aug 03, 2016
  15. Jul 29, 2016
    • Added ThinLTO inlining statistics · 84abc74f
      Piotr Padlewski authored
      Summary:
      copypasta doc of ImportedFunctionsInliningStatistics class
       \brief Calculate and dump ThinLTO specific inliner stats.
       The main statistics are:
       (1) Number of inlined imported functions,
       (2) Number of imported functions inlined into importing module (indirect),
       (3) Number of non imported functions inlined into importing module
       (indirect).
       The difference between the first and the second is that the first stat
       counts all performed inlines on imported functions, but the second one only the
       functions that have been eventually inlined to a function in the importing
       module (by a chain of inlines). Because llvm uses bottom-up inliner, it is
       possible to e.g. import function `A`, `B` and then inline `B` to `A`,
       and after this `A` might be too big to be inlined into some other function
       that calls it. It calculates this statistic by building graph, where
       the nodes are functions, and edges are performed inlines and then by marking
       the edges starting from not imported function.
      
       If `Verbose` is set to true, then it also dumps statistics
       per each inlined function, sorted by the greatest inlines count like
       - number of performed inlines
       - number of performed inlines to importing module
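
The graph-marking idea in the doc comment can be sketched as follows. This is a toy model, not the `ImportedFunctionsInliningStatistics` class: `InlineStatsSketch` and its members are hypothetical names, functions are identified by strings, and the sketch assumes each statistic counts distinct functions rather than individual inline events.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: nodes are functions, edges are performed inlines.
class InlineStatsSketch {
  struct Node {
    bool Imported = false;
    int NumInlines = 0;               // times this function was inlined anywhere
    bool ReachesImporting = false;    // inlined, via a chain, into a non-imported function
    std::vector<std::string> Callees; // edges: caller -> inlined callee
  };
  std::map<std::string, Node> Nodes;

  // Depth-first marking of everything inlined below a non-imported function.
  void mark(const std::string &Name, std::set<std::string> &Visited) {
    if (!Visited.insert(Name).second)
      return;
    Node &N = Nodes.at(Name);
    N.ReachesImporting = true;
    for (const std::string &C : N.Callees)
      mark(C, Visited);
  }

public:
  struct Totals {
    int InlinedImported = 0;          // statistic (1)
    int ImportedIntoImporting = 0;    // statistic (2)
    int NonImportedIntoImporting = 0; // statistic (3)
  };

  void addFunction(const std::string &Name, bool Imported) {
    Nodes[Name].Imported = Imported;
  }

  void recordInline(const std::string &Caller, const std::string &Callee) {
    assert(Nodes.count(Caller) && Nodes.count(Callee) && "unknown function");
    Nodes[Caller].Callees.push_back(Callee);
    ++Nodes[Callee].NumInlines;
  }

  Totals compute() {
    std::set<std::string> Visited;
    // Start marking from the edges leaving non-imported functions.
    for (auto &KV : Nodes)
      if (!KV.second.Imported)
        for (const std::string &C : KV.second.Callees)
          mark(C, Visited);

    Totals T;
    for (const auto &KV : Nodes) {
      const Node &N = KV.second;
      if (N.NumInlines == 0)
        continue;
      if (N.Imported) {
        ++T.InlinedImported;
        if (N.ReachesImporting)
          ++T.ImportedIntoImporting;
      } else if (N.ReachesImporting) {
        ++T.NonImportedIntoImporting;
      }
    }
    return T;
  }
};
```

Replaying the `A`/`B` scenario from the text (import both, inline `B` into `A`, never inline `A` anywhere) gives one inlined imported function but zero imported functions that reached the importing module, which is the gap between the first two statistics.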
      
      Reviewers: eraman, tejohnson, mehdi_amini
      
      Subscribers: mehdi_amini, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D22491
      
      llvm-svn: 277089
      84abc74f
  16. Jul 23, 2016
  17. Jun 10, 2016
  18. May 23, 2016
  19. Apr 30, 2016
  20. Apr 29, 2016
  21. Apr 23, 2016
  22. Apr 22, 2016
  23. Apr 21, 2016
    • Initial implementation of optimization bisect support. · f0f27929
      Andrew Kaylor authored
      This patch implements an optimization bisect feature, which will allow optimizations to be selectively disabled at compile time in order to track down test failures that are caused by incorrect optimizations.
      
      The bisection is enabled using a new command line option (-opt-bisect-limit).  Individual passes that may be skipped call the OptBisect object (via an LLVMContext) to see if they should be skipped based on the bisect limit.  A finer level of control (disabling individual transformations) can be managed through an additional OptBisect method, but this is not yet used.
      
      The skip checking in this implementation is based on (and replaces) the skipOptnoneFunction check.  Where that check was being called, a new call has been inserted in its place which checks the bisect limit and the optnone attribute.  A new function call has been added for module and SCC passes that behaves in a similar way.
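
The counting scheme behind a bisect limit can be sketched in a few lines. This is a simplified model under assumptions made here: `BisectSketch` and `skipFunctionSketch` are hypothetical names, a negative limit is taken to mean "bisection disabled", and the order of the two checks is a choice of this sketch rather than something the message specifies.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the bisect object; not the LLVM OptBisect class.
class BisectSketch {
  int Limit;       // assumption: negative means "no limit"
  int LastNum = 0; // number handed to the most recent query

public:
  explicit BisectSketch(int L) : Limit(L) {}

  // Each query gets the next number; it runs only while within the limit.
  bool shouldRun(const std::string &PassName) {
    assert(!PassName.empty() && "a pass needs a name to be reported");
    if (Limit < 0)
      return true; // bisection disabled
    return ++LastNum <= Limit;
  }

  int lastNumber() const { return LastNum; }
};

// Combined skip check: the bisect limit and the optnone attribute.
bool skipFunctionSketch(BisectSketch &B, const std::string &PassName,
                        bool HasOptNone) {
  if (!B.shouldRun(PassName))
    return true;
  return HasOptNone;
}
```

Because every skippable pass execution consumes one number in a deterministic order, a failing build can be narrowed down by binary search over the limit: the smallest limit that still reproduces the failure identifies the offending pass execution.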
      
      Differential Revision: http://reviews.llvm.org/D19172
      
      llvm-svn: 267022
      f0f27929
  24. Apr 18, 2016
    • [NFC] Header cleanup · b550cb17
      Mehdi Amini authored
      Removed some unused headers, replaced some headers with forward class declarations.
      
      Found using simple scripts like this one:
      clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' | xargs grep -L 'IndexedMap[<]' | xargs grep -n --color=auto 'IndexedMap'
      
      Patch by Eugene Kosov <claprix@yandex.ru>
      
      Differential Revision: http://reviews.llvm.org/D19219
      
      From: Mehdi Amini <mehdi.amini@apple.com>
      llvm-svn: 266595
      b550cb17
  25. Mar 08, 2016
  26. Mar 04, 2016
  27. Mar 03, 2016
    • Infrastructure for PGO enhancements in inliner · 3035719c
      Easwaran Raman authored
      This patch provides the following infrastructure for PGO enhancements in the inliner:
      
      - Enable the use of block-level profile information in the inliner.
      - Incrementally update block frequency information during inlining.
      - Update the function entry counts of callees when they get inlined into callers.
      
      Differential Revision: http://reviews.llvm.org/D16381
      
      llvm-svn: 262636
      3035719c
  28. Mar 02, 2016
    • [AA] Hoist the logic to reformulate various AA queries in terms of other · 12884f7f
      Chandler Carruth authored
      parts of the AA interface out of the base class of every single AA
      result object.
      
      Because this logic reformulates the query in terms of some other aspect
      of the API, it would easily cause O(n^2) query patterns in alias
      analysis. These could in turn be magnified further based on the number
      of call arguments, and then further based on the number of AA queries
      made for a particular call. This ended up causing problems for Rust that
      were actually noticeable enough to get a bug (PR26564) and probably other
      places as well.
      
      When originally re-working the AA infrastructure, the desire was to
      regularize the pattern of refinement without losing any generality.
      While I think it was successful, that is clearly proving to be too
      costly. And the cost is needless: we gain no actual improvement for this
      generality of making a direct query to tbaa actually be able to
      re-use some other alias analysis's refinement logic for one of the other
      APIs, or some such. In short, this is entirely wasted work.
      
      To the extent possible, delegation to other API surfaces should be done
      at the aggregation layer so that we can avoid re-walking the
      aggregation. In fact, this significantly simplifies the logic as we no
      longer need to smuggle the aggregation layer into each alias analysis
      (or the TargetLibraryInfo into each alias analysis just so we can form
      argument memory locations!).
      
      However, we also have some delegation logic inside of BasicAA and some
      of it even makes sense. When the delegation logic is baking in specific
      knowledge of aliasing properties of the LLVM IR, as opposed to simply
      reformulating the query to utilize a different alias analysis interface
      entry point, it makes a lot of sense to restrict that logic to
      a different layer such as BasicAA. So one aspect of the delegation that
      was in every AA base class is that when we don't have operand bundles,
      we re-use function AA results as a fallback for callsite alias results.
      This relies on the IR properties of calls and functions w.r.t. aliasing,
      and so seems a better fit to BasicAA. I've lifted the logic up to that
      point where it seems to be a natural fit. This still does a bit of
      redundant work (we query function attributes twice, once via the
      callsite and once via the function AA query) but it is *exactly* twice
      here, no more.
      
      The end result is that all of the delegation logic is hoisted out of the
      base class and into either the aggregation layer when it is a pure
      retargeting to a different API surface, or into BasicAA when it relies
      on the IR's aliasing properties. This should fix the quadratic query
      pattern reported in PR26564, although I don't have a stand-alone test
      case to reproduce it.
      
      It also seems general goodness. Now the numerous AAs that don't need
      target library info don't carry it around and depend on it. I think
      I can even rip out the general access to the aggregation layer and only
      expose that in BasicAA as it is the only place where we re-query in that
      manner.
      
      However, this is a non-trivial change to the AA infrastructure so I want
      to get some additional eyes on this before it lands. Sadly, it can't
      wait long because we should really cherry pick this into 3.8 if we're
      going to go this route.
      
      Differential Revision: http://reviews.llvm.org/D17329
      
      llvm-svn: 262490
      12884f7f
  29. Feb 09, 2016
    • Add an "addUsedAAAnalyses" helper function · 1c481f50
      Sanjoy Das authored
      Summary:
      Passes that call `getAnalysisIfAvailable<T>` also need to call
      `addUsedIfAvailable<T>` in `getAnalysisUsage` to indicate to the
      legacy pass manager that it uses `T`.  This contract was being
      violated by passes that used `createLegacyPMAAResults`.  This change
      fixes this by exposing a helper in AliasAnalysis.h,
      `addUsedAAAnalyses`, that is complementary to createLegacyPMAAResults
      and does the right thing when called from `getAnalysisUsage`.
      
      Reviewers: chandlerc
      
      Subscribers: mcrosier, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D17010
      
      llvm-svn: 260183
      1c481f50
  30. Jan 15, 2016
  31. Dec 28, 2015
  32. Dec 23, 2015
    • Provide a way to specify inliner's attribute compatibility and merging. · 1cb242eb
      Akira Hatanaka authored
      This reapplies r256277 with two changes:
      
      - In emitFnAttrCompatCheck, change FuncName's type to std::string to fix
        a use-after-free bug.
      - Remove an unnecessary install-local target in lib/IR/Makefile. 
      
      Original commit message for r252949:
      
      Provide a way to specify inliner's attribute compatibility and merging
      rules using table-gen. NFC.
      
      This commit adds new classes CompatRule and MergeRule to Attributes.td,
      which are used to generate code to check attribute compatibility and
      merge attributes of the caller and callee.
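
The table-driven split between compatibility rules and merge rules can be modelled in plain C++. This is a sketch only: the real rules are declared in TableGen and turned into generated code, whereas here `RuleTable`, `isEqual`, `setAND`, `setOR`, `areCompatible`, and `mergeInto` are hypothetical helpers over a simple name-to-bool map, and the attribute names used with them are placeholders.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

using AttrSet = std::map<std::string, bool>; // attribute name -> value

// A missing attribute is treated as false in this sketch.
bool attrValue(const AttrSet &A, const std::string &K) {
  auto It = A.find(K);
  return It != A.end() && It->second;
}

using CompatRuleFn = std::function<bool(const AttrSet &, const AttrSet &)>;
using MergeRuleFn = std::function<void(AttrSet &, const AttrSet &)>;

// Compatibility rule: caller and callee must agree on the attribute.
CompatRuleFn isEqual(std::string Attr) {
  return [Attr](const AttrSet &Caller, const AttrSet &Callee) {
    return attrValue(Caller, Attr) == attrValue(Callee, Attr);
  };
}

// Merge rule: the caller keeps the attribute only if both sides have it.
MergeRuleFn setAND(std::string Attr) {
  return [Attr](AttrSet &Caller, const AttrSet &Callee) {
    bool V = attrValue(Caller, Attr) && attrValue(Callee, Attr);
    Caller[Attr] = V;
  };
}

// Merge rule: the caller gains the attribute if either side has it.
MergeRuleFn setOR(std::string Attr) {
  return [Attr](AttrSet &Caller, const AttrSet &Callee) {
    bool V = attrValue(Caller, Attr) || attrValue(Callee, Attr);
    Caller[Attr] = V;
  };
}

struct RuleTable {
  std::vector<CompatRuleFn> Compat;
  std::vector<MergeRuleFn> Merge;
};

// Inlining is allowed only if every compatibility rule holds.
bool areCompatible(const RuleTable &T, const AttrSet &Caller,
                   const AttrSet &Callee) {
  for (const CompatRuleFn &R : T.Compat)
    if (!R(Caller, Callee))
      return false;
  return true;
}

// After inlining, every merge rule updates the caller's attributes.
void mergeInto(const RuleTable &T, AttrSet &Caller, const AttrSet &Callee) {
  assert(areCompatible(T, Caller, Callee) && "merge only after a compat check");
  for (const MergeRuleFn &R : T.Merge)
    R(Caller, Callee);
}
```

Keeping the rules in a table means adding an attribute touches one declaration rather than the inliner's control flow, which is the benefit the commit attributes to generating this code from `Attributes.td`.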
      
      rdar://problem/19836465
      
      llvm-svn: 256304
      1cb242eb
  33. Dec 22, 2015