  1. Jan 30, 2017
  2. Jan 24, 2017
    • [PM] Replace uses of AssertingVH from members of analysis results with · 6acdca78
      Chandler Carruth authored
      a lazy-asserting PoisoningVH.
      
      AssertingVH is fundamentally incompatible with cache-invalidation of
      analysis results. The invalidation happens after the AssertingVH has
      already fired. Instead, use a PoisoningVH that will assert if the
      dangling handle is ever used rather than merely be assigned or
      destroyed.
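To make the eager-versus-lazy distinction concrete, here is a minimal, self-contained sketch in plain C++. It does not use LLVM's real AssertingVH or PoisoningVH classes; `ToyPoisoningHandle`, `notifyDeleted`, and `useIsRejected` are hypothetical names that model only the idea of complaining on use rather than on deletion.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical toy model; this is not LLVM's PoisoningVH.
// The handle tolerates its target being deleted and only objects
// when someone tries to use the dangling handle afterwards.
template <typename T>
class ToyPoisoningHandle {
  T *Ptr = nullptr;
  bool Poisoned = false;

public:
  explicit ToyPoisoningHandle(T *P) : Ptr(P) {
    assert(P && "a handle needs a target");
  }

  // Called when the pointee is deleted. An eagerly asserting handle
  // would fail right here; the lazy variant just records the fact.
  void notifyDeleted() {
    Poisoned = true;
    Ptr = nullptr;
  }

  bool isPoisoned() const { return Poisoned; }

  // Any real use of a poisoned handle is an error. The toy throws so the
  // behaviour can be observed; a real implementation would assert instead.
  T &get() const {
    if (Poisoned)
      throw std::logic_error("use of poisoned handle");
    return *Ptr;
  }
};

// Demonstration helper: returns true if using the handle is rejected.
template <typename T>
bool useIsRejected(const ToyPoisoningHandle<T> &H) {
  try {
    (void)H.get();
    return false;
  } catch (const std::logic_error &) {
    return true;
  }
}
```

Copying or destroying a poisoned `ToyPoisoningHandle` stays silent in this sketch, which is the property that lets a stale analysis result be invalidated and torn down after the IR it pointed at is gone.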
      
      This patch also removes all of the (numerous) doomed attempts to work
      around this fundamental incompatibility. It is a pretty significant
      simplification IMO.
      
      The most interesting change is in the Inliner where we still do some
      clearing because we don't want to rely on the coarse grained
      invalidation strategy of the containing pass manager. However, I prefer
      the approach that contains this logic to the cleanup phase of the
      Inliner, and I think we could enhance the CGSCC analysis management
      layer to make this even better in the future if desired.
      
      The rest is straight cleanup.
      
      I've also added a test for one of the harder cases to work around: when
      a *module analysis* contains many AssertingVHes pointing at functions.
      
      Differential Revision: https://reviews.llvm.org/D29006
      
      llvm-svn: 292928
      6acdca78
  3. Jan 23, 2017
  4. Jan 22, 2017
    • [PM] Fix a really nasty bug introduced when adding PGO support to the · b698d596
      Chandler Carruth authored
      new PM's inliner.
      
      The bug happens when we refine an SCC after having computed a proxy for
      the FunctionAnalysisManager, and then proceed to compute fresh analyses
      for functions in the *new* SCC using the manager provided by the old
      SCC's proxy. *And* when we manage to mutate a function in this new SCC
      in a way that invalidates those analyses. This can be... challenging to
      reproduce.
      
      I've managed to contrive a set of functions that trigger this and added
      a test case, but it is a bit brittle. I've directly checked that the
      passes run in the expected ways to help avoid the test just becoming
      silently irrelevant.
      
      This gets the new PM back to passing the LLVM test suite after the PGO
      improvements landed.
      
      llvm-svn: 292757
      b698d596
    • [PM] Add some debug logging to the new PM inliner to make it easier to · d4be9f4b
      Chandler Carruth authored
      trace its behavior.
      
      llvm-svn: 292756
      d4be9f4b
  5. Jan 20, 2017
    • Improve PGO support for the new inliner · 12585b01
      Easwaran Raman authored
      This adds the following to the new PM based inliner in PGO mode:
      
      * Use block frequency analysis to derive callsite's profile count and use
      that to adjust thresholds of hot and cold callsites.
      
      * Incrementally update the BFI of the caller after a callee gets inlined
      into it. This incremental update is only within an invocation of the run
      method - BFI is not preserved across calls to run.
      
      * Update the function entry count of the callee after inlining it into a
      caller.
      
      * I've tuned the thresholds for the hot and cold callsites using a hacked
      up version of the old inliner that explicitly computes BFI on a set of
      internal benchmarks and spec. Once the new PM based pipeline stabilizes
      (IIRC Chandler mentioned there are known issues) I'll benchmark this
      again and adjust the thresholds if required.
      Inliner PGO support.
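As a rough illustration of the first bullet, the sketch below derives a callsite count by scaling the caller's entry count by the block's frequency relative to the entry block, then picks a threshold from that count. This is plain C++, not the inliner's code: `callsiteCount`, `pickThreshold`, `ToyThresholds`, the cutoffs, and the threshold values are all hypothetical and are not the tuned values mentioned above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration values; not the thresholds tuned in this commit.
struct ToyThresholds {
  int Default = 100;
  int Hot = 400;
  int Cold = 25;
};

// Scale the function's entry count by the block's frequency relative to the
// entry block. Assumes the product does not overflow 64 bits.
inline uint64_t callsiteCount(uint64_t EntryCount, uint64_t BlockFreq,
                              uint64_t EntryFreq) {
  assert(EntryFreq != 0 && "entry frequency must be nonzero");
  return EntryCount * BlockFreq / EntryFreq;
}

// Hot callsites get a larger threshold, cold callsites a smaller one.
inline int pickThreshold(uint64_t Count, uint64_t HotCutoff,
                         uint64_t ColdCutoff,
                         const ToyThresholds &T = ToyThresholds()) {
  if (Count >= HotCutoff)
    return T.Hot;
  if (Count <= ColdCutoff)
    return T.Cold;
  return T.Default;
}
```

For example, a callsite in a block that runs half as often as the entry block of a function entered 1000 times would get a count of 500 under this model.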
      
      Differential revision: https://reviews.llvm.org/D28331
      
      llvm-svn: 292666
      12585b01
  6. Dec 28, 2016
    • [PM] Teach the inliner's call graph update to handle inserting new edges · 9900d18b
      Chandler Carruth authored
      when they are call edges at the leaf but may (transitively) be reached
      via ref edges.
      
      It turns out there is a simple rule: insert everything as a ref edge
      which is a safe conservative default. Then we let the existing update
      logic handle promoting some of those to call edges.
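The rule above can be sketched with a toy graph in plain C++. This is not the LazyCallGraph API; `ToyCallGraph`, `insertNewEdge`, and `promoteDirectCalls` are hypothetical names, and the "existing update logic" is reduced here to a caller-supplied set of direct callees.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

enum class ToyEdgeKind { Ref, Call };

// Hypothetical toy graph; edges are keyed by (source, target) names.
class ToyCallGraph {
  using Key = std::pair<std::string, std::string>;
  std::map<Key, ToyEdgeKind> Edges;

public:
  // New edges always start out as ref edges, the conservative default.
  // An edge that already exists keeps its current kind.
  void insertNewEdge(const std::string &Src, const std::string &Dst) {
    Edges.emplace(Key(Src, Dst), ToyEdgeKind::Ref);
  }

  // A later update step promotes the edges that turn out to be direct calls.
  void promoteDirectCalls(const std::string &Src,
                          const std::set<std::string> &DirectCallees) {
    for (auto &Entry : Edges)
      if (Entry.first.first == Src && DirectCallees.count(Entry.first.second))
        Entry.second = ToyEdgeKind::Call;
  }

  ToyEdgeKind kind(const std::string &Src, const std::string &Dst) const {
    auto It = Edges.find(Key(Src, Dst));
    assert(It != Edges.end() && "edge must exist");
    return It->second;
  }
};
```

Splitting insertion from promotion keeps the insertion path trivial, at the cost of a short window in which a call edge is recorded as a ref edge.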
      
      Note that it would be fairly cheap to make these call edges right away
      if that is desirable by testing whether there is some existing call path
      from the source to the target. It just seemed like slightly more
      complexity in this code path that isn't strictly necessary. If anyone
      feels strongly about handling this differently I'm happy to change it.
      
      llvm-svn: 290649
      9900d18b
  7. Dec 27, 2016
    • [PM] Add one of the features left out of the initial inliner patch: · 141bf5d1
      Chandler Carruth authored
      skipping indirectly recursive inline chains.
      
      To do this, we implicitly build an inline stack for each callsite and
      check prior to inlining that doing so would not form a cycle. This uses
      the exact same technique and even shares some code with the legacy PM
      inliner.
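A minimal sketch of such an inline stack check, in plain C++ with function names standing in for functions. The representation (a vector of entries, each pointing at the entry it was inlined through) is an assumption made for illustration; `ToyInlineHistory` and `historyIncludes` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Each entry records a function that was inlined and the index of the entry
// for the call site it came from (-1 if it came from an original call site).
using ToyInlineHistory = std::vector<std::pair<std::string, int>>;

// Walk the chain of inlines that produced a call site and report whether
// function F already appears in it. Inlining F again would form a cycle.
inline bool historyIncludes(const std::string &F, int HistoryID,
                            const ToyInlineHistory &History) {
  while (HistoryID != -1) {
    assert(HistoryID >= 0 &&
           static_cast<std::size_t>(HistoryID) < History.size() &&
           "history index out of range");
    const auto &Entry = History[static_cast<std::size_t>(HistoryID)];
    if (Entry.first == F)
      return true;
    HistoryID = Entry.second;
  }
  return false;
}
```

The history lives only for one run of the pass in this sketch, which is the internal state the message above finds unsatisfying: a second, external iteration would start with an empty history.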
      
      This solution remains deeply unsatisfying to me because it means we
      cannot actually iterate the inliner externally. Doing so would not be
      able to easily detect and avoid such cycles. Some day I would very much
      like to have a solution that works without this internal state to detect
      cycles, but this is not that day.
      
      llvm-svn: 290590
      141bf5d1
    • [PM] Teach the inliner in the new PM to merge attributes after inlining. · 03130d98
      Chandler Carruth authored
      Also enable the new PM in the attributes test case which caught this
      issue.
      
      llvm-svn: 290572
      03130d98
    • [PM] Teach the always inliner in the new pass manager to support · 6e9bb7e0
      Chandler Carruth authored
      removing fully-dead comdats without removing dead entries in comdats
      with live members.
      
      This factors the core logic out of the current inliner's internals to
      a reusable utility and leverages that in both places. The factored out
      code should also be (slightly) more efficient in cases where we have very
      few dead functions or dead comdats to consider.
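The comdat rule can be sketched as a filter over a list of functions. This is plain C++ with hypothetical names (`ToyFunction`, `removableFunctions`), not the utility factored out by this commit: a dead function is removable only if it is in no comdat or if every member of its comdat is dead.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy model of a function; an empty Comdat means "no comdat".
struct ToyFunction {
  std::string Name;
  std::string Comdat;
  bool Dead;
};

// Return the names of dead functions that are safe to remove.
inline std::vector<std::string>
removableFunctions(const std::vector<ToyFunction> &Fns) {
  // Count live members per comdat in one pass over the functions.
  std::map<std::string, int> LiveMembers;
  for (const ToyFunction &F : Fns) {
    assert(!F.Name.empty() && "functions need a name");
    if (!F.Comdat.empty() && !F.Dead)
      ++LiveMembers[F.Comdat];
  }

  std::vector<std::string> Removable;
  for (const ToyFunction &F : Fns) {
    if (!F.Dead)
      continue;
    // A comdat with any live member must keep all of its members.
    if (F.Comdat.empty() || LiveMembers.count(F.Comdat) == 0)
      Removable.push_back(F.Name);
  }
  return Removable;
}
```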
      
      I've added a test case to cover this behavior of the always inliner.
      This is the last significant bug in the new PM's always inliner I've
      found (so far).
      
      llvm-svn: 290557
      6e9bb7e0
  8. Dec 22, 2016
  9. Dec 20, 2016
    • [PM] Provide an initial, minimal port of the inliner to the new pass manager. · 1d963114
      Chandler Carruth authored
      This doesn't implement *every* feature of the existing inliner, but
      tries to implement the most important ones for building a functional
      optimization pipeline and beginning to sort out bugs, regressions, and
      other problems.
      
      Notable but intentional omissions:
      - No alloca merging support. Why? Because it isn't clear we want to do
        this at all. Active discussion and investigation is going on to remove
        it, so for simplicity I omitted it.
      - No support for trying to iterate on "internally" devirtualized calls.
        Why? Because it adds what I suspect is inappropriate coupling for
        little or no benefit. We will have an outer iteration system that
        tracks devirtualization including that from function passes and
        iterates already. We should improve that rather than approximate it
        here.
      - Optimization remarks. Why? Purely to make the patch smaller, no other
        reason at all.
      
      The last one I'll probably work on almost immediately. But I wanted to
      skip it in the initial patch to try to focus the change as much as
      possible as there is already a lot of code moving around and both of
      these *could* be skipped without really disrupting the core logic.
      
      A summary of the different things happening here:
      
      1) Adding the usual new PM class and rigging.
      
      2) Fixing minor underlying assumptions in the inline cost analysis or
         inline logic that don't generally hold in the new PM world.
      
      3) Adding the core pass logic which is in essence a loop over the calls
         in the nodes in the call graph. This is a bit duplicated from the old
         inliner, but only a handful of lines could realistically be shared.
         (I tried at first, and it really didn't help anything.) All told,
         this is only about 100 lines of code, and most of that is the
         mechanics of wiring up analyses from the new PM world.
      
      4) Updating the LazyCallGraph (in the new PM) based on the *newly
         inlined* calls and references. This is very minimal because we cannot
         form cycles.
      
      5) When inlining removes the last use of a function, eagerly nuking the
         body of the function so that any "one use remaining" inline cost
         heuristics are immediately refined, and queuing these functions to be
         completely deleted once inlining is complete and the call graph
         updated to reflect that they have become dead.
      
      6) After all the inlining for a particular function, updating the
         LazyCallGraph and the CGSCC pass manager to reflect the
         function-local simplifications that are done immediately and
         internally by the inline utilities. These are the exact same
         fundamental set of CG updates done by arbitrary function passes.
      
      7) Adding a bunch of test cases to specifically target CGSCC and other
         subtle aspects in the new PM world.
      
      Many thanks to the careful review from Easwaran and Sanjoy and others!
      
      Differential Revision: https://reviews.llvm.org/D24226
      
      llvm-svn: 290161
      1d963114
  10. Dec 19, 2016
  11. Dec 15, 2016
    • Remove the AssumptionCache · 3ca4a6bc
      Hal Finkel authored
      After r289755, the AssumptionCache is no longer needed. Variables affected by
      assumptions are now found by using the new operand-bundle-based scheme. This
      new scheme is more computationally efficient, and also we need much less
      code...
      
      llvm-svn: 289756
      3ca4a6bc
  12. Nov 20, 2016
  13. Nov 04, 2016
  14. Oct 08, 2016
  15. Sep 28, 2016
  16. Sep 27, 2016
    • [Inliner] Fold the analysis remark into the missed remark · 1142147e
      Adam Nemet authored
      There is really no reason for these to be separate.
      
      The vectorizer started this pretty bad tradition that the text of the
      missed remarks is pretty meaningless, e.g. "vectorization failed".  There,
      you have to query analysis to get the full picture.
      
      I think we should just explain the reason for missing the optimization
      in the missed remark when possible.  Analysis remarks should provide
      information that the pass gathers regardless of whether the optimization
      succeeds.
      
      llvm-svn: 282542
      1142147e
    • Output optimization remarks in YAML · a62b7e1a
      Adam Nemet authored
      (Re-committed after moving the template specialization under the yaml
      namespace.  GCC was complaining about this.)
      
      This allows various presentations of this data using an external tool.
      This was first recommended here[1].
      
      As an example, consider this module:
      
        1 int foo();
        2 int bar();
        3
        4 int baz() {
        5   return foo() + bar();
        6 }
      
      The inliner generates these missed-optimization remarks today (the
      hotness information is pulled from PGO):
      
        remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
        remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)
      
      Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:
      
        --- !Missed
        Pass:            inline
        Name:            NotInlined
        DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 10 }
        Function:        baz
        Hotness:         30
        Args:
          - Callee: foo
          - String:  will not be inlined into
          - Caller: baz
        ...
        --- !Missed
        Pass:            inline
        Name:            NotInlined
        DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 18 }
        Function:        baz
        Hotness:         30
        Args:
          - Callee: bar
          - String:  will not be inlined into
          - Caller: baz
        ...
      
      This is a summary of the high-level decisions:
      
      * There is a new streaming interface to emit optimization remarks.
      E.g. for the inliner remark above:
      
         ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
                      DEBUG_TYPE, "NotInlined", &I)
                  << NV("Callee", Callee) << " will not be inlined into "
                  << NV("Caller", CS.getCaller()) << setIsVerbose());
      
      NV stands for named value and allows the YAML client to process a remark
      using its name (NotInlined) and the named arguments (Callee and Caller)
      without parsing the text of the message.
      
      Subsequent patches will update ORE users to use the new streaming API.
      
      * I am using YAML I/O for writing the YAML file.  YAML I/O requires you
      to specify reading and writing at once but reading is highly non-trivial
      for some of the more complex LLVM types.  Since it's not clear that we
      (ever) want to use LLVM to parse this YAML file, the code supports and
      asserts that we're writing only.
      
      On the other hand, I did verify experimentally that the class hierarchy
      starting at DiagnosticInfoOptimizationBase can be mapped back from the YAML
      generated here (see D24479).
      
      * The YAML stream is stored in the LLVM context.
      
      * In the example, we can probably further specify the IR value used,
      i.e. print "Function" rather than "Value".
      
      * As before, hotness is computed in the analysis pass instead of
      DiagnosticInfo.  This avoids the layering problem since BFI is in
      Analysis while DiagnosticInfo is in IR.
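To show why named values help a YAML client, here is a small stand-alone model of the streaming interface described in the first bullet. It is plain C++ and does not use the real ORE or DiagnosticInfo classes; `ToyRemark` and `ToyNV` are hypothetical names, and the YAML it prints is a simplified subset of the output shown above.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a named value: a key plus its printed form.
struct ToyNV {
  std::string Key;
  std::string Val;
};

// Hypothetical toy remark that keeps its arguments structured.
class ToyRemark {
  std::string Name;
  std::vector<std::pair<std::string, std::string>> Args;

public:
  explicit ToyRemark(std::string N) : Name(std::move(N)) {}

  // Named arguments keep their key so a client never has to parse text.
  ToyRemark &operator<<(const ToyNV &NV) {
    Args.emplace_back(NV.Key, NV.Val);
    return *this;
  }

  // Plain text between named values is recorded under the key "String".
  ToyRemark &operator<<(const char *Text) {
    assert(Text && "text must not be null");
    Args.emplace_back("String", Text);
    return *this;
  }

  // Human-readable message: the argument values concatenated in order.
  std::string message() const {
    std::string Out;
    for (const auto &A : Args)
      Out += A.second;
    return Out;
  }

  // Simplified YAML-like rendering of the name and arguments.
  std::string yaml() const {
    std::string Out = "Name: " + Name + "\nArgs:\n";
    for (const auto &A : Args)
      Out += "  - " + A.first + ": " + A.second + "\n";
    return Out;
  }
};
```

The same sequence of `<<` calls yields both the one-line message and the structured record, so the text and the YAML cannot drift apart.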
      
      [1] https://reviews.llvm.org/D19678#419445
      
      Differential Revision: https://reviews.llvm.org/D24587
      
      llvm-svn: 282539
      a62b7e1a
    • Revert "Output optimization remarks in YAML" · cc2a3fa8
      Adam Nemet authored
      This reverts commit r282499.
      
      The GCC bots are failing
      
      llvm-svn: 282503
      cc2a3fa8
    • Output optimization remarks in YAML · 92e928c1
      Adam Nemet authored
      This allows various presentations of this data using an external tool.
      This was first recommended here[1].
      
      As an example, consider this module:
      
        1 int foo();
        2 int bar();
        3
        4 int baz() {
        5   return foo() + bar();
        6 }
      
      The inliner generates these missed-optimization remarks today (the
      hotness information is pulled from PGO):
      
        remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
        remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)
      
      Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:
      
        --- !Missed
        Pass:            inline
        Name:            NotInlined
        DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 10 }
        Function:        baz
        Hotness:         30
        Args:
          - Callee: foo
          - String:  will not be inlined into
          - Caller: baz
        ...
        --- !Missed
        Pass:            inline
        Name:            NotInlined
        DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 18 }
        Function:        baz
        Hotness:         30
        Args:
          - Callee: bar
          - String:  will not be inlined into
          - Caller: baz
        ...
      
      This is a summary of the high-level decisions:
      
      * There is a new streaming interface to emit optimization remarks.
      E.g. for the inliner remark above:
      
         ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
                      DEBUG_TYPE, "NotInlined", &I)
                  << NV("Callee", Callee) << " will not be inlined into "
                  << NV("Caller", CS.getCaller()) << setIsVerbose());
      
      NV stands for named value and allows the YAML client to process a remark
      using its name (NotInlined) and the named arguments (Callee and Caller)
      without parsing the text of the message.
      
      Subsequent patches will update ORE users to use the new streaming API.
      
      * I am using YAML I/O for writing the YAML file.  YAML I/O requires you
      to specify reading and writing at once but reading is highly non-trivial
      for some of the more complex LLVM types.  Since it's not clear that we
      (ever) want to use LLVM to parse this YAML file, the code supports and
      asserts that we're writing only.
      
      On the other hand, I did verify experimentally that the class hierarchy
      starting at DiagnosticInfoOptimizationBase can be mapped back from the YAML
      generated here (see D24479).
      
      * The YAML stream is stored in the LLVM context.
      
      * In the example, we can probably further specify the IR value used,
      i.e. print "Function" rather than "Value".
      
      * As before, hotness is computed in the analysis pass instead of
      DiagnosticInfo.  This avoids the layering problem since BFI is in
      Analysis while DiagnosticInfo is in IR.
      
      [1] https://reviews.llvm.org/D19678#419445
      
      Differential Revision: https://reviews.llvm.org/D24587
      
      llvm-svn: 282499
      92e928c1
  17. Aug 26, 2016
    • [Inliner] Report when inlining fails because callee's def is unavailable · cef33141
      Adam Nemet authored
      Summary:
      This is obviously an interesting case because it may motivate code
      restructuring or LTO.
      
      Reporting this requires instantiation of ORE in the loop where the call
      sites are first gathered.  I've checked compile-time
      overhead *with* -Rpass-with-hotness and the worst slow-down was 6% in
      mcf and quickly tailing off.  As before without -Rpass-with-hotness
      there is no overhead.
      
      Because this could be a pretty noisy diagnostic, it is currently
      qualified as 'verbose'.  As of this patch, 'verbose' diagnostics are
      only emitted with -Rpass-with-hotness, i.e. when the output is expected
      to be filtered.
      
      Reviewers: eraman, chandlerc, davidxl, hfinkel
      
      Subscribers: tejohnson, Prazek, davide, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D23415
      
      llvm-svn: 279860
      cef33141
  18. Aug 17, 2016
    • [Inliner] Add a flag to disable manual alloca merging in the Inliner. · f702d8ec
      Chandler Carruth authored
      This is off for now while testing can take place to make sure that in
      fact we do sufficient stack coloring to fully obviate the manual alloca
      array merging.
      
      Some context on why we should be using stack coloring rather than
      merging allocas in this way:
      
      LLVM relies very heavily on analyzing pointers as coming from different
      allocas in order to make aliasing decisions. These are some of the most
      powerful aliasing signals available in LLVM. So merging allocas is an
      extremely destructive operation on the LLVM IR -- it takes away highly
      valuable and hard to reconstruct information.
      
      As a consequence, inlined functions which happen to have array allocas
      that this pattern matches will fail to be properly interleaved unless
      SROA manages to hoist everything to an SSA register. Instead, the
      inliner will have added an unnecessary dependence that one inlined
      function execute after the other because they will have been rewritten
      to refer to the same memory.
      
      All that said, folks will reasonably want some time to experiment here
      and make sure there are no significant regressions. A flag should give
      us an easy knob to test.
      
      For more context, see the thread here:
      http://lists.llvm.org/pipermail/llvm-dev/2016-July/103277.html
      http://lists.llvm.org/pipermail/llvm-dev/2016-August/103285.html
      
      Differential Revision: https://reviews.llvm.org/D23052
      
      llvm-svn: 278892
      f702d8ec
  19. Aug 10, 2016
    • Changed sign of LastCallToStaticBonus · d89875ca
      Piotr Padlewski authored
      Summary:
      I think it is much better this way.
      When I first saw the line:
        Cost += InlineConstants::LastCallToStaticBonus;
      I thought it was a bug, because everywhere else the cost is reduced
      using -=.
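The convention argued for here can be shown in a few lines of plain C++: the bonus is stored as a positive magnitude and subtracted, so every cost reduction reads as `-=`. The namespace `ToyInlineConstants`, the function `applyLastCallBonus`, and the numeric value are hypothetical and chosen only for illustration.

```cpp
#include <cassert>

namespace ToyInlineConstants {
// Hypothetical value; stored as a positive magnitude.
const int LastCallToStaticBonus = 1000;
} // namespace ToyInlineConstants

// Reduce the cost when the call is the last call to a static function.
inline int applyLastCallBonus(int Cost, bool IsLastCallToStatic) {
  assert(ToyInlineConstants::LastCallToStaticBonus > 0 &&
         "bonuses are positive magnitudes");
  if (IsLastCallToStatic)
    Cost -= ToyInlineConstants::LastCallToStaticBonus; // a reduction reads as -=
  return Cost;
}
```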
      
      Reviewers: eraman, tejohnson, mehdi_amini
      
      Subscribers: llvm-commits, mehdi_amini
      
      Differential Revision: https://reviews.llvm.org/D23222
      
      llvm-svn: 278290
      d89875ca
    • [Inliner,OptDiag] Add hotness attribute to opt diagnostics · 896c09bd
      Adam Nemet authored
      Summary:
      The inliner not being a function pass requires the work-around of
      generating the OptimizationRemarkEmitter and in turn BFI on demand.
      This will go away after the new PM is ready.
      
      BFI is only computed inside ORE if the user has requested hotness
      information for optimization diagnostics (-pass-remarks-with-hotness at
      the 'opt' level).  Thus there is no additional overhead without the
      flag.
      
      Reviewers: hfinkel, davidxl, eraman
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D22694
      
      llvm-svn: 278185
      896c09bd
  20. Aug 06, 2016
  21. Aug 03, 2016
  22. Jul 29, 2016
    • Added ThinLTO inlining statistics · 84abc74f
      Piotr Padlewski authored
      Summary:
      Copied from the documentation of the ImportedFunctionsInliningStatistics class:
       \brief Calculate and dump ThinLTO specific inliner stats.
       The main statistics are:
       (1) Number of inlined imported functions,
       (2) Number of imported functions inlined into importing module (indirect),
       (3) Number of non imported functions inlined into importing module
       (indirect).
       The difference between the first and the second is that the first stat counts
       all inlines performed on imported functions, but the second counts only the
       functions that were eventually inlined into a function in the importing
       module (by a chain of inlines). Because LLVM uses a bottom-up inliner, it is
       possible to e.g. import functions `A` and `B`, then inline `B` into `A`,
       and after this `A` might be too big to be inlined into some other function
       that calls it. It calculates this statistic by building a graph where
       the nodes are functions and the edges are performed inlines, and then by
       marking the edges starting from non-imported functions.
      
       If `Verbose` is set to true, then it also dumps statistics
       per each inlined function, sorted by the greatest inlines count like
       - number of performed inlines
       - number of performed inlines to importing module
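The graph-based counting described above can be sketched as follows. This is a plain C++ toy, not the ImportedFunctionsInliningStatistics class: `ToyInlineStats` and its members are hypothetical names, and functions are identified by name only. Edges run from a caller to each callee inlined into it, and reachability is computed from the non-imported functions.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model of the inline graph.
class ToyInlineStats {
  std::map<std::string, bool> Imported;                    // name -> imported?
  std::map<std::string, std::vector<std::string>> Inlined; // caller -> callees

  // Depth-first walk over performed inlines starting at Caller.
  void visit(const std::string &Caller, std::set<std::string> &Reached) const {
    auto It = Inlined.find(Caller);
    if (It == Inlined.end())
      return;
    for (const std::string &Callee : It->second)
      if (Reached.insert(Callee).second)
        visit(Callee, Reached);
  }

  // Everything inlined, directly or via a chain, into a non-imported function.
  std::set<std::string> reachedFromModule() const {
    std::set<std::string> Reached;
    for (const auto &KV : Imported)
      if (!KV.second)
        visit(KV.first, Reached);
    return Reached;
  }

public:
  void addFunction(const std::string &Name, bool IsImported) {
    Imported[Name] = IsImported;
  }

  void recordInline(const std::string &Caller, const std::string &Callee) {
    assert(Imported.count(Caller) && Imported.count(Callee) &&
           "register both functions first");
    Inlined[Caller].push_back(Callee);
  }

  // Stat (1): imported functions that were inlined anywhere at all.
  int importedInlined() const {
    std::set<std::string> Callees;
    for (const auto &KV : Inlined)
      for (const std::string &C : KV.second)
        if (Imported.at(C))
          Callees.insert(C);
    return static_cast<int>(Callees.size());
  }

  // Stats (2) and (3): functions that ended up in the importing module,
  // counted separately for imported and non-imported functions.
  int inlinedIntoModule(bool WantImported) const {
    int N = 0;
    for (const std::string &Name : reachedFromModule())
      if (Imported.at(Name) == WantImported)
        ++N;
    return N;
  }
};
```

With imported `A` and `B` where `B` is inlined into `A` but `A` is never inlined into the importing module, stat (1) counts `B` while stat (2) does not, which is the gap the description above calls out.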
      
      Reviewers: eraman, tejohnson, mehdi_amini
      
      Subscribers: mehdi_amini, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D22491
      
      llvm-svn: 277089
      84abc74f
  23. Jul 23, 2016
  24. Jun 10, 2016
  25. May 23, 2016
  26. Apr 30, 2016
  27. Apr 29, 2016
  28. Apr 23, 2016
  29. Apr 22, 2016
  30. Apr 21, 2016
    • Initial implementation of optimization bisect support. · f0f27929
      Andrew Kaylor authored
      This patch implements an optimization bisect feature, which will allow optimizations to be selectively disabled at compile time in order to track down test failures that are caused by incorrect optimizations.
      
      The bisection is enabled using a new command line option (-opt-bisect-limit).  Individual passes that may be skipped call the OptBisect object (via an LLVMContext) to see if they should be skipped based on the bisect limit.  A finer level of control (disabling individual transformations) can be managed through an additional OptBisect method, but this is not yet used.
      
      The skip checking in this implementation is based on (and replaces) the skipOptnoneFunction check.  Where that check was being called, a new call has been inserted in its place which checks the bisect limit and the optnone attribute.  A new function call has been added for module and SCC passes that behaves in a similar way.
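The counting idea behind the bisect limit can be sketched in a few lines of plain C++. This is not the OptBisect class: `ToyOptBisect`, `shouldRun`, and `shouldSkip` are hypothetical names, and two details are assumptions of the sketch: a limit of -1 means "no limit", and the bisect counter is consulted before the optnone attribute.

```cpp
#include <cassert>

// Hypothetical toy model of a bisect limit: every skippable pass invocation
// is numbered, and invocations past the limit are skipped.
class ToyOptBisect {
  int Limit;
  int LastChecked = 0;

public:
  explicit ToyOptBisect(int L) : Limit(L) {
    assert(L >= -1 && "use -1 for 'no limit'");
  }

  // Number this invocation and report whether it is within the limit.
  bool shouldRun() {
    ++LastChecked;
    return Limit == -1 || LastChecked <= Limit;
  }
};

// Combined check: skip when past the bisect limit or when the function
// carries the optnone attribute.
inline bool shouldSkip(ToyOptBisect &Bisect, bool HasOptNone) {
  bool WithinLimit = Bisect.shouldRun();
  return !WithinLimit || HasOptNone;
}
```

Because each invocation is numbered deterministically, a failing build can be narrowed down by binary search over the limit until the first bad invocation is found.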
      
      Differential Revision: http://reviews.llvm.org/D19172
      
      llvm-svn: 267022
      f0f27929