  1. May 10, 2017
  2. Mar 22, 2017
  3. Mar 16, 2017
  4. Mar 09, 2017
    • Chandler Carruth's avatar
      [PM/Inliner] Make the new PM's inliner process call edges across an · 20e588e1
      Chandler Carruth authored
      entire SCC before iterating on newly-introduced call edges resulting
      from any inlined function bodies.
      
      This more closely matches the behavior of the old PM's inliner. While it
      wasn't really clear to me initially, this behavior is actually essential
      to the inliner behaving reasonably in its current design.
      
      Because the inliner is fundamentally a bottom-up inliner and all of its
      cost modeling is designed around that, it often runs into trouble within
      an SCC where we don't have any meaningful bottom-up ordering to use. In
      addition to potentially cyclic, infinite inlining that we block with the
      inline history mechanism, it can also take seemingly simple call graph
      patterns within an SCC and turn them into *insanely* large functions by
      accidentally working top-down across the SCC without any of the
      threshold limitations that traditional top-down inliners use.
      
      Consider this diabolical monster.cpp file that Richard Smith came up
      with to help demonstrate this issue:
      ```
      template <int N> extern const char *str;
      
      void g(const char *);
      
      template <bool K, int N> void f(bool *B, bool *E) {
        if (K)
          g(str<N>);
        if (B == E)
          return;
        if (*B)
          f<true, N + 1>(B + 1, E);
        else
          f<false, N + 1>(B + 1, E);
      }
      template <> void f<false, MAX>(bool *B, bool *E) { return f<false, 0>(B, E); }
      template <> void f<true, MAX>(bool *B, bool *E) { return f<true, 0>(B, E); }
      
      extern bool *arr, *end;
      void test() { f<false, 0>(arr, end); }
      ```
      
      When compiled with '-DMAX=N' for various values of N, this will create an SCC
      with a reasonably large number of functions. Previously, the inliner would try
      to exhaust the inlining candidates in a single function before moving on. This,
      unfortunately, turns it into a top-down inliner within the SCC. Because our
      thresholds were never built for that, we will incrementally decide that it is
      always worth inlining and proceed to flatten the entire SCC into that one
      function.
      
      What's worse, we'll then proceed to the next function, and do the exact same
      thing except we'll skip the first function, and so on. And at each step, we'll
      also make some of the constant factors larger, which is awesome.
      
      The fix in this patch is the obvious one which makes the new PM's inliner use
      the same technique used by the old PM: consider all the call edges across the
      entire SCC before beginning to process call edges introduced by inlining. The
      result of this is essentially to distribute the inlining across the SCC so that
      every function incrementally grows toward the inline thresholds rather than
      allowing the inliner to grow one of the functions vastly beyond the threshold.
      The code for this is a bit awkward, but it works out OK.
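
      As a rough illustration only, the two-phase idea could look something like
      the sketch below; the `Function`/`CallSite` structs, the size-based
      `worthInlining`, and `inlineCall` are made-up simplifications, not the
      real LLVM classes or cost model.
      ```
      #include <cstdio>
      #include <string>
      #include <vector>

      struct Function;

      struct CallSite {
        Function *Caller;
        Function *Callee;
      };

      struct Function {
        std::string Name;
        int Size = 1;                 // crude stand-in for IR size
        std::vector<CallSite> Calls;  // calls made directly by this function
      };

      // Stand-in for the real cost model: a plain size threshold.
      static bool worthInlining(const CallSite &CS) {
        return CS.Callee && CS.Caller->Size + CS.Callee->Size <= 100;
      }

      // Stand-in for inlining: the caller grows and the callee's calls reappear
      // as new call sites in the caller.
      static std::vector<CallSite> inlineCall(const CallSite &CS) {
        CS.Caller->Size += CS.Callee->Size;
        std::vector<CallSite> NewCalls;
        for (const CallSite &Inner : CS.Callee->Calls)
          NewCalls.push_back({CS.Caller, Inner.Callee});
        return NewCalls;
      }

      static void runInlinerOnSCC(std::vector<Function *> &SCC) {
        // Phase 1: gather every candidate call edge across the *entire* SCC.
        std::vector<CallSite> Worklist;
        for (Function *F : SCC)
          for (const CallSite &CS : F->Calls)
            Worklist.push_back(CS);

        // Phase 2: process the worklist; call sites introduced by inlining are
        // appended to the end, so growth is distributed across the SCC instead
        // of flattening the whole SCC into whichever function we visit first.
        for (size_t I = 0; I != Worklist.size(); ++I) {
          CallSite CS = Worklist[I];
          if (!worthInlining(CS))
            continue;
          for (const CallSite &NewCS : inlineCall(CS))
            Worklist.push_back(NewCS);
        }
      }

      int main() {
        Function A{"a"}, B{"b"};
        A.Calls.push_back({&A, &B});
        B.Calls.push_back({&B, &A});  // a two-function SCC
        std::vector<Function *> SCC = {&A, &B};
        runInlinerOnSCC(SCC);
        std::printf("%s grew to %d, %s grew to %d\n", A.Name.c_str(), A.Size,
                    B.Name.c_str(), B.Size);
      }
      ```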
      
      We could consider in the future doing something more powerful here such as
      prioritized order (via lowest cost and/or profile info) and/or a code-growth
      budget per SCC. However, both of those would require really substantial work
      both to design the system in a way that wouldn't break really useful
      abstraction decomposition properties of the current inliner and to be tuned
      across a reasonably diverse set of code and workloads. It also seems really
      risky in many ways. I have only found a single real-world file that triggers
      the bad behavior here and it is generated code that has a pretty pathological
      pattern. I'm not worried about the inliner not doing an *awesome* job here as
      long as it does *ok*. On the other hand, the cases that will be tricky to get
      right in a prioritized scheme with a budget will be more common and idiomatic
      for at least some frontends (C++ and Rust at least). So while these approaches
      are still really interesting, I'm not in a huge rush to go after them. Staying
      even closer to the existing PM's behavior, especially when it is this easy to
      do, seems like the right short- to medium-term approach.
      
      I don't really have a test case that makes sense yet... I'll try to find a
      variant of the IR produced by the monster template metaprogram that is both
      small enough to be sane and large enough to clearly show when we get this wrong
      in the future. But I'm not confident this exists. And the behavior change here
      *should* be unobservable without snooping on debug logging. So there isn't
      really much to test.
      
      The test case updates come from two incidental changes:
      1) We now visit functions in an SCC in the opposite order. I don't think there
         really is a "right" order here, so I just update the test cases.
      2) We no longer compute some analyses when an SCC has no call instructions that
         we consider for inlining.
      
      llvm-svn: 297374
      20e588e1
  5. Feb 14, 2017
    • Taewook Oh's avatar
      Do not apply redundant LastCallToStaticBonus · f22fa72e
      Taewook Oh authored
      Summary:
      As written in the comments above, LastCallToStaticBonus is already applied to
      the cost if Caller has only one user, so it is redundant to reapply the bonus
      here.
      
      If the only user is not a caller, TotalSecondaryCost will not be adjusted
      anyway because callerWillBeRemoved is false. If there's no caller at all, we
      don't need to care about TotalSecondaryCost because
      inliningPreventsSomeOuterInline is false.
      
      Reviewers: chandlerc, eraman
      
      Reviewed By: eraman
      
      Subscribers: haicheng, davidxl, davide, llvm-commits, mehdi_amini
      
      Differential Revision: https://reviews.llvm.org/D29169
      
      llvm-svn: 295075
      f22fa72e
  6. Feb 10, 2017
    • Chandler Carruth's avatar
      [PM/LCG] Teach the LazyCallGraph how to replace a function without · aaad9f84
      Chandler Carruth authored
      disturbing the graph or having to update edges.
      
      This is motivated by porting argument promotion to the new pass manager.
      Because of how LLVM IR Function objects work, in order to change their
      signature a new object needs to be created. This is efficient and
      straightforward in the IR but previously was very hard to implement in
      LCG. We could easily replace the function a node in the graph
      represents. The challenging part is how to handle updating the edges in
      the graph.
      
      LCG previously used an edge to a raw function to represent a node that
      had not yet been scanned for calls and references. This was the core
      of its laziness. However, that model causes this kind of update to be
      very hard:
      1) The keys used to look up an edge need to be `Function*`s that would all
         need to be updated when we update the node.
      2) There will be some unknown number of edges that haven't transitioned
         from `Function*` edges to `Node*` edges.
      
      All of this complexity isn't necessary. Instead, we can always build
      a node around any function, always pointing edges at it and always using
      it as the key to look up an edge. To maintain the laziness, we need to
      sink the *edges* of a node into a secondary object and explicitly model
      transitioning a node from empty to populated by scanning the function.
      This design seems much cleaner in a number of ways, but importantly
      there is now exactly *one* place where the `Function*` has to be
      updated!
      
      Some other cleanups that fall out of this include having something to
      model the *entry* edges more accurately. Rather than hand-rolling parts
      of the node in the graph itself, we have an explicit `EdgeSequence`
      object that gives us exactly the functionality needed. We also have
      a consistent place to define the edge iterators and can use them for
      both the entry edges and the internal edges of the graph.
      
      The API used to model the separation between a node and its edges is
      intentionally very thin as most clients are expected to deal with nodes
      that have populated edges. We model this exactly as an optional does
      with an additional method to populate the edges when that is
      a reasonable thing for a client to do. This is based on API design
      suggestions from Richard Smith and David Blaikie, credit goes to them
      for helping pick how to model this without it being either too explicit
      or too implicit.
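
      As a rough illustration of that shape, here is a miniature sketch; the
      `Node`, `Edge`, and `EdgeSequence` classes below are invented
      simplifications, not the actual LazyCallGraph API.
      ```
      #include <cassert>
      #include <optional>
      #include <vector>

      struct Function;  // stand-in for the IR function type

      class Node;

      struct Edge {
        Node *Target;
        bool IsCall;  // call edge vs. plain reference edge
      };

      // The out-edges of a node live in their own object...
      class EdgeSequence {
        std::vector<Edge> Edges;

      public:
        void insert(Node *Target, bool IsCall) {
          Edges.push_back({Target, IsCall});
        }
        auto begin() const { return Edges.begin(); }
        auto end() const { return Edges.end(); }
      };

      // ...so a Node can exist for every function, be what edges point at, and
      // stay "empty" until the function is actually scanned.
      class Node {
        Function *F;                        // the *one* place the Function* lives
        std::optional<EdgeSequence> Edges;  // empty until populated

      public:
        explicit Node(Function &F) : F(&F) {}

        bool isPopulated() const { return Edges.has_value(); }

        // Most clients only deal with populated nodes, much like an optional.
        EdgeSequence &operator*() {
          assert(isPopulated() && "node not populated yet");
          return *Edges;
        }

        // Explicit transition from empty to populated by scanning the function.
        EdgeSequence &populate() {
          if (!Edges) {
            Edges.emplace();
            // ...scan F's body here and insert call/ref edges...
          }
          return *Edges;
        }

        // Replacing the function updates exactly one pointer; no edge changes,
        // because edges point at the Node rather than at the Function.
        void replaceFunction(Function &NewF) { F = &NewF; }
      };
      ```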
      
      The patch is somewhat noisy due to shifting around iterator types and
      new syntax for walking the edges of a node, but most of the
      functionality change is in the `Edge`, `EdgeSequence`, and `Node` types.
      
      Differential Revision: https://reviews.llvm.org/D29577
      
      llvm-svn: 294653
      aaad9f84
    • Peter Collingbourne's avatar
      De-duplicate some code for creating an AARGetter suitable for the legacy PM. · cea1e4e7
      Peter Collingbourne authored
      I'm about to use this in a couple more places.
      
      Differential Revision: https://reviews.llvm.org/D29793
      
      llvm-svn: 294648
      cea1e4e7
  7. Jan 30, 2017
  8. Jan 24, 2017
    • Chandler Carruth's avatar
      [PM] Replace uses of AssertingVH from members of analysis results with · 6acdca78
      Chandler Carruth authored
      a lazy-asserting PoisoningVH.
      
      AssertingVH is fundamentally incompatible with cache invalidation of
      analysis results: the invalidation happens after the AssertingVH has
      already fired. Instead, use a PoisoningVH that will assert if the
      dangling handle is ever used, rather than when it is merely assigned or
      destroyed.
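
      For intuition, a toy sketch of the difference; these handle classes are
      invented for illustration and are far simpler than the real value handle
      machinery.
      ```
      #include <cassert>

      struct Value;  // stand-in for the IR value type

      // Asserting flavor: fires the moment the pointee is deleted while the
      // handle still exists -- too early when cached analysis results are only
      // invalidated *after* the IR has already changed.
      class AssertingHandle {
        Value *V;

      public:
        explicit AssertingHandle(Value *V) : V(V) {}
        void pointeeDeleted() { assert(false && "value deleted with live handle"); }
        Value *get() const { return V; }
      };

      // Poisoning flavor: deleting the pointee merely poisons the handle; it is
      // still fine to assign or destroy it, and we only assert if the dangling
      // handle is actually *used* afterwards.
      class PoisoningHandle {
        Value *V;
        bool Poisoned = false;

      public:
        explicit PoisoningHandle(Value *V) : V(V) {}
        void pointeeDeleted() {
          Poisoned = true;
          V = nullptr;
        }
        Value *get() const {
          assert(!Poisoned && "use of a poisoned (dangling) handle");
          return V;
        }
      };
      ```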
      
      This patch also removes all of the (numerous) doomed attempts to work
      around this fundamental incompatibility. It is a pretty significant
      simplification IMO.
      
      The most interesting change is in the Inliner, where we still do some
      clearing because we don't want to rely on the coarse-grained
      invalidation strategy of the containing pass manager. However, I prefer
      the approach of containing this logic in the cleanup phase of the
      Inliner, and I think we could enhance the CGSCC analysis management
      layer to make this even better in the future if desired.
      
      The rest is straight cleanup.
      
      I've also added a test for one of the harder cases to work around: when
      a *module analysis* contains many AssertingVHs pointing at functions.
      
      Differential Revision: https://reviews.llvm.org/D29006
      
      llvm-svn: 292928
      6acdca78
  9. Jan 23, 2017
  10. Jan 22, 2017
    • Chandler Carruth's avatar
      [PM] Fix a really nasty bug introduced when adding PGO support to the · b698d596
      Chandler Carruth authored
      new PM's inliner.
      
      The bug happens when we refine an SCC after having computed a proxy for
      the FunctionAnalysisManager, and then proceed to compute fresh analyses
      for functions in the *new* SCC using the manager provided by the old
      SCC's proxy. *And* when we manage to mutate a function in this new SCC
      in a way that invalidates those analyses. This can be... challenging to
      reproduce.
      
      I've managed to contrive a set of functions that trigger this and added
      a test case, but it is a bit brittle. I've directly checked that the
      passes run in the expected ways to help avoid the test just becoming
      silently irrelevant.
      
      This gets the new PM back to passing the LLVM test suite after the PGO
      improvements landed.
      
      llvm-svn: 292757
      b698d596
    • Chandler Carruth's avatar
      [PM] Add some debug logging to the new PM inliner to make it easier to · d4be9f4b
      Chandler Carruth authored
      trace its behavior.
      
      llvm-svn: 292756
      d4be9f4b
  11. Jan 20, 2017
    • Easwaran Raman's avatar
      Improve PGO support for the new inliner · 12585b01
      Easwaran Raman authored
      This adds the following to the new PM based inliner in PGO mode:
      
      * Use block frequency analysis to derive the callsite's profile count and
      use that to adjust the thresholds of hot and cold callsites (a rough
      sketch of this follows after this list).
      
      * Incrementally update the BFI of the caller after a callee gets inlined
      into it. This incremental update is only within an invocation of the run
      method - BFI is not preserved across calls to run.

      * Update the function entry count of the callee after inlining it into a
      caller.
      
      * I've tuned the thresholds for the hot and cold callsites using a hacked-up
      version of the old inliner that explicitly computes BFI on a set of
      internal benchmarks and SPEC. Once the new PM-based pipeline stabilizes
      (IIRC Chandler mentioned there are known issues) I'll benchmark this
      again and adjust the thresholds if required.
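
      A rough sketch of the count derivation and threshold adjustment from the
      first bullet; the field names, cutoffs, and scaling factors below are
      placeholders for illustration, not the actual values or APIs.
      ```
      #include <cstdint>
      #include <optional>

      // Simplified inputs; the real code gets these from BFI and the profile
      // summary rather than plain numbers.
      struct CallSiteProfile {
        uint64_t CallerEntryCount;   // profile entry count of the caller
        uint64_t CallSiteBlockFreq;  // BFI frequency of the call's block
        uint64_t CallerEntryFreq;    // BFI frequency of the caller's entry block
      };

      // Derive a profile count for the call site by scaling the caller's entry
      // count with the relative block frequency (overflow handling omitted).
      std::optional<uint64_t> deriveCallSiteCount(const CallSiteProfile &P) {
        if (P.CallerEntryFreq == 0)
          return std::nullopt;
        return P.CallerEntryCount * P.CallSiteBlockFreq / P.CallerEntryFreq;
      }

      // Adjust the inline threshold for hot and cold call sites; the concrete
      // multipliers and cutoffs here are placeholders.
      int adjustThreshold(int DefaultThreshold, uint64_t CallSiteCount,
                          uint64_t HotCountCutoff, uint64_t ColdCountCutoff) {
        if (CallSiteCount >= HotCountCutoff)
          return DefaultThreshold * 3;  // be more aggressive for hot call sites
        if (CallSiteCount <= ColdCountCutoff)
          return DefaultThreshold / 4;  // be more conservative for cold ones
        return DefaultThreshold;
      }
      ```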
      
      Differential revision: https://reviews.llvm.org/D28331
      
      llvm-svn: 292666
      12585b01
  12. Dec 28, 2016
    • Chandler Carruth's avatar
      [PM] Teach the inliner's call graph update to handle inserting new edges · 9900d18b
      Chandler Carruth authored
      when they are call edges at the leaf but may (transitively) be reached
      via ref edges.
      
      It turns out there is a simple rule: insert everything as a ref edge
      which is a safe conservative default. Then we let the existing update
      logic handle promoting some of those to call edges.
      
      Note that it would be fairly cheap to make these call edges right away
      if that is desirable by testing whether there is some existing call path
      from the source to the target. It just seemed like slightly more
      complexity in this code path that isn't strictly necessary. If anyone
      feels strongly about handling this differently I'm happy to change it.
      
      llvm-svn: 290649
      9900d18b
  13. Dec 27, 2016
    • Chandler Carruth's avatar
      [PM] Add one of the features left out of the initial inliner patch: · 141bf5d1
      Chandler Carruth authored
      skipping indirectly recursive inline chains.
      
      To do this, we implicitly build an inline stack for each callsite and
      check prior to inlining that doing so would not form a cycle. This uses
      the exact same technique and even shares some code with the legacy PM
      inliner.
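
      A small sketch of the general shape of such an inline-history check, with
      hypothetical simplified types; the real bookkeeping is organized
      differently.
      ```
      #include <utility>
      #include <vector>

      struct Function;  // stand-in for the IR function type

      // Each entry records a function that was inlined plus the history entry of
      // the call site it was inlined through, forming a chain back to an original
      // call site (HistoryID == -1 means "not created by inlining").
      using InlineHistory = std::vector<std::pair<Function *, int>>;

      // Walk the implicit inline stack for this call site and refuse to inline if
      // the callee already appears in it, which would form an (indirect) cycle.
      bool formsInlineCycle(const InlineHistory &History, int HistoryID,
                            Function *Callee) {
        for (int ID = HistoryID; ID != -1; ID = History[ID].second)
          if (History[ID].first == Callee)
            return true;
        return false;
      }

      // After inlining Callee at a call site with history HistoryID, tag the call
      // sites copied out of Callee's body with this new entry.
      int recordInline(InlineHistory &History, int HistoryID, Function *Callee) {
        History.push_back({Callee, HistoryID});
        return static_cast<int>(History.size()) - 1;
      }
      ```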
      
      This solution remains deeply unsatisfying to me because it means we
      cannot actually iterate the inliner externally. Doing so would not be
      able to easily detect and avoid such cycles. Some day I would very much
      like to have a solution that works without this internal state to detect
      cycles, but this is not that day.
      
      llvm-svn: 290590
      141bf5d1
    • Chandler Carruth's avatar
      [PM] Teach the inliner in the new PM to merge attributes after inlining. · 03130d98
      Chandler Carruth authored
      Also enable the new PM in the attributes test case which caught this
      issue.
      
      llvm-svn: 290572
      03130d98
    • Chandler Carruth's avatar
      [PM] Teach the always inliner in the new pass manager to support · 6e9bb7e0
      Chandler Carruth authored
      removing fully-dead comdats without removing dead entries in comdats
      with live members.
      
      This factors the core logic out of the current inliner's internals to
      a reusable utility and leverages that in both places. The factored-out
      code should also be (slightly) more efficient in cases where we have very
      few dead functions or dead comdats to consider.
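
      To illustrate the rule being factored out, a simplified sketch; the
      `Symbol` record and string-keyed comdat groups are invented stand-ins for
      the real IR objects.
      ```
      #include <set>
      #include <string>
      #include <vector>

      struct Symbol {
        std::string Name;
        std::string Comdat;  // empty if not in a comdat group
        bool Dead = false;
      };

      std::vector<const Symbol *>
      symbolsSafeToDelete(const std::vector<Symbol> &Syms) {
        // First pass: find comdat groups that still have a live member.
        std::set<std::string> LiveComdats;
        for (const Symbol &S : Syms)
          if (!S.Dead && !S.Comdat.empty())
            LiveComdats.insert(S.Comdat);

        // Second pass: dead symbols are deletable unless their comdat group still
        // has a live member -- those dead entries must be kept.
        std::vector<const Symbol *> ToDelete;
        for (const Symbol &S : Syms) {
          if (!S.Dead)
            continue;
          if (!S.Comdat.empty() && LiveComdats.count(S.Comdat))
            continue;
          ToDelete.push_back(&S);
        }
        return ToDelete;
      }
      ```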
      
      I've added a test case to cover this behavior of the always inliner.
      This is the last significant bug in the new PM's always inliner I've
      found (so far).
      
      llvm-svn: 290557
      6e9bb7e0
  14. Dec 22, 2016
  15. Dec 20, 2016
    • Chandler Carruth's avatar
      [PM] Provide an initial, minimal port of the inliner to the new pass manager. · 1d963114
      Chandler Carruth authored
      This doesn't implement *every* feature of the existing inliner, but
      tries to implement the most important ones for building a functional
      optimization pipeline and beginning to sort out bugs, regressions, and
      other problems.
      
      Notable, but intentional omissions:
      - No alloca merging support. Why? Because it isn't clear we want to do
        this at all. Active discussion and investigation is going on to remove
        it, so for simplicity I omitted it.
      - No support for trying to iterate on "internally" devirtualized calls.
        Why? Because it adds what I suspect is inappropriate coupling for
        little or no benefit. We will have an outer iteration system that
        tracks devirtualization including that from function passes and
        iterates already. We should improve that rather than approximate it
        here.
      - Optimization remarks. Why? Purely to make the patch smaller, no other
        reason at all.
      
      The last one I'll probably work on almost immediately. But I wanted to
      skip it in the initial patch to try to focus the change as much as
      possible as there is already a lot of code moving around and both of
      these *could* be skipped without really disrupting the core logic.
      
      A summary of the different things happening here:
      
      1) Adding the usual new PM class and rigging.
      
      2) Fixing minor underlying assumptions in the inline cost analysis or
         inline logic that don't generally hold in the new PM world.
      
      3) Adding the core pass logic which is in essence a loop over the calls
         in the nodes in the call graph. This is a bit duplicated from the old
         inliner, but only a handful of lines could realistically be shared.
         (I tried at first, and it really didn't help anything.) All told,
         this is only about 100 lines of code, and most of that is the
         mechanics of wiring up analyses from the new PM world.
      
      4) Updating the LazyCallGraph (in the new PM) based on the *newly
         inlined* calls and references. This is very minimal because we cannot
         form cycles.
      
      5) When inlining removes the last use of a function, eagerly nuking the
         body of the function so that any "one use remaining" inline cost
         heuristics are immediately refined, and queuing these functions to be
         completely deleted once inlining is complete and the call graph
         updated to reflect that they have become dead (a small sketch of this
         follows after this list).
      
      6) After all the inlining for a particular function, updating the
         LazyCallGraph and the CGSCC pass manager to reflect the
         function-local simplifications that are done immediately and
         internally by the inline utilities. These are the exact same
         fundamental set of CG updates done by arbitrary function passes.
      
      7) Adding a bunch of test cases to specifically target CGSCC and other
         subtle aspects in the new PM world.
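
      A minimal sketch of the eager handling in item 5, using made-up stand-in
      types rather than the real IR classes.
      ```
      #include <vector>

      struct Function {
        bool HasLocalLinkage = true;
        int NumUses = 0;
        bool BodyDeleted = false;
      };

      // After a call to Callee has been inlined away, the callee has lost a use.
      // If that was the last use of a local function, drop the body right away so
      // "one use remaining" style cost heuristics see the updated picture, and
      // queue the function so it can be fully deleted once inlining is done and
      // the call graph has been updated.
      void noteCallInlined(Function &Callee, std::vector<Function *> &DeadQueue) {
        --Callee.NumUses;
        if (Callee.HasLocalLinkage && Callee.NumUses == 0) {
          Callee.BodyDeleted = true;     // stands in for dropping the body
          DeadQueue.push_back(&Callee);  // real removal happens afterwards
        }
      }
      ```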
      
      Many thanks to the careful review from Easwaran and Sanjoy and others!
      
      Differential Revision: https://reviews.llvm.org/D24226
      
      llvm-svn: 290161
      1d963114
  16. Dec 19, 2016
  17. Dec 15, 2016
    • Hal Finkel's avatar
      Remove the AssumptionCache · 3ca4a6bc
      Hal Finkel authored
      After r289755, the AssumptionCache is no longer needed. Variables affected by
      assumptions are now found by using the new operand-bundle-based scheme. This
      new scheme is more computationally efficient, and it also needs much less
      code...
      
      llvm-svn: 289756
      3ca4a6bc
  18. Nov 20, 2016
  19. Nov 04, 2016
  20. Oct 08, 2016
  21. Sep 28, 2016
  22. Sep 27, 2016
    • Adam Nemet's avatar
      [Inliner] Fold the analysis remark into the missed remark · 1142147e
      Adam Nemet authored
      There is really no reason for these to be separate.
      
      The vectorizer started this pretty bad tradition that the text of the
      missed remarks is pretty meaningless, i.e. just "vectorization failed".
      There, you have to query the analysis remarks to get the full picture.
      
      I think we should just explain the reason for missing the optimization
      in the missed remark when possible.  Analysis remarks should provide
      information that the pass gathers regardless of whether the optimization
      happens or not.
      
      llvm-svn: 282542
      1142147e
    • Adam Nemet's avatar
      Output optimization remarks in YAML · a62b7e1a
      Adam Nemet authored
      (Re-committed after moving the template specialization under the yaml
      namespace.  GCC was complaining about this.)
      
      This allows various presentations of this data using an external tool.
      This was first recommended here[1].
      
      As an example, consider this module:
      
        1 int foo();
        2 int bar();
        3
        4 int baz() {
        5   return foo() + bar();
        6 }
      
      The inliner generates these missed-optimization remarks today (the
      hotness information is pulled from PGO):
      
        remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
        remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)
      
      Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:
      
        --- !Missed
        Pass:            inline
        Name:            NotInlined
        DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 10 }
        Function:        baz
        Hotness:         30
        Args:
          - Callee: foo
          - String:  will not be inlined into
          - Caller: baz
        ...
        --- !Missed
        Pass:            inline
        Name:            NotInlined
        DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 18 }
        Function:        baz
        Hotness:         30
        Args:
          - Callee: bar
          - String:  will not be inlined into
          - Caller: baz
        ...
      
      This is a summary of the high-level decisions:
      
      * There is a new streaming interface to emit optimization remarks.
      E.g. for the inliner remark above:
      
         ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
                      DEBUG_TYPE, "NotInlined", &I)
                  << NV("Callee", Callee) << " will not be inlined into "
                  << NV("Caller", CS.getCaller()) << setIsVerbose());
      
      NV stands for named value and allows the YAML client to process a remark
      using its name (NotInlined) and the named arguments (Callee and Caller)
      without parsing the text of the message.
      
      Subsequent patches will update ORE users to use the new streaming API.
      
      * I am using YAML I/O for writing the YAML file.  YAML I/O requires you
      to specify reading and writing at once but reading is highly non-trivial
      for some of the more complex LLVM types.  Since it's not clear that we
      (ever) want to use LLVM to parse this YAML file, the code supports and
      asserts that we're writing only.
      
      On the other hand, I did experiment with mapping the class hierarchy
      starting at DiagnosticInfoOptimizationBase back from the YAML generated
      here (see D24479).
      
      * The YAML stream is stored in the LLVM context.
      
      * In the example, we can probably further specify the IR value used,
      i.e. print "Function" rather than "Value".
      
      * As before, hotness is computed in the analysis pass instead of in
      DiagnosticInfo.  This avoids the layering problem since BFI is in
      Analysis while DiagnosticInfo is in IR.
      
      [1] https://reviews.llvm.org/D19678#419445
      
      Differential Revision: https://reviews.llvm.org/D24587
      
      llvm-svn: 282539
      a62b7e1a
    • Adam Nemet's avatar
      Revert "Output optimization remarks in YAML" · cc2a3fa8
      Adam Nemet authored
      This reverts commit r282499.
      
      The GCC bots are failing
      
      llvm-svn: 282503
      cc2a3fa8
    • Adam Nemet's avatar
      Output optimization remarks in YAML · 92e928c1
      Adam Nemet authored
      This allows various presentations of this data using an external tool.
      This was first recommended here[1].
      
      As an example, consider this module:
      
        1 int foo();
        2 int bar();
        3
        4 int baz() {
        5   return foo() + bar();
        6 }
      
      The inliner generates these missed-optimization remarks today (the
      hotness information is pulled from PGO):
      
        remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
        remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)
      
      Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:
      
        --- !Missed
        Pass:            inline
        Name:            NotInlined
        DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 10 }
        Function:        baz
        Hotness:         30
        Args:
          - Callee: foo
          - String:  will not be inlined into
          - Caller: baz
        ...
        --- !Missed
        Pass:            inline
        Name:            NotInlined
        DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 18 }
        Function:        baz
        Hotness:         30
        Args:
          - Callee: bar
          - String:  will not be inlined into
          - Caller: baz
        ...
      
      This is a summary of the high-level decisions:
      
      * There is a new streaming interface to emit optimization remarks.
      E.g. for the inliner remark above:
      
         ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
                      DEBUG_TYPE, "NotInlined", &I)
                  << NV("Callee", Callee) << " will not be inlined into "
                  << NV("Caller", CS.getCaller()) << setIsVerbose());
      
      NV stands for named value and allows the YAML client to process a remark
      using its name (NotInlined) and the named arguments (Callee and Caller)
      without parsing the text of the message.
      
      Subsequent patches will update ORE users to use the new streaming API.
      
      * I am using YAML I/O for writing the YAML file.  YAML I/O requires you
      to specify reading and writing at once but reading is highly non-trivial
      for some of the more complex LLVM types.  Since it's not clear that we
      (ever) want to use LLVM to parse this YAML file, the code supports and
      asserts that we're writing only.
      
      On the other hand, I did experiment with mapping the class hierarchy
      starting at DiagnosticInfoOptimizationBase back from the YAML generated
      here (see D24479).
      
      * The YAML stream is stored in the LLVM context.
      
      * In the example, we can probably further specify the IR value used,
      i.e. print "Function" rather than "Value".
      
      * As before, hotness is computed in the analysis pass instead of in
      DiagnosticInfo.  This avoids the layering problem since BFI is in
      Analysis while DiagnosticInfo is in IR.
      
      [1] https://reviews.llvm.org/D19678#419445
      
      Differential Revision: https://reviews.llvm.org/D24587
      
      llvm-svn: 282499
      92e928c1
  23. Aug 26, 2016
    • Adam Nemet's avatar
      [Inliner] Report when inlining fails because callee's def is unavailable · cef33141
      Adam Nemet authored
      Summary:
      This is obviously an interesting case because it may motivate code
      restructuring or LTO.
      
      Reporting this requires instantiation of ORE in the loop where the call
      sites are first gathered.  I've checked the compile-time
      overhead *with* -Rpass-with-hotness and the worst slow-down was 6% in
      mcf, quickly tailing off after that.  As before, without -Rpass-with-hotness
      there is no overhead.
      
      Because this could be a pretty noisy diagnostic, it is currently
      qualified as 'verbose'.  As of this patch, 'verbose' diagnostics are
      only emitted with -Rpass-with-hotness, i.e. when the output is expected
      to be filtered.
      
      Reviewers: eraman, chandlerc, davidxl, hfinkel
      
      Subscribers: tejohnson, Prazek, davide, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D23415
      
      llvm-svn: 279860
      cef33141
  24. Aug 17, 2016
    • Chandler Carruth's avatar
      [Inliner] Add a flag to disable manual alloca merging in the Inliner. · f702d8ec
      Chandler Carruth authored
      This is off for now while testing takes place to make sure that we do in
      fact perform sufficient stack coloring to fully obviate the manual alloca
      array merging.
      
      Some context on why we should be using stack coloring rather than
      merging allocas in this way:
      
      LLVM relies very heavily on analyzing pointers as coming from different
      allocas in order to make aliasing decisions. These are some of the most
      powerful aliasing signals available in LLVM. So merging allocas is an
      extremely destructive operation on the LLVM IR -- it takes away highly
      valuable and hard to reconstruct information.
      
      As a consequence, inlined functions which happen to have array allocas
      that this pattern matches will fail to be properly interleaved unless
      SROA manages to hoist everything to an SSA register. Instead, the
      inliner will have added an unnecessary dependence forcing one inlined
      function to execute after the other, because they will have been rewritten
      to refer to the same memory.
      
      All that said, folks will reasonably want some time to experiment here
      and make sure there are no significant regressions. A flag should give
      us an easy knob to test.
      
      For more context, see the thread here:
      http://lists.llvm.org/pipermail/llvm-dev/2016-July/103277.html
      http://lists.llvm.org/pipermail/llvm-dev/2016-August/103285.html
      
      Differential Revision: https://reviews.llvm.org/D23052
      
      llvm-svn: 278892
      f702d8ec
  25. Aug 10, 2016
    • Piotr Padlewski's avatar
      Changed sign of LastCallToStaticBonus · d89875ca
      Piotr Padlewski authored
      Summary:
      I think it is much better this way.
      When I first saw the line:
        Cost += InlineConstants::LastCallToStaticBonus;
      I thought it was a bug, because everywhere else where the cost is being
      reduced it uses -=.
      
      Reviewers: eraman, tejohnson, mehdi_amini
      
      Subscribers: llvm-commits, mehdi_amini
      
      Differential Revision: https://reviews.llvm.org/D23222
      
      llvm-svn: 278290
      d89875ca
    • Adam Nemet's avatar
      [Inliner,OptDiag] Add hotness attribute to opt diagnostics · 896c09bd
      Adam Nemet authored
      Summary:
      The inliner not being a function pass requires the work-around of
      generating the OptimizationRemarkEmitter and in turn BFI on demand.
      This will go away after the new PM is ready.
      
      BFI is only computed inside ORE if the user has requested hotness
      information for optimization diagnostics (-pass-remark-with-hotness at
      the 'opt' level).  Thus there is no additional overhead without the
      flag.
      
      Reviewers: hfinkel, davidxl, eraman
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D22694
      
      llvm-svn: 278185
      896c09bd
  26. Aug 06, 2016
  27. Aug 03, 2016
  28. Jul 29, 2016
    • Piotr Padlewski's avatar
      Added ThinLTO inlining statistics · 84abc74f
      Piotr Padlewski authored
      Summary:
      copypasta doc of ImportedFunctionsInliningStatistics class
       \brief Calculate and dump ThinLTO specific inliner stats.
       The main statistics are:
       (1) Number of inlined imported functions,
       (2) Number of imported functions inlined into importing module (indirect),
       (3) Number of non-imported functions inlined into the importing module
       (indirect).
       The difference between the first and the second is that the first stat
       counts all performed inlines of imported functions, while the second
       counts only the functions that have eventually been inlined into a
       function in the importing module (by a chain of inlines). Because LLVM
       uses a bottom-up inliner, it is possible to e.g. import functions `A` and
       `B`, then inline `B` into `A`, after which `A` might be too big to be
       inlined into some other function that calls it. This statistic is
       computed by building a graph whose nodes are functions and whose edges
       are the performed inlines, and then marking the edges reachable from
       non-imported functions (a rough sketch of that walk follows below).

       If `Verbose` is set to true, it also dumps statistics for each inlined
       function, sorted by the greatest inline count:
       - number of performed inlines
       - number of performed inlines into the importing module
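
      A rough sketch of that marking walk, using a deliberately simplified
      representation; string names and plain maps stand in for the real data
      structures.
      ```
      #include <map>
      #include <set>
      #include <string>
      #include <vector>

      // An edge Caller -> Callee in `Inlined` means Callee was inlined into
      // Caller at some point during the run.
      struct InlineGraph {
        std::map<std::string, bool> IsImported;
        std::map<std::string, std::vector<std::string>> Inlined;
      };

      // Walk from every non-imported (originally-in-module) function along the
      // performed-inline edges; every *imported* function reached this way was
      // eventually inlined into the importing module via some chain of inlines.
      std::set<std::string> importedInlinedIntoModule(const InlineGraph &G) {
        std::set<std::string> Reached;
        std::vector<std::string> Worklist;
        for (const auto &KV : G.IsImported)
          if (!KV.second)
            Worklist.push_back(KV.first);

        std::set<std::string> Visited(Worklist.begin(), Worklist.end());
        while (!Worklist.empty()) {
          std::string F = Worklist.back();
          Worklist.pop_back();
          auto It = G.Inlined.find(F);
          if (It == G.Inlined.end())
            continue;
          for (const std::string &Callee : It->second) {
            auto Imp = G.IsImported.find(Callee);
            if (Imp != G.IsImported.end() && Imp->second)
              Reached.insert(Callee);
            if (Visited.insert(Callee).second)
              Worklist.push_back(Callee);
          }
        }
        return Reached;
      }
      ```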
      
      Reviewers: eraman, tejohnson, mehdi_amini
      
      Subscribers: mehdi_amini, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D22491
      
      llvm-svn: 277089
      84abc74f
  29. Jul 23, 2016