  1. Aug 28, 2018
      [Inliner] Attribute callsites with inline remarks · c1b27b56
      David Bolvansky authored
      Summary:
      When reading an output *.ll file, it is sometimes not easy to understand why certain callsites were not inlined. We can read the output of inline remarks (option --pass-remarks-missed=inline) and try to correlate its messages with the callsites.
      
      This patch proposes an easier way: for every callsite processed by the Inliner, add an attribute carrying the latest message that describes why the callsite was not inlined. The attribute is called //inline-remark//. By default this feature is off. It can be switched on by the option //-inline-remark-attribute//.
      
      For example, in the provided test the resulting function //@test1// has two callsites to //@bar//, and the inline remarks report different missed-inlining reasons:
        remark: <unknown>:0:0: bar not inlined into test1 because too costly to inline (cost=-5, threshold=-6)
        remark: <unknown>:0:0: bar not inlined into test1 because it should never be inlined (cost=never): recursive
      
      It is not clear which remark corresponds to which callsite. With the inline remark attribute enabled, we get the reasons attached to their callsites:
        define void @test1() {
          call void @bar(i1 true) #0
          call void @bar(i1 false) #2
          ret void
        }
        attributes #0 = { "inline-remark"="(cost=-5, threshold=-6)" }
        ..
        attributes #2 = { "inline-remark"="(cost=never): recursive" }
      
      Patch by: yrouban (Yevgeny Rouban)
      
      Reviewers: xbolva00, tejohnson, apilipenko
      
      Reviewed By: xbolva00, tejohnson
      
      Subscribers: eraman, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D50435
      
      llvm-svn: 340834
  2. Aug 24, 2018
      Revert [Inliner] Attribute callsites with inline remarks · 1ccbddca
      David Bolvansky authored
      llvm-svn: 340619
      [Inliner] Attribute callsites with inline remarks · 7c0537a3
      David Bolvansky authored
      Summary:
      When reading an output *.ll file, it is sometimes not easy to understand why certain callsites were not inlined. We can read the output of inline remarks (option --pass-remarks-missed=inline) and try to correlate its messages with the callsites.
      
      This patch proposes an easier way: for every callsite processed by the Inliner, add an attribute carrying the latest message that describes why the callsite was not inlined. The attribute is called //inline-remark//. By default this feature is off. It can be switched on by the option //-inline-remark-attribute//.
      
      For example, in the provided test the resulting function //@test1// has two callsites to //@bar//, and the inline remarks report different missed-inlining reasons:
        remark: <unknown>:0:0: bar not inlined into test1 because too costly to inline (cost=-5, threshold=-6)
        remark: <unknown>:0:0: bar not inlined into test1 because it should never be inlined (cost=never): recursive
      
      It is not clear which remark corresponds to which callsite. With the inline remark attribute enabled, we get the reasons attached to their callsites:
        define void @test1() {
          call void @bar(i1 true) #0
          call void @bar(i1 false) #2
          ret void
        }
        attributes #0 = { "inline-remark"="(cost=-5, threshold=-6)" }
        ..
        attributes #2 = { "inline-remark"="(cost=never): recursive" }
      
      Patch by: yrouban (Yevgeny Rouban)
      
      Reviewers: xbolva00, tejohnson, apilipenko
      
      Reviewed By: xbolva00, tejohnson
      
      Subscribers: eraman, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D50435
      
      llvm-svn: 340618
  3. Aug 14, 2018
  4. Aug 06, 2018
  5. Aug 05, 2018
      Enrich inline messages · c0aa4b75
      David Bolvansky authored
      Summary:
      This patch improves Inliner to provide causes/reasons for negative inline decisions.
      1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
      2. Several functions that used to return the inlining result as a boolean are changed to return InlineResult, which carries the cause of a negative decision.
      3. Changed remark printing and debug output messages to provide the additional messages and the related inline cost.
      4. Adjusted tests for changed printing.
      
      Patch by: yrouban (Yevgeny Rouban)
      
      
      Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00
      
      Reviewed By: tejohnson, xbolva00
      
      Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith
      
      Differential Revision: https://reviews.llvm.org/D49412
      
      llvm-svn: 338969
  6. Aug 01, 2018
      Revert "Enrich inline messages", tests fail · fbbb83c7
      David Bolvansky authored
      llvm-svn: 338496
      Enrich inline messages · 7f36cd9d
      David Bolvansky authored
      Summary:
      This patch improves Inliner to provide causes/reasons for negative inline decisions.
      1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
      2. Several functions that used to return the inlining result as a boolean are changed to return InlineResult, which carries the cause of a negative decision.
      3. Changed remark printing and debug output messages to provide the additional messages and the related inline cost.
      4. Adjusted tests for changed printing.
      
      Patch by: yrouban (Yevgeny Rouban)
      
      
      Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00
      
      Reviewed By: tejohnson, xbolva00
      
      Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith
      
      Differential Revision: https://reviews.llvm.org/D49412
      
      llvm-svn: 338494
  7. Jul 31, 2018
      Revert Enrich inline messages · ab79414f
      David Bolvansky authored
      llvm-svn: 338389
      Enrich inline messages · b562dbab
      David Bolvansky authored
      Summary:
      This patch improves Inliner to provide causes/reasons for negative inline decisions.
      1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
      2. Several functions that used to return the inlining result as a boolean are changed to return InlineResult, which carries the cause of a negative decision.
      3. Changed remark printing and debug output messages to provide the additional messages and the related inline cost.
      4. Adjusted tests for changed printing.
      
      Patch by: yrouban (Yevgeny Rouban)
      
      
      Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00
      
      Reviewed By: tejohnson, xbolva00
      
      Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith
      
      Differential Revision: https://reviews.llvm.org/D49412
      
      llvm-svn: 338387
  8. Jun 28, 2018
  9. Jun 04, 2018
      Move Analysis/Utils/Local.h back to Transforms · 31b98d2e
      David Blaikie authored
      Review feedback from r328165. Split out just the one function from the
      file that's used by Analysis. (As chandlerc pointed out, the original
      change only moved the header and not the implementation anyway - which
      was fine for the one function that was used (since it's a
      template/inlined in the header) but not in general)
      
      llvm-svn: 333954
  10. May 14, 2018
      Rename DEBUG macro to LLVM_DEBUG. · d34e60ca
      Nicola Zaghen authored
          
      The DEBUG() macro is very generic so it might clash with other projects.
      The renaming was done as follows:
      - git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
      - git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
      - Manual change to APInt
      - Manually change docs, as the regex doesn't match them.
      
      In the transition period the DEBUG() macro is still present and aliased
      to the LLVM_DEBUG() one.
      
      Differential Revision: https://reviews.llvm.org/D43624
      
      llvm-svn: 332240
  11. May 08, 2018
  12. Mar 21, 2018
      Fix a couple of layering violations in Transforms · 2be39228
      David Blaikie authored
      Remove #include of Transforms/Scalar.h from Transform/Utils to fix layering.
      
      Transforms depends on Transforms/Utils, not the other way around. So
      remove the header and the "createStripGCRelocatesPass" function
      declaration (& definition) that is unused and motivated this dependency.
      
      Move Transforms/Utils/Local.h into Analysis because it's used by
      Analysis/MemoryBuiltins.cpp.
      
      llvm-svn: 328165
  13. Oct 19, 2017
  14. Oct 11, 2017
  15. Oct 10, 2017
  16. Sep 28, 2017
      Use a BumpPtrAllocator for Loop objects · def1729d
      Sanjoy Das authored
      Summary:
      And now that we no longer have to explicitly free() the Loop instances, we can
      (with more ease) use the destructor of LoopBase to do what LoopBase::clear() was
      doing.
      
      Reviewers: chandlerc
      
      Subscribers: mehdi_amini, mcrosier, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D38201
      
      llvm-svn: 314375
  17. Sep 20, 2017
      Allow ORE.emit to take a closure to delay building the remark object · 15fccf00
      Adam Nemet authored
      In the lambda we are now returning the remark by value so we need to preserve
      its type in the insertion operator.  This requires making the insertion
      operator generic.
      
      I've also converted a few cases to use the new API.  It seems to work pretty
      well.  See the LoopUnroller for a slightly more interesting case.
      
      llvm-svn: 313691
  18. Aug 21, 2017
  19. Aug 20, 2017
  20. Aug 02, 2017
      [PM] Fix a bug where through CGSCC iteration we can get · 95055d8f
      Chandler Carruth authored
      infinite-inlining across multiple runs of the inliner by keeping a tiny
      history of internal-to-SCC inlining decisions.
      
      This is still a bit gross, but I don't yet have any fundamentally better
      ideas and numerous people are blocked on this to use new PM and ThinLTO
      together.
      
      The core of the idea is to detect when we are about to do an inline that
      has a chance of re-splitting an SCC which we have split before with
      a similar inlining step. That is a critical component in the inlining
      forming a cycle and so far detects all of the various cyclic patterns
      I can come up with as well as the original real-world test case (which
      comes from a ThinLTO build of libunwind).
      
      I've added some tests that I think really demonstrate what is going on
      here. They are essentially state machines that march the inliner through
      various steps of a cycle and check that we stop when the cycle is closed
      and that we actually did do inlining to form that cycle.
      
      A lot of thanks go to Eric Christopher and Sanjoy Das for the help
      understanding this issue and improving the test cases.
      
      The biggest "yuck" here is the layering issue -- the CGSCC pass manager
      is providing somewhat magical state to the inliner for it to use to make
      itself converge. This isn't great, but I don't honestly have a lot of
      better ideas yet and at least seems nicely isolated.
      
      I have tested this patch, and it doesn't block *any* inlining on the
      entire LLVM test suite and SPEC, so it seems sufficiently narrowly
      targeted to the issue at hand.
      
      We have come up with hypothetical issues that this patch doesn't cover,
      but so far none of them are practical and we don't have a viable
      solution yet that covers the hypothetical stuff, so proceeding here in
      the interim. Definitely an area that we will be back and revisiting in
      the future.
      
      Differential Revision: https://reviews.llvm.org/D36188
      
      llvm-svn: 309784
  21. Jul 19, 2017
      [PM/LCG] Follow-up fix to r308088 to handle deletion of library · 06a86301
      Chandler Carruth authored
      functions.
      
      In the prior commit, we provide ordering to the LCG between functions
      and library function definitions that they might begin to call through
      transformations. But we still would delete these library functions from
      the call graph if they became dead during inlining.
      
      While this immediately crashed, it also exposed a loss of information.
      We shouldn't remove definitions of library functions that can still
      usefully participate in the LCG-powered CGSCC optimization process. If
      new call edges are formed, we want to have definitions to be called.
      
      We can still remove these functions if truly dead using global-dce, etc,
      but removing them during the CGSCC walk is premature.
      
      This fixes a crash in the new PM when optimizing some unusual libraries
      that end up with "internal" lib functions such as the code in the "R"
      language's libraries.
      
      llvm-svn: 308417
  22. Jul 09, 2017
      [PM] Finish implementing and fix a chain of bugs uncovered by testing · bd9c2903
      Chandler Carruth authored
      the invalidation propagation logic from an SCC to a Function.
      
      I wrote the infrastructure to test this but didn't actually use it in
      the unit test where it was designed to be used. =[ My bad. Once
      I actually added it to the test case I discovered that it also hadn't
      been properly implemented, so I've implemented it. The logic in the FAM
      proxy for an SCC pass to propagate invalidation follows the same ideas
      as the FAM proxy for a Module pass, but the implementation is a bit
      different to reflect the fact that it is forwarding just for an SCC.
      
      However, implementing this correctly uncovered a surprising "bug" (it
      was conservatively correct but relatively very expensive) in how we
      handle invalidation when splitting one SCC into multiple SCCs. We did an
      eager invalidation when in reality we should be deferring invalidation
      for the *current* SCC to the CGSCC pass manager and just invalidating the
      newly constructed SCCs. Otherwise we end up invalidating too much too
      soon. This was exposed by the inliner test case that I've updated. Now,
      we invalidate *just* the split off '(test1_f)' SCC when doing the CG
      update, and then the inliner finishes and invalidates the '(test1_g,
      test1_h)' SCC's analyses. The first few attempts at fixing this hit
      still more bugs, but all of those are covered by existing tests. For
      example, the inliner should also preserve the FAM proxy to avoid
      unnecessary invalidation, and this is safe because the CG update
      routines it uses handle any necessary adjustments to the FAM proxy.
      
      Finally, the unittests for the CGSCC pass manager needed a bunch of
      updates where we weren't correctly preserving the FAM proxy because it
      hadn't been fully implemented and failing to preserve it didn't matter.
      
      Note that this doesn't yet fix the current crasher due to MemSSA finding
      a stale dominator tree, but without this the fix to that crasher doesn't
      really make any sense when testing because it relies on the proxy
      behavior.
      
      llvm-svn: 307487
  23. Jun 13, 2017
  24. Jun 09, 2017
      Inliner: Don't touch indirect calls · cb9327b0
      David Blaikie authored
      Other comments/implications are that this isn't intended behavior (nor
      preserved/reimplemented in the new inliner) & complicates fixing the
      'inlining' of trivially dead calls without consulting the cost function
      first.
      
      llvm-svn: 305052
  25. May 10, 2017
  26. Mar 22, 2017
  27. Mar 16, 2017
  28. Mar 09, 2017
      [PM/Inliner] Make the new PM's inliner process call edges across an · 20e588e1
      Chandler Carruth authored
      entire SCC before iterating on newly-introduced call edges resulting
      from any inlined function bodies.
      
      This more closely matches the behavior of the old PM's inliner. While it
      wasn't really clear to me initially, this behavior is actually essential
      to the inliner behaving reasonably in its current design.
      
      Because the inliner is fundamentally a bottom-up inliner and all of its
      cost modeling is designed around that it often runs into trouble within
      an SCC where we don't have any meaningful bottom-up ordering to use. In
      addition to potentially cyclic, infinite inlining that we block with the
      inline history mechanism, it can also take seemingly simple call graph
      patterns within an SCC and turn them into *insanely* large functions by
      accidentally working top-down across the SCC without any of the
      threshold limitations that traditional top-down inliners use.
      
      Consider this diabolical monster.cpp file that Richard Smith came up
      with to help demonstrate this issue:
      ```
      template <int N> extern const char *str;
      
      void g(const char *);
      
      template <bool K, int N> void f(bool *B, bool *E) {
        if (K)
          g(str<N>);
        if (B == E)
          return;
        if (*B)
          f<true, N + 1>(B + 1, E);
        else
          f<false, N + 1>(B + 1, E);
      }
      template <> void f<false, MAX>(bool *B, bool *E) { return f<false, 0>(B, E); }
      template <> void f<true, MAX>(bool *B, bool *E) { return f<true, 0>(B, E); }
      
      extern bool *arr, *end;
      void test() { f<false, 0>(arr, end); }
      ```
      
      When compiled with '-DMAX=N' for various values of N, this will create an SCC
      with a reasonably large number of functions. Previously, the inliner would try
      to exhaust the inlining candidates in a single function before moving on. This,
      unfortunately, turns it into a top-down inliner within the SCC. Because our
      thresholds were never built for that, we will incrementally decide that it is
      always worth inlining and proceed to flatten the entire SCC into that one
      function.
      
      What's worse, we'll then proceed to the next function, and do the exact same
      thing except we'll skip the first function, and so on. And at each step, we'll
      also make some of the constant factors larger, which is awesome.
      
      The fix in this patch is the obvious one which makes the new PM's inliner use
      the same technique used by the old PM: consider all the call edges across the
      entire SCC before beginning to process call edges introduced by inlining. The
      result of this is essentially to distribute the inlining across the SCC so that
      every function incrementally grows toward the inline thresholds rather than
      allowing the inliner to grow one of the functions vastly beyond the threshold.
      The code for this is a bit awkward, but it works out OK.
      
      We could consider in the future doing something more powerful here such as
      prioritized order (via lowest cost and/or profile info) and/or a code-growth
      budget per SCC. However, both of those would require really substantial work
      both to design the system in a way that wouldn't break really useful
      abstraction decomposition properties of the current inliner and to be tuned
      across a reasonably diverse set of code and workloads. It also seems really
      risky in many ways. I have only found a single real-world file that triggers
      the bad behavior here and it is generated code that has a pretty pathological
      pattern. I'm not worried about the inliner not doing an *awesome* job here as
      long as it does *ok*. On the other hand, the cases that will be tricky to get
      right in a prioritized scheme with a budget will be more common and idiomatic
      for at least some frontends (C++ and Rust at least). So while these approaches
      are still really interesting, I'm not in a huge rush to go after them. Staying
      even closer to the existing PM's behavior, especially when this easy to do,
      seems like the right short to medium term approach.
      
      I don't really have a test case that makes sense yet... I'll try to find a
      variant of the IR produced by the monster template metaprogram that is both
      small enough to be sane and large enough to clearly show when we get this wrong
      in the future. But I'm not confident this exists. And the behavior change here
      *should* be unobservable without snooping on debug logging. So there isn't
      really much to test.
      
      The test case updates come from two incidental changes:
      1) We now visit functions in an SCC in the opposite order. I don't think there
         really is a "right" order here, so I just update the test cases.
      2) We no longer compute some analyses when an SCC has no call instructions that
         we consider for inlining.
      
      llvm-svn: 297374
  29. Feb 14, 2017
      Do not apply redundant LastCallToStaticBonus · f22fa72e
      Taewook Oh authored
      Summary:
      As written in the comments above, LastCallToStaticBonus is already applied to
      the cost if Caller has only one user, so it is redundant to reapply the bonus
      here.
      
      If the only user is not a caller, TotalSecondaryCost will not be adjusted
      anyway because callerWillBeRemoved is false. If there's no caller at all, we
      don't need to care about TotalSecondaryCost because
      inliningPreventsSomeOuterInline is false.
      
      Reviewers: chandlerc, eraman
      
      Reviewed By: eraman
      
      Subscribers: haicheng, davidxl, davide, llvm-commits, mehdi_amini
      
      Differential Revision: https://reviews.llvm.org/D29169
      
      llvm-svn: 295075
  30. Feb 10, 2017
      [PM/LCG] Teach the LazyCallGraph how to replace a function without · aaad9f84
      Chandler Carruth authored
      disturbing the graph or having to update edges.
      
      This is motivated by porting argument promotion to the new pass manager.
      Because of how LLVM IR Function objects work, in order to change their
      signature a new object needs to be created. This is efficient and
      straightforward in the IR but previously was very hard to implement in
      LCG. We could easily replace the function a node in the graph
      represents. The challenging part is how to handle updating the edges in
      the graph.
      
      LCG previously used an edge to a raw function to represent a node that
      had not yet been scanned for calls and references. This was the core
      of its laziness. However, that model causes this kind of update to be
      very hard:
      1) The keys to lookup an edge need to be `Function*`s that would all
         need to be updated when we update the node.
      2) There will be some unknown number of edges that haven't transitioned
         from `Function*` edges to `Node*` edges.
      
      All of this complexity isn't necessary. Instead, we can always build
      a node around any function, always pointing edges at it and always using
      it as the key to lookup an edge. To maintain the laziness, we need to
      sink the *edges* of a node into a secondary object and explicitly model
      transitioning a node from empty to populated by scanning the function.
      This design seems much cleaner in a number of ways, but importantly
      there is now exactly *one* place where the `Function*` has to be
      updated!
      
      Some other cleanups that fall out of this include having something to
      model the *entry* edges more accurately. Rather than hand rolling parts
      of the node in the graph itself, we have an explicit `EdgeSequence`
      object that gives us exactly the functionality needed. We also have
      a consistent place to define the edge iterators and can use them for
      both the entry edges and the internal edges of the graph.
      
      The API used to model the separation between a node and its edges is
      intentionally very thin as most clients are expected to deal with nodes
      that have populated edges. We model this exactly as an optional does
      with an additional method to populate the edges when that is
      a reasonable thing for a client to do. This is based on API design
      suggestions from Richard Smith and David Blaikie, credit goes to them
      for helping pick how to model this without it being either too explicit
      or too implicit.
      
      The patch is somewhat noisy due to shifting around iterator types and
      new syntax for walking the edges of a node, but most of the
      functionality change is in the `Edge`, `EdgeSequence`, and `Node` types.
      
      Differential Revision: https://reviews.llvm.org/D29577
      
      llvm-svn: 294653
      De-duplicate some code for creating an AARGetter suitable for the legacy PM. · cea1e4e7
      Peter Collingbourne authored
      I'm about to use this in a couple more places.
      
      Differential Revision: https://reviews.llvm.org/D29793
      
      llvm-svn: 294648