Skip to content
  1. May 08, 2018
  2. Mar 21, 2018
    • David Blaikie's avatar
      Fix a couple of layering violations in Transforms · 2be39228
      David Blaikie authored
      Remove #include of Transforms/Scalar.h from Transform/Utils to fix layering.
      
      Transforms depends on Transforms/Utils, not the other way around. So
      remove the header and the "createStripGCRelocatesPass" function
      declaration (& definition) that is unused and motivated this dependency.
      
      Move Transforms/Utils/Local.h into Analysis because it's used by
      Analysis/MemoryBuiltins.cpp.
      
      llvm-svn: 328165
      2be39228
  3. Oct 19, 2017
  4. Oct 11, 2017
  5. Oct 10, 2017
  6. Sep 28, 2017
    • Sanjoy Das's avatar
      Use a BumpPtrAllocator for Loop objects · def1729d
      Sanjoy Das authored
      Summary:
      And now that we no longer have to explicitly free() the Loop instances, we can
      (with more ease) use the destructor of LoopBase to do what LoopBase::clear() was
      doing.
      
      Reviewers: chandlerc
      
      Subscribers: mehdi_amini, mcrosier, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D38201
      
      llvm-svn: 314375
      def1729d
  7. Sep 20, 2017
    • Adam Nemet's avatar
      Allow ORE.emit to take a closure to delay building the remark object · 15fccf00
      Adam Nemet authored
      In the lambda we are now returning the remark by value so we need to preserve
      its type in the insertion operator.  This requires making the insertion
      operator generic.
      
      I've also converted a few cases to use the new API.  It seems to work pretty
      well.  See the LoopUnroller for a slightly more interesting case.
      
      llvm-svn: 313691
      15fccf00
  8. Aug 21, 2017
  9. Aug 20, 2017
  10. Aug 02, 2017
    • Chandler Carruth's avatar
      [PM] Fix a bug where through CGSCC iteration we can get · 95055d8f
      Chandler Carruth authored
      infinite-inlining across multiple runs of the inliner by keeping a tiny
      history of internal-to-SCC inlining decisions.
      
      This is still a bit gross, but I don't yet have any fundamentally better
      ideas and numerous people are blocked on this to use new PM and ThinLTO
      together.
      
      The core of the idea is to detect when we are about to do an inline that
      has a chance of re-splitting an SCC which we have split before with
      a similar inlining step. That is a critical component in the inlining
      forming a cycle and so far detects all of the various cyclic patterns
      I can come up with as well as the original real-world test case (which
      comes from a ThinLTO build of libunwind).
      
      I've added some tests that I think really demonstrate what is going on
      here. They are essentially state machines that march the inliner through
      various steps of a cycle and check that we stop when the cycle is closed
      and that we actually did do inlining to form that cycle.
      
      A lot of thanks go to Eric Christopher and Sanjoy Das for the help
      understanding this issue and improving the test cases.
      
      The biggest "yuck" here is the layering issue -- the CGSCC pass manager
      is providing somewhat magical state to the inliner for it to use to make
      itself converge. This isn't great, but I don't honestly have a lot of
      better ideas yet and at least seems nicely isolated.
      
      I have tested this patch, and it doesn't block *any* inlining on the
      entire LLVM test suite and SPEC, so it seems sufficiently narrowly
      targeted to the issue at hand.
      
      We have come up with hypothetical issues that this patch doesn't cover,
      but so far none of them are practical and we don't have a viable
      solution yet that covers the hypothetical stuff, so proceeding here in
      the interim. Definitely an area that we will be back and revisiting in
      the future.
      
      Differential Revision: https://reviews.llvm.org/D36188
      
      llvm-svn: 309784
      95055d8f
  11. Jul 19, 2017
    • Chandler Carruth's avatar
      [PM/LCG] Follow-up fix to r308088 to handle deletion of library · 06a86301
      Chandler Carruth authored
      functions.
      
      In the prior commit, we provide ordering to the LCG between functions
      and library function definitions that they might begin to call through
      transformations. But we still would delete these library functions from
      the call graph if they became dead during inlining.
      
      While this immediately crashed, it also exposed a loss of information.
      We shouldn't remove definitions of library functions that can still
      usefully participate in the LCG-powered CGSCC optimization process. If
      new call edges are formed, we want to have definitions to be called.
      
      We can still remove these functions if truly dead using global-dce, etc,
      but removing them during the CGSCC walk is premature.
      
      This fixes a crash in the new PM when optimizing some unusual libraries
      that end up with "internal" lib functions such as the code in the "R"
      language's libraries.
      
      llvm-svn: 308417
      06a86301
  12. Jul 09, 2017
    • Chandler Carruth's avatar
      [PM] Finish implementing and fix a chain of bugs uncovered by testing · bd9c2903
      Chandler Carruth authored
      the invalidation propagation logic from an SCC to a Function.
      
      I wrote the infrastructure to test this but didn't actually use it in
      the unit test where it was designed to be used. =[ My bad. Once
      I actually added it to the test case I discovered that it also hadn't
      been properly implemented, so I've implemented it. The logic in the FAM
      proxy for an SCC pass to propagate invalidation follows the same ideas
      as the FAM proxy for a Module pass, but the implementation is a bit
      different to reflect the fact that it is forwarding just for an SCC.
      
      However, implementing this correctly uncovered a surprising "bug" (it
      was conservatively correct but relatively very expensive) in how we
      handle invalidation when splitting one SCC into multiple SCCs. We did an
      eager invalidation when in reality we should be deferring invaliadtion
      for the *current* SCC to the CGSCC pass manager and just invaliating the
      newly constructed SCCs. Otherwise we end up invalidating too much too
      soon. This was exposed by the inliner test case that I've updated. Now,
      we invalidate *just* the split off '(test1_f)' SCC when doing the CG
      update, and then the inliner finishes and invalidates the '(test1_g,
      test1_h)' SCC's analyses. The first few attempts at fixing this hit
      still more bugs, but all of those are covered by existing tests. For
      example, the inliner should also preserve the FAM proxy to avoid
      unnecesasry invalidation, and this is safe because the CG update
      routines it uses handle any necessary adjustments to the FAM proxy.
      
      Finally, the unittests for the CGSCC pass manager needed a bunch of
      updates where we weren't correctly preserving the FAM proxy because it
      hadn't been fully implemented and failing to preserve it didn't matter.
      
      Note that this doesn't yet fix the current crasher due to MemSSA finding
      a stale dominator tree, but without this the fix to that crasher doesn't
      really make any sense when testing because it relies on the proxy
      behavior.
      
      llvm-svn: 307487
      bd9c2903
  13. Jun 13, 2017
  14. Jun 09, 2017
    • David Blaikie's avatar
      Inliner: Don't touch indirect calls · cb9327b0
      David Blaikie authored
      Other comments/implications are that this isn't intended behavior (nor
      perserved/reimplemented in the new inliner) & complicates fixing the
      'inlining' of trivially dead calls without consulting the cost function
      first.
      
      llvm-svn: 305052
      cb9327b0
  15. May 10, 2017
  16. Mar 22, 2017
  17. Mar 16, 2017
  18. Mar 09, 2017
    • Chandler Carruth's avatar
      [PM/Inliner] Make the new PM's inliner process call edges across an · 20e588e1
      Chandler Carruth authored
      entire SCC before iterating on newly-introduced call edges resulting
      from any inlined function bodies.
      
      This more closely matches the behavior of the old PM's inliner. While it
      wasn't really clear to me initially, this behavior is actually essential
      to the inliner behaving reasonably in its current design.
      
      Because the inliner is fundamentally a bottom-up inliner and all of its
      cost modeling is designed around that it often runs into trouble within
      an SCC where we don't have any meaningful bottom-up ordering to use. In
      addition to potentially cyclic, infinite inlining that we block with the
      inline history mechanism, it can also take seemingly simple call graph
      patterns within an SCC and turn them into *insanely* large functions by
      accidentally working top-down across the SCC without any of the
      threshold limitations that traditional top-down inliners use.
      
      Consider this diabolical monster.cpp file that Richard Smith came up
      with to help demonstrate this issue:
      ```
      template <int N> extern const char *str;
      
      void g(const char *);
      
      template <bool K, int N> void f(bool *B, bool *E) {
        if (K)
          g(str<N>);
        if (B == E)
          return;
        if (*B)
          f<true, N + 1>(B + 1, E);
        else
          f<false, N + 1>(B + 1, E);
      }
      template <> void f<false, MAX>(bool *B, bool *E) { return f<false, 0>(B, E); }
      template <> void f<true, MAX>(bool *B, bool *E) { return f<true, 0>(B, E); }
      
      extern bool *arr, *end;
      void test() { f<false, 0>(arr, end); }
      ```
      
      When compiled with '-DMAX=N' for various values of N, this will create an SCC
      with a reasonably large number of functions. Previously, the inliner would try
      to exhaust the inlining candidates in a single function before moving on. This,
      unfortunately, turns it into a top-down inliner within the SCC. Because our
      thresholds were never built for that, we will incrementally decide that it is
      always worth inlining and proceed to flatten the entire SCC into that one
      function.
      
      What's worse, we'll then proceed to the next function, and do the exact same
      thing except we'll skip the first function, and so on. And at each step, we'll
      also make some of the constant factors larger, which is awesome.
      
      The fix in this patch is the obvious one which makes the new PM's inliner use
      the same technique used by the old PM: consider all the call edges across the
      entire SCC before beginning to process call edges introduced by inlining. The
      result of this is essentially to distribute the inlining across the SCC so that
      every function incrementally grows toward the inline thresholds rather than
      allowing the inliner to grow one of the functions vastly beyond the threshold.
      The code for this is a bit awkward, but it works out OK.
      
      We could consider in the future doing something more powerful here such as
      prioritized order (via lowest cost and/or profile info) and/or a code-growth
      budget per SCC. However, both of those would require really substantial work
      both to design the system in a way that wouldn't break really useful
      abstraction decomposition properties of the current inliner and to be tuned
      across a reasonably diverse set of code and workloads. It also seems really
      risky in many ways. I have only found a single real-world file that triggers
      the bad behavior here and it is generated code that has a pretty pathological
      pattern. I'm not worried about the inliner not doing an *awesome* job here as
      long as it does *ok*. On the other hand, the cases that will be tricky to get
      right in a prioritized scheme with a budget will be more common and idiomatic
      for at least some frontends (C++ and Rust at least). So while these approaches
      are still really interesting, I'm not in a huge rush to go after them. Staying
      even closer to the existing PM's behavior, especially when this easy to do,
      seems like the right short to medium term approach.
      
      I don't really have a test case that makes sense yet... I'll try to find a
      variant of the IR produced by the monster template metaprogram that is both
      small enough to be sane and large enough to clearly show when we get this wrong
      in the future. But I'm not confident this exists. And the behavior change here
      *should* be unobservable without snooping on debug logging. So there isn't
      really much to test.
      
      The test case updates come from two incidental changes:
      1) We now visit functions in an SCC in the opposite order. I don't think there
         really is a "right" order here, so I just update the test cases.
      2) We no longer compute some analyses when an SCC has no call instructions that
         we consider for inlining.
      
      llvm-svn: 297374
      20e588e1
  19. Feb 14, 2017
    • Taewook Oh's avatar
      Do not apply redundant LastCallToStaticBonus · f22fa72e
      Taewook Oh authored
      Summary:
      As written in the comments above, LastCallToStaticBonus is already applied to
      the cost if Caller has only one user, so it is redundant to reapply the bonus
      here.
      
      If the only user is not a caller, TotalSecondaryCost will not be adjusted
      anyway because callerWillBeRemoved is false. If there's no caller at all, we
      don't need to care about TotalSecondaryCost because
      inliningPreventsSomeOuterInline is false.
      
      Reviewers: chandlerc, eraman
      
      Reviewed By: eraman
      
      Subscribers: haicheng, davidxl, davide, llvm-commits, mehdi_amini
      
      Differential Revision: https://reviews.llvm.org/D29169
      
      llvm-svn: 295075
      f22fa72e
  20. Feb 10, 2017
    • Chandler Carruth's avatar
      [PM/LCG] Teach the LazyCallGraph how to replace a function without · aaad9f84
      Chandler Carruth authored
      disturbing the graph or having to update edges.
      
      This is motivated by porting argument promotion to the new pass manager.
      Because of how LLVM IR Function objects work, in order to change their
      signature a new object needs to be created. This is efficient and
      straight forward in the IR but previously was very hard to implement in
      LCG. We could easily replace the function a node in the graph
      represents. The challenging part is how to handle updating the edges in
      the graph.
      
      LCG previously used an edge to a raw function to represent a node that
      had not yet been scanned for calls and references. This was the core
      of its laziness. However, that model causes this kind of update to be
      very hard:
      1) The keys to lookup an edge need to be `Function*`s that would all
         need to be updated when we update the node.
      2) There will be some unknown number of edges that haven't transitioned
         from `Function*` edges to `Node*` edges.
      
      All of this complexity isn't necessary. Instead, we can always build
      a node around any function, always pointing edges at it and always using
      it as the key to lookup an edge. To maintain the laziness, we need to
      sink the *edges* of a node into a secondary object and explicitly model
      transitioning a node from empty to populated by scanning the function.
      This design seems much cleaner in a number of ways, but importantly
      there is now exactly *one* place where the `Function*` has to be
      updated!
      
      Some other cleanups that fall out of this include having something to
      model the *entry* edges more accurately. Rather than hand rolling parts
      of the node in the graph itself, we have an explicit `EdgeSequence`
      object that gives us exactly the functionality needed. We also have
      a consistent place to define the edge iterators and can use them for
      both the entry edges and the internal edges of the graph.
      
      The API used to model the separation between a node and its edges is
      intentionally very thin as most clients are expected to deal with nodes
      that have populated edges. We model this exactly as an optional does
      with an additional method to populate the edges when that is
      a reasonable thing for a client to do. This is based on API design
      suggestions from Richard Smith and David Blaikie, credit goes to them
      for helping pick how to model this without it being either too explicit
      or too implicit.
      
      The patch is somewhat noisy due to shifting around iterator types and
      new syntax for walking the edges of a node, but most of the
      functionality change is in the `Edge`, `EdgeSequence`, and `Node` types.
      
      Differential Revision: https://reviews.llvm.org/D29577
      
      llvm-svn: 294653
      aaad9f84
    • Peter Collingbourne's avatar
      De-duplicate some code for creating an AARGetter suitable for the legacy PM. · cea1e4e7
      Peter Collingbourne authored
      I'm about to use this in a couple more places.
      
      Differential Revision: https://reviews.llvm.org/D29793
      
      llvm-svn: 294648
      cea1e4e7
  21. Jan 30, 2017
  22. Jan 24, 2017
    • Chandler Carruth's avatar
      [PH] Replace uses of AssertingVH from members of analysis results with · 6acdca78
      Chandler Carruth authored
      a lazy-asserting PoisoningVH.
      
      AssertVH is fundamentally incompatible with cache-invalidation of
      analysis results. The invaliadtion happens after the AssertingVH has
      already fired. Instead, use a PoisoningVH that will assert if the
      dangling handle is ever used rather than merely be assigned or
      destroyed.
      
      This patch also removes all of the (numerous) doomed attempts to work
      around this fundamental incompatibility. It is a pretty significant
      simplification IMO.
      
      The most interesting change is in the Inliner where we still do some
      clearing because we don't want to rely on the coarse grained
      invalidation strategy of the containing pass manager. However, I prefer
      the approach that contains this logic to the cleanup phase of the
      Inliner, and I think we could enhance the CGSCC analysis management
      layer to make this even better in the future if desired.
      
      The rest is straight cleanup.
      
      I've also added a test for one of the harder cases to work around: when
      a *module analysis* contains many AssertingVHes pointing at functions.
      
      Differential Revision: https://reviews.llvm.org/D29006
      
      llvm-svn: 292928
      6acdca78
  23. Jan 23, 2017
  24. Jan 22, 2017
    • Chandler Carruth's avatar
      [PM] Fix a really nasty bug introduced when adding PGO support to the · b698d596
      Chandler Carruth authored
      new PM's inliner.
      
      The bug happens when we refine an SCC after having computed a proxy for
      the FunctionAnalysisManager, and then proceed to compute fresh analyses
      for functions in the *new* SCC using the manager provided by the old
      SCC's proxy. *And* when we manage to mutate a function in this new SCC
      in a way that invalidates those analyses. This can be... challenging to
      reproduce.
      
      I've managed to contrive a set of functions that trigger this and added
      a test case, but it is a bit brittle. I've directly checked that the
      passes run in the expected ways to help avoid the test just becoming
      silently irrelevant.
      
      This gets the new PM back to passing the LLVM test suite after the PGO
      improvements landed.
      
      llvm-svn: 292757
      b698d596
    • Chandler Carruth's avatar
      [PM] Add some debug logging to the new PM inliner to make it easier to · d4be9f4b
      Chandler Carruth authored
      trace its behavior.
      
      llvm-svn: 292756
      d4be9f4b
  25. Jan 20, 2017
    • Easwaran Raman's avatar
      Improve PGO support for the new inliner · 12585b01
      Easwaran Raman authored
      This adds the following to the new PM based inliner in PGO mode:
      
      * Use block frequency analysis to derive callsite's profile count and use
      that to adjust thresholds of hot and cold callsites.
      
      * Incrementally update the BFI of the caller after a callee gets inlined
      into it. This incremental update is only within an invocation of the run
      method - BFI is not preserved across calls to run.
      Update the function entry count of the callee after inlining it into a
      caller.
      
      * I've tuned the thresholds for the hot and cold callsites using a hacked
      up version of the old inliner that explicitly computes BFI on a set of
      internal benchmarks and spec. Once the new PM based pipeline stabilizes
      (IIRC Chandler mentioned there are known issues) I'll benchmark this
      again and adjust the thresholds if required.
      Inliner PGO support.
      
      Differential revision: https://reviews.llvm.org/D28331
      
      llvm-svn: 292666
      12585b01
  26. Dec 28, 2016
    • Chandler Carruth's avatar
      [PM] Teach the inliner's call graph update to handle inserting new edges · 9900d18b
      Chandler Carruth authored
      when they are call edges at the leaf but may (transitively) be reached
      via ref edges.
      
      It turns out there is a simple rule: insert everything as a ref edge
      which is a safe conservative default. Then we let the existing update
      logic handle promoting some of those to call edges.
      
      Note that it would be fairly cheap to make these call edges right away
      if that is desirable by testing whether there is some existing call path
      from the source to the target. It just seemed like slightly more
      complexity in this code path that isn't strictly necessary. If anyone
      feels strongly about handling this differently I'm happy to change it.
      
      llvm-svn: 290649
      9900d18b
  27. Dec 27, 2016
    • Chandler Carruth's avatar
      [PM] Add one of the features left out of the initial inliner patch: · 141bf5d1
      Chandler Carruth authored
      skipping indirectly recursive inline chains.
      
      To do this, we implicitly build an inline stack for each callsite and
      check prior to inlining that doing so would not form a cycle. This uses
      the exact same technique and even shares some code with the legacy PM
      inliner.
      
      This solution remains deeply unsatisfying to me because it means we
      cannot actually iterate the inliner externally. Doing so would not be
      able to easily detect and avoid such cycles. Some day I would very much
      like to have a solution that works without this internal state to detect
      cycles, but this is not that day.
      
      llvm-svn: 290590
      141bf5d1
    • Chandler Carruth's avatar
      [PM] Teach the inliner in the new PM to merge attributes after inlining. · 03130d98
      Chandler Carruth authored
      Also enable the new PM in the attributes test case which caught this
      issue.
      
      llvm-svn: 290572
      03130d98
    • Chandler Carruth's avatar
      [PM] Teach the always inliner in the new pass manager to support · 6e9bb7e0
      Chandler Carruth authored
      removing fully-dead comdats without removing dead entries in comdats
      with live members.
      
      This factors the core logic out of the current inliner's internals to
      a reusable utility and leverages that in both places. The factored out
      code should also be (minorly) more efficient in cases where we have very
      few dead functions or dead comdats to consider.
      
      I've added a test case to cover this behavior of the always inliner.
      This is the last significant bug in the new PM's always inliner I've
      found (so far).
      
      llvm-svn: 290557
      6e9bb7e0
  28. Dec 22, 2016
  29. Dec 20, 2016
    • Chandler Carruth's avatar
      [PM] Provide an initial, minimal port of the inliner to the new pass manager. · 1d963114
      Chandler Carruth authored
      This doesn't implement *every* feature of the existing inliner, but
      tries to implement the most important ones for building a functional
      optimization pipeline and beginning to sort out bugs, regressions, and
      other problems.
      
      Notable, but intentional omissions:
      - No alloca merging support. Why? Because it isn't clear we want to do
        this at all. Active discussion and investigation is going on to remove
        it, so for simplicity I omitted it.
      - No support for trying to iterate on "internally" devirtualized calls.
        Why? Because it adds what I suspect is inappropriate coupling for
        little or no benefit. We will have an outer iteration system that
        tracks devirtualization including that from function passes and
        iterates already. We should improve that rather than approximate it
        here.
      - Optimization remarks. Why? Purely to make the patch smaller, no other
        reason at all.
      
      The last one I'll probably work on almost immediately. But I wanted to
      skip it in the initial patch to try to focus the change as much as
      possible as there is already a lot of code moving around and both of
      these *could* be skipped without really disrupting the core logic.
      
      A summary of the different things happening here:
      
      1) Adding the usual new PM class and rigging.
      
      2) Fixing minor underlying assumptions in the inline cost analysis or
         inline logic that don't generally hold in the new PM world.
      
      3) Adding the core pass logic which is in essence a loop over the calls
         in the nodes in the call graph. This is a bit duplicated from the old
         inliner, but only a handful of lines could realistically be shared.
         (I tried at first, and it really didn't help anything.) All told,
         this is only about 100 lines of code, and most of that is the
         mechanics of wiring up analyses from the new PM world.
      
      4) Updating the LazyCallGraph (in the new PM) based on the *newly
         inlined* calls and references. This is very minimal because we cannot
         form cycles.
      
      5) When inlining removes the last use of a function, eagerly nuking the
         body of the function so that any "one use remaining" inline cost
         heuristics are immediately refined, and queuing these functions to be
         completely deleted once inlining is complete and the call graph
         updated to reflect that they have become dead.
      
      6) After all the inlining for a particular function, updating the
         LazyCallGraph and the CGSCC pass manager to reflect the
         function-local simplifications that are done immediately and
         internally by the inline utilties. These are the exact same
         fundamental set of CG updates done by arbitrary function passes.
      
      7) Adding a bunch of test cases to specifically target CGSCC and other
         subtle aspects in the new PM world.
      
      Many thanks to the careful review from Easwaran and Sanjoy and others!
      
      Differential Revision: https://reviews.llvm.org/D24226
      
      llvm-svn: 290161
      1d963114
  30. Dec 19, 2016
  31. Dec 15, 2016
    • Hal Finkel's avatar
      Remove the AssumptionCache · 3ca4a6bc
      Hal Finkel authored
      After r289755, the AssumptionCache is no longer needed. Variables affected by
      assumptions are now found by using the new operand-bundle-based scheme. This
      new scheme is more computationally efficient, and also we need much less
      code...
      
      llvm-svn: 289756
      3ca4a6bc
  32. Nov 20, 2016
Loading