  1. Nov 19, 2014
  2. Oct 08, 2014
  3. Sep 07, 2014
    • Add an Assumption-Tracking Pass · 74c2f355
      Hal Finkel authored
      This adds an immutable pass, AssumptionTracker, which keeps a cache of
      @llvm.assume call instructions within a module. It uses callback value handles
      to keep stale functions and intrinsics out of the map, and it relies on any
      code that creates new @llvm.assume calls to notify it of the new instructions.
      The benefit is that code needing to find @llvm.assume intrinsics can do so
      directly, without scanning the function, thus allowing the cost of @llvm.assume
      handling to be negligible when none are present.
      
      The current design is intended to be lightweight. We don't keep track of
      anything until we need a list of assumptions in some function. The first time
      this happens, we scan the function. After that, we add/remove @llvm.assume
      calls from the cache in response to registration calls and ValueHandle
      callbacks.
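The lazy scan-then-register design described above can be sketched outside of LLVM. The types and names below (AssumeCall, the registerAssumption hook) are simplified stand-ins for illustration, not the actual AssumptionTracker implementation:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Simplified stand-ins for llvm::Function and @llvm.assume call sites.
struct AssumeCall { int Id; };
struct Function {
  std::string Name;
  std::vector<AssumeCall> Assumes;  // pretend only assume calls are stored
};

// Sketch of the lazy per-function cache: nothing is tracked until the first
// query scans the function; later additions go through registerAssumption
// instead of a re-scan.
class AssumptionCacheSketch {
  std::map<const Function *, std::vector<AssumeCall>> Cache;
  std::set<const Function *> Scanned;

public:
  const std::vector<AssumeCall> &assumptions(const Function &F) {
    if (Scanned.insert(&F).second)  // first query: scan the function once
      Cache[&F] = F.Assumes;
    return Cache[&F];
  }

  // Code creating a new @llvm.assume must notify the cache.
  void registerAssumption(const Function &F, AssumeCall C) {
    if (Scanned.count(&F))  // only functions we have already scanned
      Cache[&F].push_back(C);
  }
};
```

In the real pass, callback value handles additionally evict entries for deleted functions and intrinsics, which this sketch omits.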
      
      There are no new direct test cases for this pass, but because it calls its
      validation function upon module finalization, we'll pick up detectable
      inconsistencies from the other tests that touch @llvm.assume calls.
      
      This pass will be used by follow-up commits that make use of @llvm.assume.
      
      llvm-svn: 217334
  4. Sep 01, 2014
    • Feed AA to the inliner and use AA->getModRefBehavior in AddAliasScopeMetadata · 0c083024
      Hal Finkel authored
      This feeds AA through the IFI structure into the inliner so that
      AddAliasScopeMetadata can use AA->getModRefBehavior to figure out which
      functions only access their arguments (instead of just hard-coding some
      knowledge of memory intrinsics). Most of the information is only available from
      BasicAA; this is important for preserving alias scoping information for
      target-specific intrinsics when converting the noalias parameter attribute
      to metadata.
      
      llvm-svn: 216866
  5. Jul 30, 2014
  6. May 22, 2014
    • Add support for missed and analysis optimization remarks. · 7f8af8bf
      Diego Novillo authored
      Summary:
      This adds two new diagnostics: -pass-remarks-missed and
      -pass-remarks-analysis. They take the same values as -pass-remarks but
      are intended to be triggered in different contexts.
      
      -pass-remarks-missed is used by LLVMContext::emitOptimizationRemarkMissed,
      which passes call when they tried to apply a transformation but
      couldn't.
      
      -pass-remarks-analysis is used by LLVMContext::emitOptimizationRemarkAnalysis,
      which passes call when they want to inform the user about analysis
      results.
      
      The patch also:
      
      1- Adds support in the inliner for the two new remarks and a
         test case.
      
      2- Moves emitOptimizationRemark* functions to the llvm namespace.
      
      3- Adds an LLVMContext argument instead of making them member functions
         of LLVMContext.
      
      Reviewers: qcolombet
      
      Subscribers: llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D3682
      
      llvm-svn: 209442
  7. Apr 25, 2014
  8. Apr 22, 2014
    • [Modules] Fix potential ODR violations by sinking the DEBUG_TYPE · 964daaaf
      Chandler Carruth authored
      definition below all of the header #include lines, lib/Transforms/...
      edition.
      
      This one is tricky for two reasons. We again have a couple of passes
      that define something else before the includes as well. I've sunk their
      name macros with the DEBUG_TYPE.
      
      Also, InstCombine contains headers that need DEBUG_TYPE, so now those
      headers #define and #undef DEBUG_TYPE around their code, leaving them
      well formed modular headers. Fixing these headers was a large motivation
      for all of these changes, as "leaky" macros of this form are hard on the
      modules implementation.
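The define/undef pattern described for those headers looks roughly like this (a self-contained illustration of the macro hygiene, not the actual InstCombine headers):

```cpp
#include <cassert>
#include <string>

// Header pattern described above: define DEBUG_TYPE before the inline code
// that needs it, and #undef it afterwards so the macro does not leak into
// files that include this header.
#define DEBUG_TYPE "instcombine"
inline std::string debugTypeOfHeader() { return DEBUG_TYPE; }
#undef DEBUG_TYPE

// After the #undef, DEBUG_TYPE is no longer defined at this point.
#ifndef DEBUG_TYPE
constexpr bool DebugTypeLeaks = false;
#else
constexpr bool DebugTypeLeaks = true;
#endif
```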
      
      llvm-svn: 206844
  9. Apr 17, 2014
    • Inliner::OptimizationRemark: Fix crash in... · cd1fc4bc
      NAKAMURA Takumi authored
      Inliner::OptimizationRemark: Fix crash in clang/test/Frontend/optimization-remark.c on some hosts, including --vg.
      
      The DebugLoc in the call site does not survive the inliner; it must be copied before inlining.
      
      llvm-svn: 206459
  10. Apr 08, 2014
    • Add support for optimization reports. · a9298b22
      Diego Novillo authored
      Summary:
      This patch adds backend support for -Rpass=, which indicates the name
      of the optimization pass that should emit remarks stating when it
      made a transformation to the code.
      
      Pass names are taken from their DEBUG_TYPE definitions.
      
      When emitting an optimization report diagnostic, the lack of debug
      information causes the diagnostic to use "<unknown>:0:0" as the
      location string.
      
      This is the back end counterpart for
      
      http://llvm-reviews.chandlerc.com/D3226
      
      Reviewers: qcolombet
      
      CC: llvm-commits
      
      Differential Revision: http://llvm-reviews.chandlerc.com/D3227
      
      llvm-svn: 205774
  11. Mar 09, 2014
    • [C++11] Add range based accessors for the Use-Def chain of a Value. · cdf47884
      Chandler Carruth authored
      This requires a number of steps.
      1) Move value_use_iterator into the Value class as an implementation
         detail
      2) Change it to actually be a *Use* iterator rather than a *User*
         iterator.
      3) Add an adaptor which is a User iterator that always looks through the
         Use to the User.
      4) Wrap these in Value::use_iterator and Value::user_iterator typedefs.
      5) Add the range adaptors as Value::uses() and Value::users().
      6) Update *all* of the callers to correctly distinguish between whether
         they wanted a use_iterator (and to explicitly dig out the User when
         needed), or a user_iterator which makes the Use itself totally
         opaque.
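The distinction between iterating Uses and iterating Users can be shown with a miniature model; this is illustrative code, not the real LLVM Value/Use/User classes:

```cpp
#include <cassert>
#include <string>
#include <vector>

struct User;  // forward declaration

// A Use records one operand edge: which User holds the reference.
struct Use {
  User *TheUser;
  User *getUser() const { return TheUser; }
};

struct User {
  std::string Name;
};

// Miniature Value: uses() yields the edges themselves, while users() is the
// adaptor that always looks through the Use to the User, mirroring
// Value::uses() / Value::users() described above.
struct Value {
  std::vector<Use> UseList;

  const std::vector<Use> &uses() const { return UseList; }

  std::vector<User *> users() const {
    std::vector<User *> Result;
    for (const Use &U : UseList)  // look through each Use to its User
      Result.push_back(U.getUser());
    return Result;
  }
};
```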
      
      Because #6 requires churning essentially everything that walked the
      Use-Def chains, I went ahead and added all of the range adaptors and
      switched them to range-based loops where appropriate. Also because the
      renaming requires at least churning every line of code, it didn't make
      any sense to split these up into multiple commits -- all of which would
      touch all of the same lines of code.
      
      The result is still not quite optimal. The Value::use_iterator is a nice
      regular iterator, but Value::user_iterator is an iterator over User*s
      rather than over the User objects themselves. As a consequence, it fits
      a bit awkwardly into the range-based world and it has the weird
      extra-dereferencing 'operator->' that so many of our iterators have.
      I think this could be fixed by providing something which transforms
      a range of T&s into a range of T*s, but that *can* be separated into
      another patch, and it isn't yet 100% clear whether this is the right
      move.
      
      However, this change gets us most of the benefit and cleans up
      a substantial amount of code around Use and User. =]
      
      llvm-svn: 203364
  12. Mar 04, 2014
  13. Feb 25, 2014
  14. Feb 21, 2014
  15. Feb 06, 2014
    • Set default of inlinecold-threshold to 225. · d4612449
      Manman Ren authored
      225 is the default value of inline-threshold. This change will make sure
      we have the same inlining behavior as prior to r200886.
      
      As Chandler points out, even though we don't have code in our testing
      suite that uses cold attribute, there are larger applications that do
      use cold attribute.
      
      r200886 + this commit intend to keep the same behavior as prior to r200886.
      We can later on tune the inlinecold-threshold.
      
      The main purpose of r200886 is to help performance of instrumentation based
      PGO before we actually hook up inliner with analysis passes such as BPI and BFI.
      For instrumentation based PGO, we try to increase inlining of hot functions and
      reduce inlining of cold functions by setting inlinecold-threshold.
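As a sketch, the threshold selection reads roughly like this; the function and constant names are illustrative, and the real logic lives in the inline cost analysis:

```cpp
#include <cassert>

// Illustrative defaults matching the values discussed above.
constexpr int InlineThreshold = 225;      // -inline-threshold
constexpr int InlineColdThreshold = 225;  // -inlinecold-threshold after this change

// Callees marked with the cold attribute get the cold threshold; with both
// defaults at 225, behavior matches the state prior to r200886.
int selectInlineThreshold(bool CalleeIsCold) {
  return CalleeIsCold ? InlineColdThreshold : InlineThreshold;
}
```

Tuning -inlinecold-threshold below 225 would then penalize only cold callees without touching the default path.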
      
      Another option suggested by Chandler is to use a boolean flag that controls
      if we should use OptSizeThreshold for cold functions. The default value
      of the boolean flag should not change the current behavior. But it gives us
      less freedom in controlling inlining of cold functions.
      
      llvm-svn: 200898
  16. Feb 05, 2014
  17. Nov 26, 2013
    • [PM] Split the CallGraph out from the ModulePass which creates the · 6378cf53
      Chandler Carruth authored
      CallGraph.
      
      This makes the CallGraph a totally generic analysis object that is the
      container for the graph data structure and the primary interface for
      querying and manipulating it. The pass logic is separated into its own
      class. For compatibility reasons, the pass provides wrapper methods for
      most of the methods on CallGraph -- they all just forward.
      
      This will allow the new pass manager infrastructure to provide its own
      analysis pass that constructs the same CallGraph object and makes it
      available. The idea is that in the new pass manager, the analysis pass's
      'run' method returns a concrete analysis 'result'. Here, that result is
      a 'CallGraph'. The 'run' method will typically do only minimal work,
      deferring much of the work into the implementation of the result object
      in order to be lazy about computing things, but when (like DomTree)
      there is *some* up-front computation, the analysis does it prior to
      handing the result back to the querying pass.
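The run-returns-a-result pattern described here can be sketched generically; all names below are illustrative stand-ins, not the new pass manager's actual API:

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Module { std::vector<std::string> Functions; };

// The analysis "result": a graph-like object that owns the data and is the
// primary interface for querying it.
struct CallGraphResult {
  std::vector<std::string> Nodes;
  bool hasNode(const std::string &N) const {
    for (const auto &M : Nodes)
      if (M == N) return true;
    return false;
  }
};

// The analysis pass proper: run() does the (possibly minimal) up-front work
// and hands a concrete result object back to the querying pass, which may
// defer heavier computation until queried.
struct CallGraphAnalysisSketch {
  CallGraphResult run(const Module &M) {
    return CallGraphResult{M.Functions};
  }
};
```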
      
      I know some of this is fairly ugly. I'm happy to change it around if
      folks can suggest a cleaner interim state, but there is going to be some
      amount of unavoidable ugliness during the transition period. The good
      thing is that this is very limited and will naturally go away when the
      old pass infrastructure goes away. It won't hang around to bother us
      later.
      
      Next up is the initial new-PM-style call graph analysis. =]
      
      llvm-svn: 195722
  18. Jul 17, 2013
  19. Jul 16, 2013
    • When the inliner merges allocas, it must keep the larger alignment · 9caa8f7b
      Hal Finkel authored
      For safety, the inliner cannot decrease the alignment on an alloca when
      merging it with another.
      
      I've included two variants of the test case for this: one with DataLayout
      available, and one without. When DataLayout is not available, if only one of
      the allocas uses the default alignment (getAlignment() == 0), then they cannot
      be safely merged.
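The rule can be sketched as a small helper, where 0 stands for the default alignment (as in getAlignment() == 0); the function name and -1 sentinel are illustrative, not the inliner's actual interface:

```cpp
#include <cassert>

// Sketch of the merge rule described above: keep the larger alignment, and
// without DataLayout refuse to merge when exactly one alloca relies on the
// default alignment (0), since that default cannot be compared to an
// explicit alignment. Returns the merged alignment, or -1 when unsafe.
int mergedAllocaAlignment(int A, int B, bool HaveDataLayout) {
  if (!HaveDataLayout && ((A == 0) != (B == 0)))
    return -1;             // unsafe: the alignments cannot be compared
  return A > B ? A : B;    // safe: keep the larger alignment
}
```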
      
      llvm-svn: 186425
  20. Jan 23, 2013
    • Add the IR attribute 'sspstrong'. · d154e283
      Bill Wendling authored
      SSPStrong applies a heuristic to insert stack protectors in these situations:
      
      * A Protector is required for functions which contain an array, regardless of
        type or length.
      
      * A Protector is required for functions which contain a structure/union which
        contains an array, regardless of type or length.  Note, there is no limit to
        the depth of nesting.
      
      * A protector is required when the address of a local variable (i.e., stack
        based variable) is exposed. (E.g., such as through a local whose address is
        taken as part of the RHS of an assignment or a local whose address is taken as
        part of a function argument.)
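The heuristic amounts to a disjunction of the three conditions; as a sketch (the predicate names are illustrative, not the actual StackProtector pass API):

```cpp
#include <cassert>

// Inputs describing one function, mirroring the three bullets above.
struct FunctionTraits {
  bool HasArray;              // any array, regardless of type or length
  bool HasAggregateWithArray; // struct/union containing an array, any depth
  bool LocalAddressEscapes;   // address of a stack variable is exposed
};

// sspstrong heuristic sketch: a protector is required if any trigger holds.
bool needsStackProtector(const FunctionTraits &F) {
  return F.HasArray || F.HasAggregateWithArray || F.LocalAddressEscapes;
}
```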
      
      This patch implements the SSPStrong attribute to be equivalent to
      SSPRequired. This will change in a subsequent patch.
      
      llvm-svn: 173230
  21. Jan 02, 2013
    • Move all of the header files which are involved in modelling the LLVM IR · 9fb823bb
      Chandler Carruth authored
      into their new header subdirectory: include/llvm/IR. This matches the
      directory structure of lib, and begins to correct a long standing point
      of file layout clutter in LLVM.
      
      There are still more header files to move here, but I wanted to handle
      them in separate commits to make tracking what files make sense at each
      layer easier.
      
      The only really questionable files here are the target intrinsic
      tablegen files. But that's a battle I'd rather not fight today.
      
      I've updated both CMake and Makefile build systems (I think, and my
      tests think, but I may have missed something).
      
      I've also re-sorted the includes throughout the project. I'll be
      committing updates to Clang, DragonEgg, and Polly momentarily.
      
      llvm-svn: 171366
  22. Dec 30, 2012
  23. Dec 27, 2012
  24. Dec 19, 2012
  25. Dec 13, 2012
    • Take into account minimize size attribute in the inliner. · c0dba203
      Quentin Colombet authored
      Better controls the inlining of functions when the caller function has the MinSize attribute.
      Basically, when the caller function has this attribute, we do not "force" the inlining
      of callee functions carrying the InlineHint attribute (i.e., functions defined with
      the inline keyword).
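A minimal sketch of that decision (the function name is illustrative; the real check lives in the inline cost analysis):

```cpp
#include <cassert>

// Sketch of the rule above: the InlineHint "force" bonus is only honored
// when the caller is not itself optimizing for minimum size.
bool honorInlineHint(bool CallerHasMinSize, bool CalleeHasInlineHint) {
  return CalleeHasInlineHint && !CallerHasMinSize;
}
```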
      
      llvm-svn: 170065
  26. Dec 03, 2012
    • Use the new script to sort the includes of every file under lib. · ed0881b2
      Chandler Carruth authored
      Sooooo many of these had incorrect or strange main module includes.
      I have manually inspected all of these, and fixed the main module
      include to be the nearest plausible thing I could find. If you own or
      care about any of these source files, I encourage you to take some time
      and check that these edits were sensible. I can't have broken anything
      (I strictly added headers, and reordered them, never removed), but they
      may not be the headers you'd really like to identify as containing the
      API being implemented.
      
      Many forward declarations and missing includes were added to header
      files to allow them to parse cleanly when included first. The main
      module rule does in fact have its merits. =]
      
      llvm-svn: 169131
  27. Oct 10, 2012
  28. Oct 09, 2012
  29. Oct 08, 2012
  30. Sep 26, 2012
  31. Sep 13, 2012
  32. Aug 29, 2012
    • Make MemoryBuiltins aware of TargetLibraryInfo. · 8bcc9711
      Benjamin Kramer authored
      This disables malloc-specific optimization when -fno-builtin (or -ffreestanding)
      is specified. This has been a problem for a long time but became more severe
      with the recent memory builtin improvements.
      
      Since the memory builtin functions are used everywhere, this required passing
      TLI in many places. This means that functions that now have an optional TLI
      argument, like RecursivelyDeleteTriviallyDeadInstructions, won't remove dead
      mallocs anymore if the TLI argument is missing. I've updated most passes to do
      the right thing.
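The TLI-gated behavior can be sketched as follows; the names here are illustrative stand-ins, not the real TargetLibraryInfo or MemoryBuiltins interfaces:

```cpp
#include <cassert>
#include <string>

// Stand-in for TargetLibraryInfo: knows whether a library call is the builtin.
struct TLISketch { bool MallocIsBuiltin; };

// Sketch of the behavior described above: a malloc call whose result is
// unused is only treated as trivially dead when a TLI is provided and says
// malloc really is the builtin (i.e. not -fno-builtin / -ffreestanding).
// With the TLI argument missing, the call is conservatively kept.
bool isRemovableDeadCall(const std::string &Callee, bool ResultUnused,
                         const TLISketch *TLI) {
  if (!ResultUnused)
    return false;
  if (Callee == "malloc")
    return TLI != nullptr && TLI->MallocIsBuiltin;
  return false;  // this sketch only models malloc
}
```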
      
      Fixes PR13694 and probably others.
      
      llvm-svn: 162841
  33. Jun 02, 2012
  34. May 23, 2012
  35. Apr 11, 2012
  36. Apr 01, 2012
  37. Mar 31, 2012
    • Remove a bunch of empty, dead, and no-op methods from all of these · edd2826f
      Chandler Carruth authored
      interfaces. These methods were used in the old inline cost system where
      there was a persistent cache that had to be updated, invalidated, and
      cleared. We're now doing more direct computations that don't require
      this intricate dance. Even if we resume some level of caching, it would
      almost certainly have a simpler and more narrow interface than this.
      
      llvm-svn: 153813
    • Initial commit for the rewrite of the inline cost analysis to operate · 0539c071
      Chandler Carruth authored
      on a per-callsite walk of the called function's instructions, in
      breadth-first order over the potentially reachable set of basic blocks.
      
      This is a major shift in how inline cost analysis works to improve the
      accuracy and rationality of inlining decisions. A brief outline of the
      algorithm this moves to:
      
      - Build a simplification mapping based on the callsite arguments to the
        function arguments.
      - Push the entry block onto a worklist of potentially-live basic blocks.
      - Pop the first block off of the *front* of the worklist (for
        breadth-first ordering) and walk its instructions using a custom
        InstVisitor.
      - For each instruction's operands, re-map them based on the
        simplification mappings available for the given callsite.
      - Compute any simplification possible of the instruction after
        re-mapping, and store that back into the simplification mapping.
      - Compute any bonuses, costs, or other impacts of the instruction on the
        cost metric.
      - When the terminator is reached, replace any conditional value in the
        terminator with any simplifications from the mapping we have, and add
        any successors which are not proven to be dead from these
        simplifications to the worklist.
      - Pop the next block off of the front of the worklist, and repeat.
      - As soon as the cost of inlining exceeds the threshold for the
        callsite, stop analyzing the function in order to bound cost.
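The worklist walk above can be sketched over a toy CFG; the block costs and graph shape are made up for illustration, and the real analysis visits individual instructions with an InstVisitor rather than whole-block costs:

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <set>
#include <vector>

struct Block {
  int Cost;                         // stand-in for summed instruction costs
  std::vector<int> Successors;      // successor block ids
  std::vector<int> DeadSuccessors;  // successors proven dead by simplification
};

// Breadth-first walk of potentially-live blocks, stopping as soon as the
// accumulated cost exceeds the threshold. Returns true if inlining stays
// within budget.
bool withinInlineBudget(const std::map<int, Block> &CFG, int Entry,
                        int Threshold) {
  std::deque<int> Worklist{Entry};  // pop from the *front*: breadth-first
  std::set<int> Visited{Entry};
  int Cost = 0;
  while (!Worklist.empty()) {
    int Id = Worklist.front();
    Worklist.pop_front();
    const Block &B = CFG.at(Id);
    Cost += B.Cost;
    if (Cost > Threshold)
      return false;  // bound the analysis cost: stop as soon as we exceed it
    for (int S : B.Successors) {
      bool Dead = false;
      for (int D : B.DeadSuccessors)
        if (D == S) Dead = true;
      if (!Dead && Visited.insert(S).second)
        Worklist.push_back(S);  // only successors not proven dead
    }
  }
  return true;
}
```

Note how a dead successor's cost never enters the total, which is exactly the "perfectly handle dead code paths" property discussed below.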
      
      The primary goal of this algorithm is to perfectly handle dead code
      paths. We do not want any code in trivially dead code paths to impact
      inlining decisions. The previous metric was *extremely* flawed here, and
      would always subtract the average cost of two successors of
      a conditional branch when it was proven to become an unconditional
      branch at the callsite. There was no handling of wildly different costs
      between the two successors, which would cause inlining when the path
      actually taken was too large, and no inlining when the path actually
      taken was trivially simple. There was also no handling of the code
      *path*, only the immediate successors. These problems vanish completely
      now. See the added regression tests for the shiny new features -- we
      skip recursive function calls, SROA-killing instructions, and high cost
      complex CFG structures when dead at the callsite being analyzed.
      
      Switching to this algorithm required refactoring the inline cost
      interface to accept the actual threshold rather than simply returning
      a single cost. The resulting interface is pretty bad, and I'm planning
      to do lots of interface cleanup after this patch.
      
      Several other refactorings fell out of this, but I've tried to minimize
      them for this patch. =/ There is still more cleanup that can be done
      here. Please point out anything that you see in review.
      
      I've worked really hard to try to mirror at least the spirit of all of
      the previous heuristics in the new model. It's not clear that they are
      all correct any more, but I wanted to minimize the change in this single
      patch, it's already a bit ridiculous. One heuristic that is *not* yet
      mirrored is to allow inlining of functions with a dynamic alloca *if*
      the caller has a dynamic alloca. I will add this back, but I think the
      most reasonable way requires changes to the inliner itself rather than
      just the cost metric, and so I've deferred this for a subsequent patch.
      The test case is XFAIL-ed until then.
      
      As mentioned in the review mail, this seems to make Clang run about 1%
      to 2% faster in -O0, but makes its binary size grow by just under 4%.
      I've looked into the 4% growth, and it can be fixed, but requires
      changes to other parts of the inliner.
      
      llvm-svn: 153812
  38. Mar 27, 2012
    • Make a seemingly tiny change to the inliner and fix the generated code · b9e35fbc
      Chandler Carruth authored
      size bloat. Unfortunately, I expect this to disable the majority of the
      benefit from r152737. I'm hopeful at least that it will fix PR12345. To
      explain this requires... quite a bit of backstory I'm afraid.
      
      TL;DR: The change in r152737 actually did The Wrong Thing for
      linkonce-odr functions. This change makes it do the right thing. The
      benefits we saw were simple luck, not any actual strategy. Benchmark
      numbers after a mini-blog-post so that I've written down my thoughts on
      why all of this works and doesn't work...
      
      To understand what's going on here, you have to understand how the
      "bottom-up" inliner actually works. There are two fundamental modes to
      the inliner:
      
      1) Standard fixed-cost bottom-up inlining. This is the mode we usually
         think about. It walks from the bottom of the CFG up to the top,
         looking at callsites, taking information about the callsite and the
         called function and computing the expected cost of inlining into that
         callsite. If the cost is under a fixed threshold, it inlines. It's
         a touch more complicated than that due to all the bonuses, weights,
         etc. Inlining the last callsite to an internal function gets higher
         weight, etc. But essentially, this is the mode of operation.
      
      2) Deferred bottom-up inlining (a term I just made up). This is the
         interesting mode for this patch and r152737. Initially, this works
         just like mode #1, but once we have the cost of inlining into the
         callsite, we don't just compare it with a fixed threshold. First, we
         check something else. Let's give some names to the entities at this
         point, or we'll end up hopelessly confused. We're considering
         inlining a function 'A' into its callsite within a function 'B'. We
         want to check whether 'B' has any callers, and whether it might be
         inlined into those callers. If so, we also check whether inlining 'A'
         into 'B' would block any of the opportunities for inlining 'B' into
         its callers. We take the sum of the costs of inlining 'B' into its
         callers where that inlining would be blocked by inlining 'A' into
         'B', and if that cost is less than the cost of inlining 'A' into 'B',
         then we skip inlining 'A' into 'B'.
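The comparison in mode #2 can be sketched numerically. All names are illustrative, and the guard for the no-blocked-opportunities case is an assumption of this sketch (with nothing blocked there is nothing to defer for), not something the text above spells out:

```cpp
#include <cassert>
#include <vector>

// One caller of 'B': the cost of inlining 'B' there, and whether inlining
// 'A' into 'B' would block that opportunity.
struct CallerOfB {
  int CostOfInliningB;
  bool BlockedByInliningAIntoB;
};

// Deferred bottom-up inlining sketch: skip (defer) inlining A into B when
// the summed cost of the blocked B-into-caller opportunities is less than
// the cost of inlining A into B.
bool deferInliningAIntoB(int CostAIntoB, const std::vector<CallerOfB> &Callers) {
  int BlockedSum = 0;
  bool AnyBlocked = false;
  for (const CallerOfB &C : Callers)
    if (C.BlockedByInliningAIntoB) {
      AnyBlocked = true;
      BlockedSum += C.CostOfInliningB;
    }
  return AnyBlocked && BlockedSum < CostAIntoB;
}
```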
      
      Now, in order for #2 to make sense, we have to have some confidence that
      we will actually have the opportunity to inline 'B' into its callers
      when cheaper, *and* that we'll be able to revisit the decision and
      inline 'A' into 'B' if that ever becomes the correct tradeoff. This
      often isn't true for external functions -- we can see very few of their
      callers, and we won't be able to re-consider inlining 'A' into 'B' if
      'B' is external when we finally see more callers of 'B'. There are two
      cases where we believe this to be true for C/C++ code: functions local
      to a translation unit, and functions with an inline definition in every
      translation unit which uses them. These are represented as internal
      linkage and linkonce-odr (resp.) in LLVM. I enabled this logic for
      linkonce-odr in r152737.
      
      Unfortunately, when I did that, I also introduced a subtle bug. There
      was an implicit assumption that the last caller of the function within
      the TU was the last caller of the function in the program. We want to
      bonus the last caller of the function in the program by a huge amount
      for inlining because inlining that callsite has very little cost.
      Unfortunately, the last caller in the TU of a linkonce-odr function is
      *not* the last caller in the program, and so we don't want to apply this
      bonus. If we do, we can apply it to one callsite *per-TU*. Because of
      the way deferred inlining works, when it sees this bonus applied to one
      callsite in the TU for 'B', it decides that inlining 'B' is of the
      *utmost* importance just so we can get that final bonus. It then
      proceeds to essentially force deferred inlining regardless of the actual
      cost tradeoff.
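The corrected bonus rule amounts to one extra condition; as a sketch (names illustrative):

```cpp
#include <cassert>

// Sketch of the fix: the huge last-caller bonus only applies when the last
// caller in this TU is provably the last caller in the whole program. That
// holds for internal linkage, but not for linkonce-odr, where every TU that
// uses the function has its own copy and its own callers.
bool applyLastCallerBonus(bool IsLastCallerInTU, bool IsLinkonceODR) {
  return IsLastCallerInTU && !IsLinkonceODR;
}
```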
      
      The result? PR12345: code bloat, code bloat, code bloat. Another result
      is getting *damn* lucky on a few benchmarks, and the over-inlining
      exposing critically important optimizations. I would very much like
      a list of benchmarks that regress after this change goes in, with
      bitcode before and after. This will help me greatly understand what
      opportunities the current cost analysis is missing.
      
      Initial benchmark numbers look very good. WebKit files that exhibited
      the worst of PR12345 went from growing to shrinking compared to Clang
      with r152737 reverted.
      
      - Bootstrapped Clang is 3% smaller with this change.
      - Bootstrapped Clang -O0 over a single-source-file of lib/Lex is 4%
        faster with this change.
      
      Please let me know about any other performance impact you see. Thanks to
      Nico for reporting and urging me to actually fix, Richard Smith, Duncan
      Sands, Manuel Klimek, and Benjamin Kramer for talking through the issues
      today.
      
      llvm-svn: 153506