  1. Mar 14, 2012
    • Change where we enable the heuristic that delays inlining into functions · 30b8416d
      Chandler Carruth authored
      which are small enough to themselves be inlined. Delaying in this manner
      can be harmful if the function is ineligible for inlining in some (or
      many) contexts as it pessimizes the code of the function itself in the
      event that inlining does not eventually happen.
      
      Previously the check was written to only do this delaying of inlining
      for static functions in the hope that they could be entirely deleted and
      in the knowledge that all callers of static functions will have the
      opportunity to inline if it is in fact profitable. However, with C++ we
      get two other important sources of functions where the definition is
      always available for inlining: inline functions and templated functions.
      This patch generalizes the inliner to allow linkonce-ODR (the linkage
      such C++ routines receive) to also qualify for this delay-based
      inlining.
      
      Benchmarking across a range of large real-world applications shows
      roughly 2% size increase across the board, but an average speedup of
      about 0.5%. Some benchmarks improved by over 2%, and the 'clang' binary
      itself (when bootstrapped with this feature) shows a 1% -O0 performance
      improvement when run over all Sema, Lex, and Parse source code smashed
      into a single file. A clean re-build of Clang+LLVM with a bootstrapped
      Clang shows approximately 2% improvement, but that measurement is often
      noisy.
      
      llvm-svn: 152737
  2. Mar 12, 2012
    • When inlining a function and adding its inner call sites to the · 595fda84
      Chandler Carruth authored
      candidate set for subsequent inlining, try to simplify the arguments to
      the inner call site now that inlining has been performed.
      
      The goal here is to propagate and fold constants through deeply nested
      call chains. Without doing this, we lose the inliner bonus that should
      be applied because the arguments don't match the exact pattern the cost
      estimator uses.
      
      Reviewed on IRC by Benjamin Kramer.
      
      llvm-svn: 152556
  3. Feb 25, 2012
  4. Oct 20, 2011
    • Refactor code from inlining and globalopt that checks whether a function... · 1923a330
      Eli Friedman authored
      Refactor code from inlining and globalopt that checks whether a function definition is unused, and enhance it so it can tell that functions which are only used by a blockaddress are in fact dead.  This probably doesn't happen much on most code, but the Linux kernel's _THIS_IP_ can trigger this issue with blockaddress.  (GlobalDCE can also handle the given testcase, but we only run that at -O3.)  Found while looking at PR11180.
      
      llvm-svn: 142572
  5. Jul 18, 2011
  6. Apr 23, 2011
  7. Jan 04, 2011
    • Improve the accuracy of the inlining heuristic looking for the · a71d2cc8
      Dale Johannesen authored
      case where a static caller is itself inlined everywhere else, and
      thus may go away if it doesn't get too big due to inlining other
      things into it.  If there are references to the caller other than
      calls, it will not be removed; account for this.
      This results in same-day completion of the case in PR8853.
      
      llvm-svn: 122821
  8. Dec 06, 2010
    • Fix PR8735, a really terrible problem in the inliner's "alloca merging" · fb212de0
      Chris Lattner authored
      optimization.
      
      Consider:
      static void foo() {
        A = alloca
        ...
      }
      
      static void bar() {
        B = alloca
        ...
        call foo();
      }
      
      int main() {
        bar()
      }
      
      The inliner proceeds bottom up, but let's pretend it decides not to inline foo
      into bar.  When it gets to main, it inlines bar into main(), and says "hey, I
      just inlined an alloca "B" into main, let's remember that".  Then it keeps going
      and finds that it now contains a call to foo.  It decides to inline foo into
      main, and says "hey, foo has an alloca A, and I have an alloca B from another
      inlined call site, let's reuse it".  The problem with this, of course, is that
      the lifetimes of A and B are nested, not disjoint.
      
      Unfortunately I can't create a reasonable testcase for this: the one in the
      PR is both huge and extremely sensitive, because minor tweaks end up
      causing foo to get inlined into bar too early.  We already have tests for the
      basic alloca merging optimization and this does not break them.
      
      llvm-svn: 120995
    • improve -debug output and comments a little. · 5b6a865f
      Chris Lattner authored
      llvm-svn: 120993
  9. Nov 03, 2010
  10. Aug 06, 2010
  11. Jul 29, 2010
  12. Jul 13, 2010
  13. May 31, 2010
  14. May 01, 2010
  15. Apr 25, 2010
  16. Apr 23, 2010
  17. Apr 20, 2010
  18. Apr 17, 2010
  19. Mar 10, 2010
    • Try to keep the cached inliner costs around for a bit longer for big functions. · b495cad7
      Jakob Stoklund Olesen authored
      The Caller cost info would be reset every time a callee was inlined. If the
      caller has lots of calls and there is some mutual recursion going on, the
      caller cost info could be calculated many times.
      
      This patch reduces inliner runtime from 240s to 0.5s for a function with 20000
      small function calls.
      
      This is a more conservative version of r98089 that doesn't break the clang
      test CodeGenCXX/temp-order.cpp. That test relies on rather extreme inlining
      for constant folding.
      
      llvm-svn: 98099
  20. Mar 09, 2010
  21. Feb 13, 2010
    • Enable the inlinehint attribute in the Inliner. · 492b8b42
      Jakob Stoklund Olesen authored
      Functions explicitly marked inline will get an inlining threshold slightly
      more aggressive than the default for -O3. This means that -O3 builds are
      mostly unaffected while -Os builds will be a bit bigger and faster.
      
      The difference depends entirely on how many 'inline's are sprinkled on the
      source.
      
      In the CINT2006 suite, only these tests are significantly affected under -Os:
      
                     Size   Time
      471.omnetpp   +1.63% -1.85%
      473.astar     +4.01% -6.02%
      483.xalancbmk +4.60%  0.00%
      
      Note that 483.xalancbmk runs too quickly to give useful timing results.
      
      llvm-svn: 96066
  22. Feb 06, 2010
    • Reintroduce the InlineHint function attribute. · 74bb06c0
      Jakob Stoklund Olesen authored
      This time it's for real! I am going to hook this up in the frontends as well.
      
      The inliner has some experimental heuristics for dealing with the inline hint.
      When given a -respect-inlinehint option, functions marked with the inline
      keyword are given a threshold just above the default for -O3.
      
      We need some experiments to determine if that is the right thing to do.
      
      llvm-svn: 95466
  23. Feb 04, 2010
    • Increase inliner thresholds by 25. · 113fb54b
      Jakob Stoklund Olesen authored
      This makes the inliner about as aggressive as it was before my changes to the
      inliner cost calculations. These levels give the same performance and slightly
      smaller code than before.
      
      llvm-svn: 95320
  24. Jan 20, 2010
  25. Jan 05, 2010
  26. Nov 12, 2009
    • use isInstructionTriviallyDead, as pointed out by Duncan · 5c89f4b4
      Chris Lattner authored
      llvm-svn: 87035
    • implement a nice little efficiency hack in the inliner. Since we're now · eb9acbfb
      Chris Lattner authored
      running IPSCCP early, and we run functionattrs interlaced with the inliner,
      we often (particularly for small or noop functions) completely propagate
      all of the information about a call to its call site in IPSCCP (making a call
      dead) and functionattrs is smart enough to realize that the function is
      readonly (because it is interlaced with the inliner).
      
      To improve compile time and make the inliner threshold more accurate, realize
      that we don't have to inline dead readonly function calls.  Instead, just 
      delete the call.  This happens all the time in C++ code; here are some
      counters from opt/llvm-ld counting the number of times calls were deleted vs
      inlined on various apps:
      
      Tramp3d opt:
        5033 inline                - Number of call sites deleted, not inlined
       24596 inline                - Number of functions inlined
      llvm-ld:
        667 inline           - Number of functions deleted because all callers found
        699 inline           - Number of functions inlined
      
      483.xalancbmk opt:
        8096 inline                - Number of call sites deleted, not inlined
       62528 inline                - Number of functions inlined
      llvm-ld:
         217 inline           - Number of allocas merged together
        2158 inline           - Number of functions inlined
      
      471.omnetpp:
        331 inline                - Number of call sites deleted, not inlined
       8981 inline                - Number of functions inlined
      llvm-ld:
        171 inline           - Number of functions deleted because all callers found
        629 inline           - Number of functions inlined
      
      
      Deleting a call is much faster than inlining it, and is insensitive to the
      size of the callee. :)
      
      llvm-svn: 86975
  27. Oct 13, 2009