  1. Nov 23, 2016
  2. Nov 17, 2016
    • Use profile info to adjust loop unroll threshold. · 41d72a86
      Dehao Chen authored
      Summary:
      For a flat loop, even if it is hot, runtime unrolling is not a good idea, so we set a lower partial unroll threshold.
      For a hot loop, we set a higher unroll threshold and allow expensive trip-count computation to enable more aggressive unrolling.
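      The flat/hot heuristic above can be sketched as follows. This is an illustrative standalone mock, not LLVM's actual code: the type, function names, and threshold values are assumptions for the sake of the example.

```cpp
#include <cassert>

// Hypothetical sketch of the profile-guided threshold selection described
// above; the real names and values in LLVM's LoopUnrollPass differ.
struct UnrollThresholds {
  unsigned PartialThreshold;    // cap for runtime/partial unrolling
  unsigned FullThreshold;       // cap for full unrolling
  bool AllowExpensiveTripCount; // permit costly trip-count computation
};

// isHot:  the loop executes frequently according to profile data.
// isFlat: all iterations behave uniformly, so unrolling exposes little.
UnrollThresholds pickThresholds(bool isHot, bool isFlat) {
  UnrollThresholds T{150, 300, false}; // illustrative defaults
  if (isFlat) {
    T.PartialThreshold = 75; // flat loops: runtime unrolling rarely pays off
  } else if (isHot) {
    T.FullThreshold = 600;   // hot loops: unroll more aggressively
    T.AllowExpensiveTripCount = true;
  }
  return T;
}
```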
      
      Reviewers: davidxl, mzolotukhin
      
      Subscribers: sanjoy, mehdi_amini, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D26527
      
      llvm-svn: 287186
  3. Nov 09, 2016
    • Minor unroll pass refactoring. · c2698cd9
      Evgeny Stupachenko authored
      Summary:
      Unrolled loop size calculations moved to a function.
      The constant representing the number of instructions optimized away
       when the "back edge" becomes a "fall through" is replaced with
       a variable.
      Some comments added.
      
      Reviewers: mzolotukhin
      
      Differential Revision: http://reviews.llvm.org/D21719
      
      From: Evgeny Stupachenko <evstupac@gmail.com>
      llvm-svn: 286389
  4. Oct 27, 2016
  5. Oct 25, 2016
  6. Oct 21, 2016
    • [LoopUnroll] Keep the loop test only on the first iteration of max-or-zero loops · 84b21835
      John Brawn authored
      When we have a loop with a known upper bound on the number of iterations, and
      furthermore know that the number of iterations will be either exactly that
      upper bound or zero, then we can fully unroll up to that upper bound, keeping
      only the first loop test to check for the zero-iteration case.
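      A minimal sketch of the transformation this describes, as plain C++ rather than IR (the function names and the trip count of 4 are invented for illustration):

```cpp
#include <cassert>

// Illustrative only: a loop whose trip count is either 0 or exactly 4.
int sumOriginal(const int *a, bool run) {
  int s = 0;
  for (int i = 0; run && i < 4; ++i) // trip count is 0 (run==false) or 4
    s += a[i];
  return s;
}

// What the unroller can produce: the body is copied 4 times, and only the
// *first* iteration's test survives, guarding the zero-trip-count case.
int sumUnrolled(const int *a, bool run) {
  int s = 0;
  if (run) { // the one remaining loop test
    s += a[0];
    s += a[1];
    s += a[2];
    s += a[3];
  }
  return s;
}
```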
      
      Most of the work here is in plumbing this 'max-or-zero' information from the
      part of scalar evolution where it's detected through to loop unrolling. I've
      also gone for the safe default of 'false' everywhere but howManyLessThans,
      which could probably be improved.
      
      Differential Revision: https://reviews.llvm.org/D25682
      
      llvm-svn: 284818
  7. Oct 12, 2016
    • Reapply "[LoopUnroll] Use the upper bound of the loop trip count to fully unroll a loop" · 1ef17e90
      Haicheng Wu authored
      Reapply r284044 after the revert in r284051. Krzysztof fixed the error in r284049.
      
      The original summary:
      
      This patch tries to fully unroll loops having break statement like this
      
      for (int i = 0; i < 8; i++) {
          if (a[i] == value) {
              found = true;
              break;
          }
      }
      
      GCC can fully unroll such loops, but currently LLVM cannot because LLVM only
      supports loops having exact constant trip counts.
      
      The upper bound of the trip count can be obtained from calling
      ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is the
      refactoring work in SCEV to prevent duplicating code.
      
      The feature of using the upper bound is enabled under the same circumstance
      when runtime unrolling is enabled since both are used to unroll loops without
      knowing the exact constant trip count.
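      To make the transformation concrete, here is the search loop from the summary next to the shape a full unroll by upper bound can produce. The function names are invented for illustration; the real work happens on LLVM IR, not C++ source:

```cpp
#include <cassert>

// The loop from the commit message: the break may fire early, so the
// exact trip count is unknown, but it is at most 8.
bool findOriginal(const int *a, int value) {
  bool found = false;
  for (int i = 0; i < 8; i++) {
    if (a[i] == value) {
      found = true;
      break;
    }
  }
  return found;
}

// Fully unrolled to the upper bound of 8, keeping every early exit.
bool findUnrolled(const int *a, int value) {
  if (a[0] == value) return true;
  if (a[1] == value) return true;
  if (a[2] == value) return true;
  if (a[3] == value) return true;
  if (a[4] == value) return true;
  if (a[5] == value) return true;
  if (a[6] == value) return true;
  if (a[7] == value) return true;
  return false;
}
```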
      
      llvm-svn: 284053
    • Revert "[LoopUnroll] Use the upper bound of the loop trip count to fully unroll a loop" · 45e4ef73
      Haicheng Wu authored
      This reverts commit r284044.
      
      llvm-svn: 284051
    • [LoopUnroll] Use the upper bound of the loop trip count to fully unroll a loop · 6cac34fd
      Haicheng Wu authored
      This patch tries to fully unroll loops having break statement like this
      
      for (int i = 0; i < 8; i++) {
          if (a[i] == value) {
              found = true;
              break;
          }
      }
      
      GCC can fully unroll such loops, but currently LLVM cannot because LLVM only
      supports loops having exact constant trip counts.
      
      The upper bound of the trip count can be obtained from calling
      ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is the
      refactoring work in SCEV to prevent duplicating code.
      
      The feature of using the upper bound is enabled under the same circumstance
      when runtime unrolling is enabled since both are used to unroll loops without
      knowing the exact constant trip count.
      
      Differential Revision: https://reviews.llvm.org/D24790
      
      llvm-svn: 284044
  8. Sep 30, 2016
  9. Sep 28, 2016
    • [SystemZ] Implementation of getUnrollingPreferences(). · 58c5a7f5
      Jonas Paulsson authored
      This commit enables more unrolling for SystemZ by implementing the
      SystemZTargetTransformInfo::getUnrollingPreferences() method.
      
      It has been found that it is better to only unroll moderately, so
      DefaultUnrollRuntimeCount has been moved into UnrollingPreferences so that
      it can be set to a lower value (4) for SystemZ.
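      A hedged sketch of the idea: the actual hook lives in SystemZTargetTransformInfo and has a different signature; this standalone mock only shows a target lowering the default runtime unroll count moved into the preferences struct.

```cpp
#include <cassert>

// Mock of the preferences struct described above (illustrative only).
struct UnrollingPreferences {
  // Default count used for runtime unrolling when no exact trip count
  // is known; assumed default shown here.
  unsigned DefaultUnrollRuntimeCount = 8;
};

// SystemZ was found to do better with only moderate unrolling, so the
// target hook overrides the default with a lower value.
void getUnrollingPreferencesSystemZ(UnrollingPreferences &UP) {
  UP.DefaultUnrollRuntimeCount = 4;
}
```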
      
      Reviewers: Evgeny Stupachenko, Ulrich Weigand.
      https://reviews.llvm.org/D24451
      
      llvm-svn: 282570
  10. Sep 07, 2016
  11. Aug 26, 2016
    • [LoopUnroll] Use OptimizationRemarkEmitter directly not via the analysis pass · 4f155b6e
      Adam Nemet authored
      We can't mark ORE (a function pass) preserved as required by the loop
      passes because that is how we ensure that the required passes like
      LazyBFI are all available any time ORE is used.  See the new comments in
      the patch.
      
      Instead we use it directly just like the inliner does in D22694.
      
      As expected there is some additional overhead after removing the caching
      provided by analysis passes.  The worst case I measured was
      LNT/CINT2006_ref/401.bzip2, which regresses by 12%.  As before, this only
      affects -Rpass-with-hotness and not default compilation.
      
      llvm-svn: 279829
  12. Aug 24, 2016
    • [LoopUnroll] By default disable unrolling when optimizing for size. · bd63d436
      Michael Zolotukhin authored
      Summary:
      In clang commit r268509 we started to invoke the loop-unroll pass from the
      driver even under -Os. However, we happened to not initialize the optsize
      thresholds properly, which is fixed with this change.
      
      r268509 led to some big compile time regressions, because we started to
      unroll some loops that we didn't unroll before. With this change I hope
      to recover most of the regressions. We still are slightly slower than
      before, because we do some checks here and there in loop-unrolling
      before we bail out, but at least the slowdown is not that huge now.
      
      Reviewers: hfinkel, chandlerc
      
      Subscribers: mzolotukhin, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D23388
      
      llvm-svn: 279585
  13. Aug 18, 2016
  14. Aug 09, 2016
    • Consistently use LoopAnalysisManager · 0746f3bf
      Sean Silva authored
      One exception here is LoopInfo which must forward-declare it (because
      the typedef is in LoopPassManager.h which depends on LoopInfo).
      
      Also, some includes for LoopPassManager.h were needed since that file
      provides the typedef.
      
      Besides a general consistency benefit, the extra layer of indirection
      allows the mechanical part of https://reviews.llvm.org/D23256 that
      requires touching every transformation and analysis to be factored out
      cleanly.
      
      Thanks to David for the suggestion.
      
      llvm-svn: 278079
  15. Jul 29, 2016
    • [LoopUnroll] Include hotness of region in opt remark · 12937c36
      Adam Nemet authored
      LoopUnroll is a loop pass, so the analysis of OptimizationRemarkEmitter
      is added to the common function analysis passes that loop passes
      depend on.
      
      The BFI and indirectly BPI used in this pass is computed lazily so no
      overhead should be observed unless -pass-remarks-with-hotness is used.
      
      This is how the patch affects the O3 pipeline:
      
               Dominator Tree Construction
               Natural Loop Information
               Canonicalize natural loops
               Loop-Closed SSA Form Pass
               Basic Alias Analysis (stateless AA impl)
               Function Alias Analysis Results
               Scalar Evolution Analysis
      +        Lazy Branch Probability Analysis
      +        Lazy Block Frequency Analysis
      +        Optimization Remark Emitter
               Loop Pass Manager
                 Rotate Loops
                 Loop Invariant Code Motion
                 Unswitch loops
               Simplify the CFG
               Dominator Tree Construction
               Basic Alias Analysis (stateless AA impl)
               Function Alias Analysis Results
               Combine redundant instructions
               Natural Loop Information
               Canonicalize natural loops
               Loop-Closed SSA Form Pass
               Scalar Evolution Analysis
      +        Lazy Branch Probability Analysis
      +        Lazy Block Frequency Analysis
      +        Optimization Remark Emitter
               Loop Pass Manager
                 Induction Variable Simplification
                 Recognize loop idioms
                 Delete dead loops
                 Unroll loops
      ...
      
      llvm-svn: 277203
  16. Jul 20, 2016
    • [PM] Port LoopUnroll. · e3c18a5a
      Sean Silva authored
      We just set PreserveLCSSA to always true since we don't have an
      analogous method `mustPreserveAnalysisID(LCSSA)`.
      
      Also port LoopInfo verifier pass to test LoopUnrollPass.
      
      llvm-svn: 276063
  17. Jun 15, 2016
    • [LoopUnroll] Don't crash trying to unroll loop with EH pad exit · 4a697c31
      David Majnemer authored
      We do not support splitting cleanuppad or catchswitches.  This is
      problematic for passes which assume that a loop is in loop simplify
      form (the loop would have a dedicated exit block instead of sharing it).
      
      While it isn't great that we don't support this for cleanups, we still
      cannot make loop-simplify form an assertable precondition because
      indirectbr will also disable these sorts of CFG cleanups.
      
      This fixes PR28132.
      
      llvm-svn: 272739
  18. Jun 08, 2016
    • The patch sets the unroll disable pragma when an unroll with a user-specified count has been applied. · 3e2f389a
      Evgeny Stupachenko authored
      
      Summary:
      Previously SetLoopAlreadyUnrolled() set the disable pragma only if
      there was some loop metadata.
      Now it sets the pragma in all cases. This helps to prevent multiple
      rounds of unrolling when -unroll-count=N is given.
      
      Reviewers: mzolotukhin
      
      Differential Revision: http://reviews.llvm.org/D20765
      
      From: Evgeny Stupachenko <evstupac@gmail.com>
      llvm-svn: 272195
  19. Jun 03, 2016
  20. May 28, 2016
    • The patch fixes r271071 · b787522d
      Evgeny Stupachenko authored
      Summary:
      Unused variables in Release mode:
        BasicBlock *Header
        unsigned OrigCount
      are now put under DEBUG.
      
      From: Evgeny Stupachenko <evstupac@gmail.com>
      llvm-svn: 271076
    • The patch refactors unroll pass. · ea2aef4a
      Evgeny Stupachenko authored
      Summary:
      Unroll factor (Count) calculations moved to a new function.
      Added early exits when the factor is defined by a pragma or by "-unroll-count".
      New type of unrolling, "Force", introduced (previously used implicitly).
      New unroll preference "AllowRemainder" introduced and set to "true" by default
      (it should be set to false for architectures that suffer from it).
      
      Reviewers: hfinkel, mzolotukhin, zzheng
      
      Differential Revision: http://reviews.llvm.org/D19553
      
      From: Evgeny Stupachenko <evstupac@gmail.com>
      llvm-svn: 271071
  21. May 27, 2016
  22. May 26, 2016
  23. May 25, 2016
  24. May 24, 2016
  25. May 23, 2016
    • [LoopUnroll] Enable advanced unrolling analysis by default. · be080fc5
      Michael Zolotukhin authored
      Summary:
      This patch turns on LoopUnrollAnalyzer by default. To mitigate compile
      time regressions, I chose very conservative thresholds for now. Later we
      can make them more aggressive, but it might require being smarter in
      which loops we're optimizing. E.g. currently the biggest issue is that
      with more aggressive thresholds we unroll many cold loops, which
      increases compile time for no performance benefit (performance of those
      loops is improved, but it doesn't matter since they are cold).
      
      Test results for compile time (using 4 samples to reduce noise):
      ```
      MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes 5.19%
      SingleSource/Benchmarks/Polybench/medley/reg_detect/reg_detect  4.19%
      MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow  3.39%
      MultiSource/Applications/JM/lencod/lencod 1.47%
      MultiSource/Benchmarks/Fhourstones-3_1/fhourstones3_1 -6.06%
      ```
      
      I didn't see any performance changes in the testsuite, but it improves
      some internal tests.
      
      Reviewers: hfinkel, chandlerc
      
      Subscribers: llvm-commits, mzolotukhin
      
      Differential Revision: http://reviews.llvm.org/D20482
      
      llvm-svn: 270478
  26. May 18, 2016
  27. May 13, 2016
    • Revert "Revert "[Unroll] Implement a conservative and monotonically increasing... · 963a6d9c
      Michael Zolotukhin authored
      Revert "Revert "[Unroll] Implement a conservative and monotonically increasing cost tracking system during the full unroll heuristic analysis that avoids counting any instruction cost until that instruction becomes "live" through a side-effect or use outside the...""
      
      This reverts commit r269395.
      
      Try to reapply with a fix from chapuni.
      
      llvm-svn: 269486
    • Revert "[Unroll] Implement a conservative and monotonically increasing cost... · 9be3b8b9
      Michael Zolotukhin authored
      Revert "[Unroll] Implement a conservative and monotonically increasing cost tracking system during the full unroll heuristic analysis that avoids counting any instruction cost until that instruction becomes "live" through a side-effect or use outside the..."
      
      This reverts commit r269388.
      
      It caused some bots to fail, I'm reverting it until I investigate the
      issue.
      
      llvm-svn: 269395
    • [Unroll] Implement a conservative and monotonically increasing cost tracking... · b7b80529
      Michael Zolotukhin authored
      [Unroll] Implement a conservative and monotonically increasing cost tracking system during the full unroll heuristic analysis that avoids counting any instruction cost until that instruction becomes "live" through a side-effect or use outside the...
      
      Summary:
      ...loop after the last iteration.
      
      This is really hard to do correctly. The core problem is that we need to
      model liveness through the induction PHIs from iteration to iteration in
      order to get the correct results, and we need to correctly de-duplicate
      the common subgraphs of instructions feeding some subset of the
      induction PHIs. All of this can be driven either from a side effect at
      some iteration or from the loop values used after the loop finishes.
      
      This patch implements this by storing the forward-propagating analysis
      of each instruction in a cache to recall whether it was free and whether
      it has become live and thus counted toward the total unroll cost. Then,
      at each sink for a value in the loop, we recursively walk back through
      every value that feeds the sink, including looping back through the
      iterations as needed, until we have marked the entire input graph as
      live. Because we cache this, we never visit instructions more than twice
      -- once when we analyze them and put them into the cache, and once when
      we count their cost towards the unrolled loop. Also, because the cache
      is only two bits per entry and because we are dealing with relatively
      small iteration counts, we can store all of this very densely in memory
      to keep this from becoming an excessively slow analysis.
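      The dense two-bit cache can be sketched like this. This is a standalone illustration of the packing idea only, not LLVM's actual data structure; the class and state names are invented.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative two-bit-per-entry cache: four instruction states pack
// into each byte, so even thousands of (instruction, iteration) pairs
// stay compact in memory.
enum class InstState : uint8_t { Unvisited = 0, Free = 1, Counted = 2 };

class TwoBitCache {
  std::vector<uint8_t> Bytes;

public:
  explicit TwoBitCache(size_t N) : Bytes((N + 3) / 4, 0) {}

  InstState get(size_t I) const {
    return InstState((Bytes[I / 4] >> (2 * (I % 4))) & 3);
  }

  void set(size_t I, InstState S) {
    uint8_t &B = Bytes[I / 4];
    unsigned Shift = 2 * (I % 4);
    B = uint8_t((B & ~(3u << Shift)) | (unsigned(S) << Shift));
  }
};
```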
      
      The code here is still pretty gross. I would appreciate suggestions
      about better ways to factor or split this up; I've stared too long at
      the algorithmic side to really have a good sense of what the design
      should probably look like.
      
      Also, it might seem like we should do all of this bottom-up, but I think
      that is a red herring. Specifically, the simplification power is *much*
      greater working top-down. We can forward propagate very effectively,
      even across strange and interesting recurrences around the backedge.
      Because we use data to propagate, this doesn't cause a state space
      explosion. Doing this level of constant folding, etc, would be very
      expensive to do bottom-up because it wouldn't be until the last moment
      that you could collapse everything. The current solution is essentially
      a top-down simplification with a bottom-up cost accounting which seems
      to get the best of both worlds. It makes the simplification incremental
      and powerful while leaving everything dead until we *know* it is needed.
      
      Finally, a core property of this approach is its *monotonicity*. At all
      times, the current UnrolledCost is a conservatively low estimate. This
      ensures that we will never early-exit from the analysis due to exceeding
      a threshold when, had we continued, the cost would have gone back
      below the threshold. These kinds of bugs can cause incredibly
      hard-to-track-down random changes to behavior.
      
      We could use a similar (but much simpler) technique within the inliner
      as well to avoid considering speculated code in the inline cost.
      
      Reviewers: chandlerc
      
      Subscribers: sanjoy, mzolotukhin, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D11758
      
      llvm-svn: 269388
  28. May 10, 2016
    • Loop unroller: set thresholds for optsize and minsize functions to zero · 719b26ba
      Hans Wennborg authored
      Before r268509, Clang would disable the loop unroll pass when optimizing
      for size. That commit enabled it in order to support unroll pragmas
      in -Os builds. However, this regressed the binary size of one of Chromium's
      DLLs by ~100 KB.
      
      This restores the original behaviour of no unrolling at -Os, but doing it
      in LLVM instead of Clang makes more sense, and also allows the pragmas to
      keep working.
      
      Differential revision: http://reviews.llvm.org/D20115
      
      llvm-svn: 269124
  29. May 05, 2016
  30. Apr 23, 2016
  31. Apr 22, 2016
  32. Apr 21, 2016
    • Initial implementation of optimization bisect support. · f0f27929
      Andrew Kaylor authored
      This patch implements an optimization bisect feature, which will allow optimizations to be selectively disabled at compile time in order to track down test failures that are caused by incorrect optimizations.
      
      The bisection is enabled using a new command line option (-opt-bisect-limit).  Individual passes that may be skipped call the OptBisect object (via an LLVMContext) to see if they should be skipped based on the bisect limit.  A finer level of control (disabling individual transformations) can be managed through an additional OptBisect method, but this is not yet used.
      
      The skip checking in this implementation is based on (and replaces) the skipOptnoneFunction check.  Where that check was being called, a new call has been inserted in its place which checks the bisect limit and the optnone attribute.  A new function call has been added for module and SCC passes that behaves in a similar way.
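      The core bisect-limit mechanism can be sketched as a simple counter. This standalone mock only illustrates the idea; the real OptBisect class hangs off the LLVMContext and its names and interface differ.

```cpp
#include <cassert>

// Illustrative sketch of a bisect limit: each pass that may be skipped
// asks whether it should run. Passes numbered beyond the limit are
// skipped, so binary-searching over the limit pinpoints the first pass
// whose transformation breaks the test case.
class BisectCounter {
  int Limit;     // -1 means "no limit": run every pass
  int Count = 0; // running pass number

public:
  explicit BisectCounter(int L) : Limit(L) {}

  bool shouldRunPass() {
    ++Count;
    return Limit < 0 || Count <= Limit;
  }
};
```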
      
      Differential Revision: http://reviews.llvm.org/D19172
      
      llvm-svn: 267022