- Nov 23, 2016

Haicheng Wu authored
Just to save some compilation time.
Differential Revision: https://reviews.llvm.org/D26784
llvm-svn: 287800
- Nov 17, 2016

Dehao Chen authored
Summary: For a flat loop, even if it is hot, it is not a good idea to unroll at runtime, so we set a lower partial unroll threshold. For a hot loop, we set a higher unroll threshold and allow expensive trip-count computation to enable more aggressive unrolling.
Reviewers: davidxl, mzolotukhin
Subscribers: sanjoy, mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D26527
llvm-svn: 287186
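The threshold split described here lends itself to a small illustration. The following is a minimal sketch of the idea only, not the patch itself; the struct, the helper name, and the numeric values are assumptions made for the example.

```cpp
// Hypothetical illustration of hotness/flatness-driven unroll thresholds.
struct UnrollThresholdsSketch {
  unsigned Threshold;           // size budget for full/partial unrolling
  unsigned PartialThreshold;    // size budget for runtime unrolling
  bool AllowExpensiveTripCount; // permit costly trip-count computation
};

UnrollThresholdsSketch pickThresholds(bool IsHot, bool IsFlat) {
  UnrollThresholdsSketch T{150, 150, false};
  if (IsHot) {
    // Hot loops: spend more code size and allow expensive trip-count
    // computation so they can be unrolled more aggressively.
    T.Threshold = 300;
    T.AllowExpensiveTripCount = true;
  }
  if (IsFlat) {
    // Flat loops: runtime unrolling rarely pays off even when hot,
    // so keep the partial-unroll budget low.
    T.PartialThreshold = 100;
  }
  return T;
}
```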
- Nov 09, 2016

Evgeny Stupachenko authored
Summary: The unrolled-loop-size calculations were moved into a function. The constant representing the number of instructions optimized away when the "back edge" becomes a "fall through" was replaced with a variable, and some comments were added.
Reviewers: mzolotukhin
Differential Revision: http://reviews.llvm.org/D21719
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 286389
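The "back edge becomes fall-through" accounting can be written as a one-line formula. This is a sketch assuming the usual formulation (each unrolled iteration saves the back-edge instructions); the name BEInsns and the value 2 are assumptions for the illustration.

```cpp
#include <cstdint>

// Assumed: the back edge costs two instructions (compare + branch), which
// are saved in every iteration that falls through after unrolling.
static const unsigned BEInsns = 2;

uint64_t unrolledLoopSizeSketch(unsigned LoopSize, unsigned Count) {
  // LoopSize is expected to include the back-edge instructions.
  return (uint64_t)(LoopSize - BEInsns) * Count + BEInsns;
}
```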
- Oct 27, 2016

Haicheng Wu authored
Differential Revision: https://reviews.llvm.org/D23891
llvm-svn: 285330
- Oct 25, 2016

Michael Kuperstein authored
llvm-svn: 285092
- Oct 21, 2016

John Brawn authored
When we have a loop with a known upper bound on the number of iterations, and furthermore know that the number of iterations will be either exactly that upper bound or zero, then we can fully unroll up to that upper bound, keeping only the first loop test to check for the zero-iteration case.

Most of the work here is in plumbing this 'max-or-zero' information from the part of scalar evolution where it's detected through to loop unrolling. I've also gone for the safe default of 'false' everywhere but howManyLessThans, which could probably be improved.

Differential Revision: https://reviews.llvm.org/D25682
llvm-svn: 284818
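For intuition, here is a hand-written example of the kind of loop this describes; whether scalar evolution proves the max-or-zero property for this exact shape is an assumption of the illustration, not something the commit states.

```cpp
// If `enabled` is loop-invariant, the loop below runs either exactly 8
// times or not at all, so it can be fully unrolled 8 times behind a single
// guard for the zero-iteration case.
void addOne(int *a, bool enabled) {
  for (int i = 0; i < 8 && enabled; i++)
    a[i] += 1;
}
```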
- Oct 12, 2016

Haicheng Wu authored
Reapply r284044 after the revert in r284051. Krzysztof fixed the error in r284049.

The original summary:

This patch tries to fully unroll loops having a break statement like this:

  for (int i = 0; i < 8; i++) {
    if (a[i] == value) {
      found = true;
      break;
    }
  }

GCC can fully unroll such loops, but currently LLVM cannot because LLVM only supports loops having exact constant trip counts. The upper bound of the trip count can be obtained by calling ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is refactoring work in SCEV to prevent duplicating code.

The feature of using the upper bound is enabled under the same circumstances as runtime unrolling, since both are used to unroll loops without knowing the exact constant trip count.

llvm-svn: 284053
Haicheng Wu authored
This reverts commit r284044.
llvm-svn: 284051
Haicheng Wu authored
This patch tries to fully unroll loops having a break statement like this:

  for (int i = 0; i < 8; i++) {
    if (a[i] == value) {
      found = true;
      break;
    }
  }

GCC can fully unroll such loops, but currently LLVM cannot because LLVM only supports loops having exact constant trip counts. The upper bound of the trip count can be obtained by calling ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is refactoring work in SCEV to prevent duplicating code.

The feature of using the upper bound is enabled under the same circumstances as runtime unrolling, since both are used to unroll loops without knowing the exact constant trip count.

Differential Revision: https://reviews.llvm.org/D24790
llvm-svn: 284044
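As a rough sketch (hand-written, not compiler output), unrolling the loop above by its trip-count upper bound keeps each iteration's exit test, so the early break turns into a chain of guarded copies:

```cpp
// Unrolled-by-upper-bound form of the search loop above (conceptual only).
bool findValueUnrolled(const int *a, int value) {
  bool found = false;
  if      (a[0] == value) found = true;
  else if (a[1] == value) found = true;
  else if (a[2] == value) found = true;
  else if (a[3] == value) found = true;
  else if (a[4] == value) found = true;
  else if (a[5] == value) found = true;
  else if (a[6] == value) found = true;
  else if (a[7] == value) found = true;
  return found;
}
```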
- Sep 30, 2016

Dehao Chen authored
Summary: Debug info should *not* affect optimization decisions. This patch updates the loop unroller cost model so that it is not affected by debug info.
Reviewers: davidxl, mzolotukhin
Subscribers: haicheng, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D25098
llvm-svn: 282894
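A minimal sketch of what "not affected by debug info" means in practice, assuming the usual LLVM idiom of skipping debug intrinsics when sizing code; this is an illustration, not the commit's code.

```cpp
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instruction.h"
#include "llvm/IR/IntrinsicInst.h"
using namespace llvm;

// Count only "real" instructions so that llvm.dbg.* intrinsics cannot
// change the unrolling decision between -g and non-debug builds.
static unsigned countRealInstructions(const BasicBlock &BB) {
  unsigned N = 0;
  for (const Instruction &I : BB)
    if (!isa<DbgInfoIntrinsic>(I))
      ++N;
  return N;
}
```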
Adam Nemet authored
llvm-svn: 282834
- Sep 28, 2016

Jonas Paulsson authored
This commit enables more unrolling for SystemZ by implementing the SystemZTargetTransformInfo::getUnrollingPreferences() method.

It has been found that it is better to only unroll moderately, so the DefaultUnrollRuntimeCount has been moved into UnrollingPreferences in order to set this to a lower value for SystemZ (4).

Reviewers: Evgeny Stupachenko, Ulrich Weigand.
https://reviews.llvm.org/D24451
llvm-svn: 282570
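The mechanism described here is the standard TTI hook. The sketch below shows the general shape of such an override against the TargetTransformInfo interface of that era; the exact signature, the field names, and the generic default are assumptions, and this is not a copy of the SystemZ code.

```cpp
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/TargetTransformInfo.h"
using namespace llvm;

// Hypothetical target hook that limits how aggressively loops are unrolled.
void getUnrollingPreferencesSketch(Loop *,
                                   TargetTransformInfo::UnrollingPreferences &UP) {
  UP.Partial = true;
  UP.Runtime = true;
  // Unroll only moderately: a runtime/partial unroll factor of 4 instead of
  // the larger generic default.
  UP.DefaultUnrollRuntimeCount = 4;
}
```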
- Sep 07, 2016

Haicheng Wu authored
Differential Revision: https://reviews.llvm.org/D24299
llvm-svn: 280865
- Aug 26, 2016

Adam Nemet authored
We can't mark ORE (a function pass) preserved as required by the loop passes, because that is how we ensure that the required passes like LazyBFI are all available any time ORE is used. See the new comments in the patch. Instead we use it directly, just like the inliner does in D22694.

As expected there is some additional overhead after removing the caching provided by analysis passes. The worst case I measured was LNT/CINT2006_ref/401.bzip2, which regresses by 12%. As before, this only affects -Rpass-with-hotness and not default compilation.

llvm-svn: 279829
- Aug 24, 2016

Michael Zolotukhin authored
Summary: In clang commit r268509 we started to invoke the loop-unroll pass from the driver even under -Os. However, we happened to not initialize the optsize thresholds properly, which is fixed with this change.

r268509 led to some big compile-time regressions, because we started to unroll some loops that we didn't unroll before. With this change I hope to recover most of the regressions. We are still slightly slower than before, because we do some checks here and there in loop unrolling before we bail out, but at least the slowdown is not that huge now.

Reviewers: hfinkel, chandlerc
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D23388
llvm-svn: 279585
- Aug 18, 2016

Haicheng Wu authored
Move the check of CallInst earlier to skip expensive recursive operations.
Differential Revision: https://reviews.llvm.org/D23611
llvm-svn: 278998
- Aug 09, 2016

Sean Silva authored
One exception here is LoopInfo, which must forward-declare it (because the typedef is in LoopPassManager.h, which depends on LoopInfo). Also, some includes for LoopPassManager.h were needed since that file provides the typedef.

Besides a general consistency benefit, the extra layer of indirection allows the mechanical part of https://reviews.llvm.org/D23256 that requires touching every transformation and analysis to be factored out cleanly.

Thanks to David for the suggestion.
llvm-svn: 278079
- Jul 29, 2016

Adam Nemet authored
LoopUnroll is a loop pass, so the analysis of OptimizationRemarkEmitter is added to the common function analysis passes that loop passes depend on. The BFI and indirectly BPI used in this pass are computed lazily, so no overhead should be observed unless -pass-remarks-with-hotness is used.

This is how the patch affects the O3 pipeline:

  Dominator Tree Construction
  Natural Loop Information
  Canonicalize natural loops
  Loop-Closed SSA Form Pass
  Basic Alias Analysis (stateless AA impl)
  Function Alias Analysis Results
  Scalar Evolution Analysis
+ Lazy Branch Probability Analysis
+ Lazy Block Frequency Analysis
+ Optimization Remark Emitter
  Loop Pass Manager
  Rotate Loops
  Loop Invariant Code Motion
  Unswitch loops
  Simplify the CFG
  Dominator Tree Construction
  Basic Alias Analysis (stateless AA impl)
  Function Alias Analysis Results
  Combine redundant instructions
  Natural Loop Information
  Canonicalize natural loops
  Loop-Closed SSA Form Pass
  Scalar Evolution Analysis
+ Lazy Branch Probability Analysis
+ Lazy Block Frequency Analysis
+ Optimization Remark Emitter
  Loop Pass Manager
  Induction Variable Simplification
  Recognize loop idioms
  Delete dead loops
  Unroll loops
  ...

llvm-svn: 277203
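In legacy-pass-manager terms, the dependency wiring described above looks roughly like the sketch below: the remark emitter (which lazily pulls in BPI/BFI) is declared alongside the shared loop-pass analyses. The pass name is hypothetical, and this illustrates the wiring rather than reproducing the patch.

```cpp
#include "llvm/Analysis/LoopPass.h"
#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/Transforms/Utils/LoopUtils.h"
using namespace llvm;

namespace {
// Hypothetical loop pass that can emit hotness-annotated remarks.
struct RemarkAwareLoopPassSketch : public LoopPass {
  static char ID;
  RemarkAwareLoopPassSketch() : LoopPass(ID) {}

  void getAnalysisUsage(AnalysisUsage &AU) const override {
    // Remark emitter; BPI/BFI are computed lazily when hotness is requested.
    AU.addRequired<OptimizationRemarkEmitterWrapperPass>();
    // The common analysis set shared by loop passes.
    getLoopAnalysisUsage(AU);
  }

  bool runOnLoop(Loop *, LPPassManager &) override { return false; }
};
char RemarkAwareLoopPassSketch::ID = 0;
} // namespace
```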
- Jul 20, 2016

Sean Silva authored
We just set PreserveLCSSA to always true, since we don't have an analogous method `mustPreserveAnalysisID(LCSSA)`.

Also port the LoopInfo verifier pass to test LoopUnrollPass.
llvm-svn: 276063
- Jun 15, 2016

David Majnemer authored
We do not support splitting cleanuppad or catchswitches. This is problematic for passes which assume that a loop is in loop-simplify form (the loop would have a dedicated exit block instead of sharing it).

While it isn't great that we don't support this for cleanups, we still cannot make loop-simplify form an assertable precondition, because indirectbr will also disable these sorts of CFG cleanups.

This fixes PR28132.
llvm-svn: 272739
- Jun 08, 2016

Evgeny Stupachenko authored
with user specified count has been applied.
Summary: Previously SetLoopAlreadyUnrolled() set the disable pragma only if there was some loop metadata. Now it sets the pragma in all cases. This helps to prevent multiple unrolling when -unroll-count=N is given.
Reviewers: mzolotukhin
Differential Revision: http://reviews.llvm.org/D20765
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 272195
- Jun 03, 2016

Michael Zolotukhin authored
In r270478, where I enabled the new heuristic, I posted testing results which I got when explicitly passing the threshold values via command-line options. However, setting the init values of the command-line options is not enough to change the default values of the thresholds, so I'm changing them in another place now.
llvm-svn: 271615
- May 28, 2016

Evgeny Stupachenko authored
Summary: Unused variables in Release mode (BasicBlock *Header, unsigned OrigCount) were put under DEBUG.
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 271076
Evgeny Stupachenko authored
Summary: The unroll factor (Count) calculations were moved to a new function. Early exits were added for factors defined by a pragma or by "-unroll-count". A new type of unrolling, "Force", is introduced (previously used implicitly). A new unroll preference, "AllowRemainder", is introduced and set to "true" by default (it should be set to false for architectures that suffer from it).
Reviewers: hfinkel, mzolotukhin, zzheng
Differential Revision: http://reviews.llvm.org/D19553
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 271071
- May 27, 2016

Benjamin Kramer authored
No functionality change intended, maybe a tiny performance improvement.
llvm-svn: 270997
- May 26, 2016

Michael Zolotukhin authored
The condition might be simplified to a Constant, but it doesn't have to be a ConstantInt, so we should use dyn_cast instead of cast.
This fixes PR27886.
llvm-svn: 270924
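A small illustration of the dyn_cast-versus-cast point, using a hypothetical helper; the real fix lives in the unroll analyzer, not here.

```cpp
#include "llvm/IR/Constants.h"
using namespace llvm;

// Returns true and sets Taken if the simplified condition is a ConstantInt.
// A simplified condition is only known to be a Constant; it could be, e.g.,
// undef, so cast<ConstantInt> would assert where dyn_cast safely fails.
static bool getBranchDirection(Constant *SimplifiedCond, bool &Taken) {
  if (auto *CI = dyn_cast<ConstantInt>(SimplifiedCond)) {
    Taken = !CI->isZero();
    return true;
  }
  return false; // not a ConstantInt: leave the branch alone
}
```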
- May 25, 2016

Michael Zolotukhin authored
This reverts commit r270577.
llvm-svn: 270630
- May 24, 2016

Hans Wennborg authored
Chromium builds are still hitting the assert in PR27874.
llvm-svn: 270577
Michael Zolotukhin authored
This reverts commit r270512 and reapplies r270478. Originally it caused PR27847, but it was fixed in r270517.
llvm-svn: 270518
Hans Wennborg authored
This caused PR27847.
llvm-svn: 270512
- May 23, 2016

Michael Zolotukhin authored
Summary: This patch turns on LoopUnrollAnalyzer by default. To mitigate compile-time regressions, I chose very conservative thresholds for now. Later we can make them more aggressive, but it might require being smarter about which loops we're optimizing. E.g. currently the biggest issue is that with more aggressive thresholds we unroll many cold loops, which increases compile time for no performance benefit (performance of those loops is improved, but it doesn't matter since they are cold).

Test results for compile time (using 4 samples to reduce noise):
```
MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes                  5.19%
SingleSource/Benchmarks/Polybench/medley/reg_detect/reg_detect   4.19%
MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow           3.39%
MultiSource/Applications/JM/lencod/lencod                        1.47%
MultiSource/Benchmarks/Fhourstones-3_1/fhourstones3_1           -6.06%
```

I didn't see any performance changes in the testsuite, but it improves some internal tests.

Reviewers: hfinkel, chandlerc
Subscribers: llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D20482
llvm-svn: 270478
- May 18, 2016

Michael Zolotukhin authored
[LoopUnrollAnalyzer] Take into account the cost of instructions controlling branches, along with their operands.

Previously, we didn't add their cost and their operands' cost, which could have resulted in unrolling loops for no actual benefit.
llvm-svn: 269985
- May 13, 2016

Michael Zolotukhin authored
Revert "Revert "[Unroll] Implement a conservative and monotonically increasing cost tracking system during the full unroll heuristic analysis that avoids counting any instruction cost until that instruction becomes "live" through a side-effect or use outside the..."" This reverts commit r269395. Try to reapply with a fix from chapuni. llvm-svn: 269486
Michael Zolotukhin authored
Revert "[Unroll] Implement a conservative and monotonically increasing cost tracking system during the full unroll heuristic analysis that avoids counting any instruction cost until that instruction becomes "live" through a side-effect or use outside the..." This reverts commit r269388. It caused some bots to fail, I'm reverting it until I investigate the issue. llvm-svn: 269395
Michael Zolotukhin authored
[Unroll] Implement a conservative and monotonically increasing cost tracking system during the full unroll heuristic analysis that avoids counting any instruction cost until that instruction becomes "live" through a side-effect or use outside the...

Summary: ...loop after the last iteration. This is really hard to do correctly. The core problem is that we need to model liveness through the induction PHIs from iteration to iteration in order to get the correct results, and we need to correctly de-duplicate the common subgraphs of instructions feeding some subset of the induction PHIs. All of this can be driven either from a side effect at some iteration or from the loop values used after the loop finishes.

This patch implements this by storing the forward-propagating analysis of each instruction in a cache to recall whether it was free and whether it has become live and thus counted toward the total unroll cost. Then, at each sink for a value in the loop, we recursively walk back through every value that feeds the sink, including looping back through the iterations as needed, until we have marked the entire input graph as live. Because we cache this, we never visit instructions more than twice -- once when we analyze them and put them into the cache, and once when we count their cost towards the unrolled loop. Also, because the cache is only two bits and because we are dealing with relatively small iteration counts, we can store all of this very densely in memory to keep this from becoming an excessively slow analysis.

The code here is still pretty gross. I would appreciate suggestions about better ways to factor or split this up; I've stared too long at the algorithmic side to really have a good sense of what the design should probably look like. Also, it might seem like we should do all of this bottom-up, but I think that is a red herring. Specifically, the simplification power is *much* greater working top-down. We can forward-propagate very effectively, even across strange and interesting recurrences around the backedge. Because we use data to propagate, this doesn't cause a state space explosion. Doing this level of constant folding, etc., would be very expensive to do bottom-up because it wouldn't be until the last moment that you could collapse everything. The current solution is essentially a top-down simplification with a bottom-up cost accounting, which seems to get the best of both worlds. It makes the simplification incremental and powerful while leaving everything dead until we *know* it is needed.

Finally, a core property of this approach is its *monotonicity*. At all times, the current UnrolledCost is a conservatively low estimate. This ensures that we will never early-exit from the analysis due to exceeding a threshold when, if we had continued, the cost would have gone back below the threshold. These kinds of bugs can cause incredibly hard-to-track-down random changes to behavior. We could use a similar (but much simpler) technique within the inliner as well to avoid considering speculated code in the inline cost.

Reviewers: chandlerc
Subscribers: sanjoy, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D11758
llvm-svn: 269388
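A highly simplified sketch of the bookkeeping this describes: two bits of per-instruction state ("free" and "live"), cost added only when an instruction first becomes live, and a total that only grows. The class and member names are invented for the illustration; the real analysis operates on LLVM instructions per unrolled iteration.

```cpp
#include <cstdint>
#include <unordered_map>

// Per-instruction state: bit 0 = "free", bit 1 = "live" (already counted).
class UnrollCostSketch {
  std::unordered_map<const void *, uint8_t> Cache;
  uint64_t UnrolledCost = 0; // conservatively low, monotonically increasing

public:
  // Forward pass: record whether simplification made the instruction free.
  void analyzed(const void *I, bool Free) { Cache[I] = Free ? 1 : 0; }

  // Backward pass from a sink (side effect or use after the loop): mark the
  // instruction live and charge its cost exactly once.
  void markLive(const void *I, unsigned Cost) {
    uint8_t &Bits = Cache[I];
    if (Bits & 2)
      return;                 // already counted
    Bits |= 2;
    if (!(Bits & 1))
      UnrolledCost += Cost;   // free instructions never add cost
  }

  uint64_t cost() const { return UnrolledCost; }
};
```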
- May 10, 2016

Hans Wennborg authored
Before r268509, Clang would disable the loop unroll pass when optimizing for size. That commit enabled it in order to support unroll pragmas in -Os builds. However, this regressed binary size in one of Chromium's DLLs by ~100 KB.

This restores the original behaviour of no unrolling at -Os, but doing it in LLVM instead of Clang makes more sense, and also allows the pragmas to keep working.

Differential revision: http://reviews.llvm.org/D20115
llvm-svn: 269124
- May 05, 2016

Dehao Chen authored
llvm-svn: 268583
- Apr 23, 2016

Andrew Kaylor authored
The original commit was reverted because of a buildbot problem with LazyCallGraph::SCC handling (not related to the OptBisect handling).
Differential Revision: http://reviews.llvm.org/D19172
llvm-svn: 267231
- Apr 22, 2016

Vedant Kumar authored
This reverts commit r267022, due to an ASan failure:
http://lab.llvm.org:8080/green/job/clang-stage2-cmake-RgSan_check/1549
llvm-svn: 267115
- Apr 21, 2016

Andrew Kaylor authored
This patch implements an optimization bisect feature, which will allow optimizations to be selectively disabled at compile time in order to track down test failures that are caused by incorrect optimizations.

The bisection is enabled using a new command-line option (-opt-bisect-limit). Individual passes that may be skipped call the OptBisect object (via an LLVMContext) to see if they should be skipped based on the bisect limit. A finer level of control (disabling individual transformations) can be managed through an additional OptBisect method, but this is not yet used.

The skip checking in this implementation is based on (and replaces) the skipOptnoneFunction check. Where that check was being called, a new call has been inserted in its place which checks the bisect limit and the optnone attribute. A new function call has been added for module and SCC passes that behaves in a similar way.

Differential Revision: http://reviews.llvm.org/D19172
llvm-svn: 267022
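From a pass author's point of view, participating in the bisect looks roughly like the sketch below, assuming the skip helpers introduced around this time (the pass name is hypothetical). On the command line the limit is then passed as, e.g., -opt-bisect-limit=<n>.

```cpp
#include "llvm/IR/Function.h"
#include "llvm/Pass.h"
using namespace llvm;

namespace {
// Hypothetical function pass honoring -opt-bisect-limit and optnone.
struct BisectableSketchPass : public FunctionPass {
  static char ID;
  BisectableSketchPass() : FunctionPass(ID) {}

  bool runOnFunction(Function &F) override {
    // Consults the LLVMContext's OptBisect state and the optnone attribute.
    if (skipFunction(F))
      return false;
    // ... perform the transformation ...
    return true;
  }
};
char BisectableSketchPass::ID = 0;
} // namespace
```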