  1. Nov 23, 2016
  2. Nov 17, 2016
    • Use profile info to adjust loop unroll threshold. · 41d72a86
      Dehao Chen authored
      Summary:
      For a flat loop, even if it is hot, runtime unrolling is not a good idea, so we set a lower partial unroll threshold.
      For a hot loop, we set a higher unroll threshold and allow expensive trip-count computation to enable more aggressive unrolling.
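      The flat/hot heuristic above can be sketched as follows. This is an illustrative standalone mock, not LLVM's actual code: the type, function names, and threshold values are assumptions for the sake of the example.

```cpp
#include <cassert>

// Hypothetical sketch of the profile-guided threshold selection described
// above; the real names and values in LLVM's LoopUnrollPass differ.
struct UnrollThresholds {
  unsigned PartialThreshold;    // cap for runtime/partial unrolling
  unsigned FullThreshold;       // cap for full unrolling
  bool AllowExpensiveTripCount; // permit costly trip-count computation
};

// isHot:  the loop executes frequently according to profile data.
// isFlat: all iterations behave uniformly, so unrolling exposes little.
UnrollThresholds pickThresholds(bool isHot, bool isFlat) {
  UnrollThresholds T{150, 300, false}; // illustrative defaults
  if (isFlat) {
    T.PartialThreshold = 75; // flat loops: runtime unrolling rarely pays off
  } else if (isHot) {
    T.FullThreshold = 600;   // hot loops: unroll more aggressively
    T.AllowExpensiveTripCount = true;
  }
  return T;
}
```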
      
      Reviewers: davidxl, mzolotukhin
      
      Subscribers: sanjoy, mehdi_amini, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D26527
      
      llvm-svn: 287186
  3. Nov 09, 2016
    • Minor unroll pass refactoring. · c2698cd9
      Evgeny Stupachenko authored
      Summary:
      Unrolled loop size calculations moved to a function.
      The constant representing the number of instructions optimized away
       when the "back edge" becomes a "fall through" is replaced with
       a variable.
      Some comments added.
      
      Reviewers: mzolotukhin
      
      Differential Revision: http://reviews.llvm.org/D21719
      
      From: Evgeny Stupachenko <evstupac@gmail.com>
      llvm-svn: 286389
  4. Oct 27, 2016
  5. Oct 25, 2016
  6. Oct 21, 2016
    • [LoopUnroll] Keep the loop test only on the first iteration of max-or-zero loops · 84b21835
      John Brawn authored
      When we have a loop with a known upper bound on the number of iterations, and
      furthermore know that the number of iterations will be either exactly that
      upper bound or zero, then we can fully unroll up to that upper bound, keeping
      only the first loop test to check for the zero-iteration case.
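      A minimal sketch of the transformation this describes, as plain C++ rather than IR (the function names and the trip count of 4 are invented for illustration):

```cpp
#include <cassert>

// Illustrative only: a loop whose trip count is either 0 or exactly 4.
int sumOriginal(const int *a, bool run) {
  int s = 0;
  for (int i = 0; run && i < 4; ++i) // trip count is 0 (run==false) or 4
    s += a[i];
  return s;
}

// What the unroller can produce: the body is copied 4 times, and only the
// *first* iteration's test survives, guarding the zero-trip-count case.
int sumUnrolled(const int *a, bool run) {
  int s = 0;
  if (run) { // the one remaining loop test
    s += a[0];
    s += a[1];
    s += a[2];
    s += a[3];
  }
  return s;
}
```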
      
      Most of the work here is in plumbing this 'max-or-zero' information from the
      part of scalar evolution where it's detected through to loop unrolling. I've
      also gone for the safe default of 'false' everywhere but howManyLessThans,
      which could probably be improved.
      
      Differential Revision: https://reviews.llvm.org/D25682
      
      llvm-svn: 284818
  7. Oct 12, 2016
    • Reapply "[LoopUnroll] Use the upper bound of the loop trip count to fully unroll a loop" · 1ef17e90
      Haicheng Wu authored
      Reapply r284044 after the revert in r284051. Krzysztof fixed the error in r284049.
      
      The original summary:
      
      This patch tries to fully unroll loops having break statement like this
      
      for (int i = 0; i < 8; i++) {
          if (a[i] == value) {
              found = true;
              break;
          }
      }
      
      GCC can fully unroll such loops, but currently LLVM cannot because LLVM only
      supports loops having exact constant trip counts.
      
      The upper bound of the trip count can be obtained from calling
      ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is the
      refactoring work in SCEV to prevent duplicating code.
      
      The feature of using the upper bound is enabled under the same circumstance
      when runtime unrolling is enabled since both are used to unroll loops without
      knowing the exact constant trip count.
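      To make the transformation concrete, here is the search loop from the summary next to the shape a full unroll by upper bound can produce. The function names are invented for illustration; the real work happens on LLVM IR, not C++ source:

```cpp
#include <cassert>

// The loop from the commit message: the break may fire early, so the
// exact trip count is unknown, but it is at most 8.
bool findOriginal(const int *a, int value) {
  bool found = false;
  for (int i = 0; i < 8; i++) {
    if (a[i] == value) {
      found = true;
      break;
    }
  }
  return found;
}

// Fully unrolled to the upper bound of 8, keeping every early exit.
bool findUnrolled(const int *a, int value) {
  if (a[0] == value) return true;
  if (a[1] == value) return true;
  if (a[2] == value) return true;
  if (a[3] == value) return true;
  if (a[4] == value) return true;
  if (a[5] == value) return true;
  if (a[6] == value) return true;
  if (a[7] == value) return true;
  return false;
}
```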
      
      llvm-svn: 284053
    • Revert "[LoopUnroll] Use the upper bound of the loop trip count to fully unroll a loop" · 45e4ef73
      Haicheng Wu authored
      This reverts commit r284044.
      
      llvm-svn: 284051
    • [LoopUnroll] Use the upper bound of the loop trip count to fully unroll a loop · 6cac34fd
      Haicheng Wu authored
      This patch tries to fully unroll loops having break statement like this
      
      for (int i = 0; i < 8; i++) {
          if (a[i] == value) {
              found = true;
              break;
          }
      }
      
      GCC can fully unroll such loops, but currently LLVM cannot because LLVM only
      supports loops having exact constant trip counts.
      
      The upper bound of the trip count can be obtained from calling
      ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is the
      refactoring work in SCEV to prevent duplicating code.
      
      The feature of using the upper bound is enabled under the same circumstance
      when runtime unrolling is enabled since both are used to unroll loops without
      knowing the exact constant trip count.
      
      Differential Revision: https://reviews.llvm.org/D24790
      
      llvm-svn: 284044
  8. Sep 30, 2016
  9. Sep 28, 2016
    • [SystemZ] Implementation of getUnrollingPreferences(). · 58c5a7f5
      Jonas Paulsson authored
      This commit enables more unrolling for SystemZ by implementing the
      SystemZTargetTransformInfo::getUnrollingPreferences() method.
      
      It has been found that it is better to only unroll moderately, so
      DefaultUnrollRuntimeCount has been moved into UnrollingPreferences so that
      it can be set to a lower value (4) for SystemZ.
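      A hedged sketch of the idea: the actual hook lives in SystemZTargetTransformInfo and has a different signature; this standalone mock only shows a target lowering the default runtime unroll count moved into the preferences struct.

```cpp
#include <cassert>

// Mock of the preferences struct described above (illustrative only).
struct UnrollingPreferences {
  // Default count used for runtime unrolling when no exact trip count
  // is known; assumed default shown here.
  unsigned DefaultUnrollRuntimeCount = 8;
};

// SystemZ was found to do better with only moderate unrolling, so the
// target hook overrides the default with a lower value.
void getUnrollingPreferencesSystemZ(UnrollingPreferences &UP) {
  UP.DefaultUnrollRuntimeCount = 4;
}
```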
      
      Reviewers: Evgeny Stupachenko, Ulrich Weigand.
      https://reviews.llvm.org/D24451
      
      llvm-svn: 282570
  10. Sep 07, 2016
  11. Aug 26, 2016
    • [LoopUnroll] Use OptimizationRemarkEmitter directly not via the analysis pass · 4f155b6e
      Adam Nemet authored
      We can't mark ORE (a function pass) preserved as required by the loop
      passes because that is how we ensure that the required passes like
      LazyBFI are all available any time ORE is used.  See the new comments in
      the patch.
      
      Instead we use it directly just like the inliner does in D22694.
      
      As expected there is some additional overhead after removing the caching
      provided by analysis passes.  The worst case I measured was
      LNT/CINT2006_ref/401.bzip2, which regresses by 12%.  As before, this only
      affects -Rpass-with-hotness and not default compilation.
      
      llvm-svn: 279829
  12. Aug 24, 2016
    • [LoopUnroll] By default disable unrolling when optimizing for size. · bd63d436
      Michael Zolotukhin authored
      Summary:
      In clang commit r268509 we started to invoke the loop-unroll pass from the
      driver even under -Os. However, we happened to not initialize the optsize
      thresholds properly, which is fixed with this change.
      
      r268509 led to some big compile time regressions, because we started to
      unroll some loops that we didn't unroll before. With this change I hope
      to recover most of the regressions. We still are slightly slower than
      before, because we do some checks here and there in loop-unrolling
      before we bail out, but at least the slowdown is not that huge now.
      
      Reviewers: hfinkel, chandlerc
      
      Subscribers: mzolotukhin, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D23388
      
      llvm-svn: 279585
  13. Aug 18, 2016
  14. Aug 09, 2016
    • Consistently use LoopAnalysisManager · 0746f3bf
      Sean Silva authored
      One exception here is LoopInfo which must forward-declare it (because
      the typedef is in LoopPassManager.h which depends on LoopInfo).
      
      Also, some includes for LoopPassManager.h were needed since that file
      provides the typedef.
      
      Besides a general consistency benefit, the extra layer of indirection
      allows the mechanical part of https://reviews.llvm.org/D23256 that
      requires touching every transformation and analysis to be factored out
      cleanly.
      
      Thanks to David for the suggestion.
      
      llvm-svn: 278079
  15. Jul 29, 2016
    • [LoopUnroll] Include hotness of region in opt remark · 12937c36
      Adam Nemet authored
      LoopUnroll is a loop pass, so the analysis of OptimizationRemarkEmitter
      is added to the common function analysis passes that loop passes
      depend on.
      
      The BFI and indirectly BPI used in this pass is computed lazily so no
      overhead should be observed unless -pass-remarks-with-hotness is used.
      
      This is how the patch affects the O3 pipeline:
      
               Dominator Tree Construction
               Natural Loop Information
               Canonicalize natural loops
               Loop-Closed SSA Form Pass
               Basic Alias Analysis (stateless AA impl)
               Function Alias Analysis Results
               Scalar Evolution Analysis
      +        Lazy Branch Probability Analysis
      +        Lazy Block Frequency Analysis
      +        Optimization Remark Emitter
               Loop Pass Manager
                 Rotate Loops
                 Loop Invariant Code Motion
                 Unswitch loops
               Simplify the CFG
               Dominator Tree Construction
               Basic Alias Analysis (stateless AA impl)
               Function Alias Analysis Results
               Combine redundant instructions
               Natural Loop Information
               Canonicalize natural loops
               Loop-Closed SSA Form Pass
               Scalar Evolution Analysis
      +        Lazy Branch Probability Analysis
      +        Lazy Block Frequency Analysis
      +        Optimization Remark Emitter
               Loop Pass Manager
                 Induction Variable Simplification
                 Recognize loop idioms
                 Delete dead loops
                 Unroll loops
      ...
      
      llvm-svn: 277203
  16. Jul 20, 2016
    • [PM] Port LoopUnroll. · e3c18a5a
      Sean Silva authored
      We just set PreserveLCSSA to always true since we don't have an
      analogous method `mustPreserveAnalysisID(LCSSA)`.
      
      Also port LoopInfo verifier pass to test LoopUnrollPass.
      
      llvm-svn: 276063
  17. Jun 15, 2016
    • [LoopUnroll] Don't crash trying to unroll loop with EH pad exit · 4a697c31
      David Majnemer authored
      We do not support splitting cleanuppad or catchswitches.  This is
      problematic for passes which assume that a loop is in loop simplify
      form (the loop would have a dedicated exit block instead of sharing it).
      
      While it isn't great that we don't support this for cleanups, we still
      cannot make loop-simplify form an assertable precondition because
      indirectbr will also disable these sorts of CFG cleanups.
      
      This fixes PR28132.
      
      llvm-svn: 272739
  18. Jun 08, 2016
    • The patch sets the unroll disable pragma when an unroll with a user-specified count has been applied. · 3e2f389a
      Evgeny Stupachenko authored
      
      Summary:
      Previously SetLoopAlreadyUnrolled() set the disable pragma only if
      there was some loop metadata.
      Now it sets the pragma in all cases. This helps to prevent multiple
      rounds of unrolling when -unroll-count=N is given.
      
      Reviewers: mzolotukhin
      
      Differential Revision: http://reviews.llvm.org/D20765
      
      From: Evgeny Stupachenko <evstupac@gmail.com>
      llvm-svn: 272195
  19. Jun 03, 2016
  20. May 28, 2016
    • The patch fixes r271071 · b787522d
      Evgeny Stupachenko authored
      Summary:
      Unused variables in Release mode:
        BasicBlock *Header
        unsigned OrigCount
      are now put under DEBUG.
      
      From: Evgeny Stupachenko <evstupac@gmail.com>
      llvm-svn: 271076
    • The patch refactors unroll pass. · ea2aef4a
      Evgeny Stupachenko authored
      Summary:
      Unroll factor (Count) calculations moved to a new function.
      Added early exits when the factor is defined by a pragma or by "-unroll-count".
      New type of unrolling, "Force", introduced (previously used implicitly).
      New unroll preference "AllowRemainder" introduced and set to "true" by default
      (it should be set to false for architectures that suffer from it).
      
      Reviewers: hfinkel, mzolotukhin, zzheng
      
      Differential Revision: http://reviews.llvm.org/D19553
      
      From: Evgeny Stupachenko <evstupac@gmail.com>
      llvm-svn: 271071
  21. May 27, 2016
  22. May 26, 2016
  23. May 25, 2016
  24. May 24, 2016
  25. May 23, 2016
    • [LoopUnroll] Enable advanced unrolling analysis by default. · be080fc5
      Michael Zolotukhin authored
      Summary:
      This patch turns on LoopUnrollAnalyzer by default. To mitigate compile
      time regressions, I chose very conservative thresholds for now. Later we
      can make them more aggressive, but it might require being smarter in
      which loops we're optimizing. E.g. currently the biggest issue is that
      with more aggressive thresholds we unroll many cold loops, which
      increases compile time for no performance benefit (performance of those
      loops is improved, but it doesn't matter since they are cold).
      
      Test results for compile time (using 4 samples to reduce noise):
      ```
      MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes 5.19%
      SingleSource/Benchmarks/Polybench/medley/reg_detect/reg_detect  4.19%
      MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow  3.39%
      MultiSource/Applications/JM/lencod/lencod 1.47%
      MultiSource/Benchmarks/Fhourstones-3_1/fhourstones3_1 -6.06%
      ```
      
      I didn't see any performance changes in the testsuite, but it improves
      some internal tests.
      
      Reviewers: hfinkel, chandlerc
      
      Subscribers: llvm-commits, mzolotukhin
      
      Differential Revision: http://reviews.llvm.org/D20482
      
      llvm-svn: 270478
  26. May 18, 2016
  27. May 13, 2016
    • Revert "Revert "[Unroll] Implement a conservative and monotonically increasing... · 963a6d9c
      Michael Zolotukhin authored
      Revert "Revert "[Unroll] Implement a conservative and monotonically increasing cost tracking system during the full unroll heuristic analysis that avoids counting any instruction cost until that instruction becomes "live" through a side-effect or use outside the...""
      
      This reverts commit r269395.
      
      Try to reapply with a fix from chapuni.
      
      llvm-svn: 269486
    • Revert "[Unroll] Implement a conservative and monotonically increasing cost... · 9be3b8b9
      Michael Zolotukhin authored
      Revert "[Unroll] Implement a conservative and monotonically increasing cost tracking system during the full unroll heuristic analysis that avoids counting any instruction cost until that instruction becomes "live" through a side-effect or use outside the..."
      
      This reverts commit r269388.
      
      It caused some bots to fail, I'm reverting it until I investigate the
      issue.
      
      llvm-svn: 269395
    • [Unroll] Implement a conservative and monotonically increasing cost tracking... · b7b80529
      Michael Zolotukhin authored
      [Unroll] Implement a conservative and monotonically increasing cost tracking system during the full unroll heuristic analysis that avoids counting any instruction cost until that instruction becomes "live" through a side-effect or use outside the...
      
      Summary:
      ...loop after the last iteration.
      
      This is really hard to do correctly. The core problem is that we need to
      model liveness through the induction PHIs from iteration to iteration in
      order to get the correct results, and we need to correctly de-duplicate
      the common subgraphs of instructions feeding some subset of the
      induction PHIs. All of this can be driven either from a side effect at
      some iteration or from the loop values used after the loop finishes.
      
      This patch implements this by storing the forward-propagating analysis
      of each instruction in a cache to recall whether it was free and whether
      it has become live and thus counted toward the total unroll cost. Then,
      at each sink for a value in the loop, we recursively walk back through
      every value that feeds the sink, including looping back through the
      iterations as needed, until we have marked the entire input graph as
      live. Because we cache this, we never visit instructions more than twice
      -- once when we analyze them and put them into the cache, and once when
      we count their cost towards the unrolled loop. Also, because the cache
      is only two bits per entry and because we are dealing with relatively
      small iteration counts, we can store all of this very densely in memory
      to keep this from becoming an excessively slow analysis.
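      The dense two-bit cache can be sketched like this. This is a standalone illustration of the packing idea only, not LLVM's actual data structure; the class and state names are invented.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative two-bit-per-entry cache: four instruction states pack
// into each byte, so even thousands of (instruction, iteration) pairs
// stay compact in memory.
enum class InstState : uint8_t { Unvisited = 0, Free = 1, Counted = 2 };

class TwoBitCache {
  std::vector<uint8_t> Bytes;

public:
  explicit TwoBitCache(size_t N) : Bytes((N + 3) / 4, 0) {}

  InstState get(size_t I) const {
    return InstState((Bytes[I / 4] >> (2 * (I % 4))) & 3);
  }

  void set(size_t I, InstState S) {
    uint8_t &B = Bytes[I / 4];
    unsigned Shift = 2 * (I % 4);
    B = uint8_t((B & ~(3u << Shift)) | (unsigned(S) << Shift));
  }
};
```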
      
      The code here is still pretty gross. I would appreciate suggestions
      about better ways to factor or split this up; I've stared too long at
      the algorithmic side to really have a good sense of what the design
      should probably look like.
      
      Also, it might seem like we should do all of this bottom-up, but I think
      that is a red herring. Specifically, the simplification power is *much*
      greater working top-down. We can forward propagate very effectively,
      even across strange and interesting recurrences around the backedge.
      Because we use data to propagate, this doesn't cause a state space
      explosion. Doing this level of constant folding, etc, would be very
      expensive to do bottom-up because it wouldn't be until the last moment
      that you could collapse everything. The current solution is essentially
      a top-down simplification with a bottom-up cost accounting which seems
      to get the best of both worlds. It makes the simplification incremental
      and powerful while leaving everything dead until we *know* it is needed.
      
      Finally, a core property of this approach is its *monotonicity*. At all
      times, the current UnrolledCost is a conservatively low estimate. This
      ensures that we will never early-exit from the analysis due to exceeding
      a threshold when, had we continued, the cost would have gone back
      below the threshold. These kinds of bugs can cause incredibly
      hard-to-track-down random changes to behavior.
      
      We could use a similar (but much simpler) technique within the inliner
      as well to avoid considering speculated code in the inline cost.
      
      Reviewers: chandlerc
      
      Subscribers: sanjoy, mzolotukhin, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D11758
      
      llvm-svn: 269388
  28. May 10, 2016
    • Loop unroller: set thresholds for optsize and minsize functions to zero · 719b26ba
      Hans Wennborg authored
      Before r268509, Clang would disable the loop unroll pass when optimizing
      for size. That commit enabled it in order to support unroll pragmas
      in -Os builds. However, this regressed the binary size of one of Chromium's
      DLLs by ~100 KB.
      
      This restores the original behaviour of no unrolling at -Os, but doing it
      in LLVM instead of Clang makes more sense, and also allows the pragmas to
      keep working.
      
      Differential revision: http://reviews.llvm.org/D20115
      
      llvm-svn: 269124
  29. May 05, 2016
  30. Apr 23, 2016
  31. Apr 22, 2016
  32. Apr 21, 2016
    • Initial implementation of optimization bisect support. · f0f27929
      Andrew Kaylor authored
      This patch implements an optimization bisect feature, which will allow optimizations to be selectively disabled at compile time in order to track down test failures that are caused by incorrect optimizations.
      
      The bisection is enabled using a new command line option (-opt-bisect-limit).  Individual passes that may be skipped call the OptBisect object (via an LLVMContext) to see if they should be skipped based on the bisect limit.  A finer level of control (disabling individual transformations) can be managed through an additional OptBisect method, but this is not yet used.
      
      The skip checking in this implementation is based on (and replaces) the skipOptnoneFunction check.  Where that check was being called, a new call has been inserted in its place which checks the bisect limit and the optnone attribute.  A new function call has been added for module and SCC passes that behaves in a similar way.
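      The core bisect-limit mechanism can be sketched as a simple counter. This standalone mock only illustrates the idea; the real OptBisect class hangs off the LLVMContext and its names and interface differ.

```cpp
#include <cassert>

// Illustrative sketch of a bisect limit: each pass that may be skipped
// asks whether it should run. Passes numbered beyond the limit are
// skipped, so binary-searching over the limit pinpoints the first pass
// whose transformation breaks the test case.
class BisectCounter {
  int Limit;     // -1 means "no limit": run every pass
  int Count = 0; // running pass number

public:
  explicit BisectCounter(int L) : Limit(L) {}

  bool shouldRunPass() {
    ++Count;
    return Limit < 0 || Count <= Limit;
  }
};
```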
      
      Differential Revision: http://reviews.llvm.org/D19172
      
      llvm-svn: 267022