Commits · e41f37d99db1e341b3cf24ed49f21b58d63c15c4 · Roger Ferrer / llvm-epi-0.8

Sep 14, 2013

Remove the long, long defunct IR block placement pass. · ebeac5cb

Chandler Carruth authored Sep 14, 2013

This pass was based on the previous (essentially unused) profiling
infrastructure and the assumption that by ordering the basic blocks at
the IR level in a particular way, the correct layout would happen in the
end. This sometimes worked, and mostly didn't. It also was a really
naive implementation of the classical paper that dates from when branch
predictors were primarily directional and when loop structure wasn't
commonly available. It also didn't factor into the equation
non-fallthrough branches and other machine level details.

Anyways, for all of these reasons and more, I wrote
MachineBlockPlacement, which completely supersedes this pass. It both
uses modern profile information infrastructure, and actually works. =]

llvm-svn: 190748

ebeac5cb

Sep 11, 2013
- Add getUnrollingPreferences to TTI · 8f2e7005
  Hal Finkel authored Sep 11, 2013
```
Allow targets to customize the default behavior of the generic loop unrolling
transformation. This will be used by the PowerPC backend when targeting the A2
core (which is in-order with a deep pipeline), and using more aggressive
defaults is important.

llvm-svn: 190542
```
  8f2e7005
- Teach loop-idiom about address space pointer sizes · 009faed1
  Matt Arsenault authored Sep 11, 2013
```
llvm-svn: 190491
```
  009faed1
- Add braces · 5df49bd7
  Matt Arsenault authored Sep 11, 2013
```
llvm-svn: 190490
```
  5df49bd7
- Get rid of unused isPodLike definitions. · 77d7fbb9
  Eli Friedman authored Sep 11, 2013
```
llvm-svn: 190461
```
  77d7fbb9
- Fix mistake in r190442. · c1f1f852
  Eli Friedman authored Sep 10, 2013
```
llvm-svn: 190446
```
  c1f1f852
- Remove unused functions. · 1891f693
  Eli Friedman authored Sep 10, 2013
```
llvm-svn: 190442
```
  1891f693
Sep 10, 2013
- Teach ScalarEvolution about pointer address spaces · a90a18e0
  Matt Arsenault authored Sep 10, 2013
```
llvm-svn: 190425
```
  a90a18e0
Sep 06, 2013
- Use type helper functions. · 8227b9f6
  Matt Arsenault authored Sep 06, 2013
```
llvm-svn: 190113
```
  8227b9f6
- Teach CodeGenPrepare about address spaces · 37d42eca
  Matt Arsenault authored Sep 06, 2013
```
llvm-svn: 190112
```
  37d42eca
Aug 29, 2013

Revert: r189565 - Add getUnrollingPreferences to TTI · 8e83820a

Hal Finkel authored Aug 29, 2013

Revert unintentional commit (of an unreviewed change).

Original commit message:

Add getUnrollingPreferences to TTI

Allow targets to customize the default behavior of the generic loop unrolling
transformation. This will be used by the PowerPC backend when targeting the A2
core (which is in-order with a deep pipeline), and using more aggressive
defaults is important.

llvm-svn: 189566

8e83820a

Add getUnrollingPreferences to TTI · 63e6c0e9

Hal Finkel authored Aug 29, 2013

Allow targets to customize the default behavior of the generic loop unrolling
transformation. This will be used by the PowerPC backend when targeting the A2
core (which is in-order with a deep pipeline), and using more aggressive
defaults is important.

llvm-svn: 189565

63e6c0e9
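A minimal sketch of the hook pattern this message describes: a generic pass starts from its own defaults and lets the target adjust them. Every name below (`UnrollPrefs`, `TargetInfo`, `InOrderTarget`, `preferencesFor`) and both threshold values are hypothetical stand-ins chosen for illustration; this is not the TTI interface added by the commit.

```cpp
#include <cassert>

// Hypothetical stand-in for the knobs a generic unroller might expose.
struct UnrollPrefs {
  unsigned Threshold = 150; // illustrative size budget for the unrolled loop
  unsigned Count = 0;       // forced unroll factor; 0 lets the pass decide
  bool Partial = false;     // allow unrolling by less than the full trip count
};

// Hypothetical target interface: the default leaves the generic values alone.
struct TargetInfo {
  virtual ~TargetInfo() = default;
  virtual void getUnrollingPreferences(UnrollPrefs &) const {}
};

// An in-order, deep-pipeline core asks for more aggressive unrolling.
struct InOrderTarget : TargetInfo {
  void getUnrollingPreferences(UnrollPrefs &UP) const override {
    UP.Threshold = 400; // illustrative value, not taken from any backend
    UP.Partial = true;
  }
};

// The generic pass builds its defaults, then gives the target a chance to
// override them before deciding how to unroll.
UnrollPrefs preferencesFor(const TargetInfo &TI) {
  UnrollPrefs UP;
  TI.getUnrollingPreferences(UP);
  return UP;
}
```

Passing the preferences by reference keeps the generic defaults in one place, so a target that overrides nothing behaves exactly like the generic pass.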

Aug 23, 2013

Turn MipsOptimizeMathLibCalls into a target-independent scalar transform · 37cd6cfb

Richard Sandiford authored Aug 23, 2013

...so that it can be used for z too.  Most of the code is the same.
The only real change is to use TargetTransformInfo to test when a sqrt
instruction is available.

The pass is opt-in because at the moment it only handles sqrt.

llvm-svn: 189097

37cd6cfb

Aug 14, 2013

Revert r187191, which broke opt -mem2reg on the testcases included in PR16867. · c7776f73

Nick Lewycky authored Aug 13, 2013

However, opt -O2 doesn't run mem2reg directly so nobody noticed until r188146
when SROA started sending more things directly down the PromoteMemToReg path.

In order to revert r187191, I also revert dependent revisions r187296, r187322
and r188146. Fixes PR16867. Does not add the testcases from that PR, but both
of them should get added for both mem2reg and sroa when this revert gets
unreverted.

llvm-svn: 188327

c7776f73

Aug 13, 2013
- Reapply r188119 now that the bug it exposed is fixed. · 8d642de1
  Peter Collingbourne authored Aug 12, 2013
```
llvm-svn: 188217
```
  8d642de1
Aug 11, 2013

Re-instate r187323 which fast-tracks promotable allocas as soon as the · d7cd7e36

Chandler Carruth authored Aug 11, 2013

SROA-based analysis has enough information. This should work now that
both mem2reg *and* the SSAUpdater-based AllocaPromoter have been updated
to be able to promote the types of allocas that the SROA analysis
detects.

I've included tests for the AllocaPromoter that were only possible to
write once we fast-tracked promotable allocas without rewriting them.
This includes a test both for r187347 and r188145.

Original commit log for r187323:
"""
Now that mem2reg understands how to cope with a slightly wider set of uses of
an alloca, we can pre-compute promotability while analyzing an alloca for
splitting in SROA. That lets us short-circuit the common case of a bunch of
trivially promotable allocas. This cuts 20% to 30% off the run time of SROA for
typical frontend-generated IR sequences I'm seeing. It gets the new SROA to
within 20% of ScalarRepl for such code. My current benchmark for these numbers
is PR15412, but it fits the general pattern of IR emitted by Clang so it should
be widely applicable.
"""

llvm-svn: 188146

d7cd7e36

Finish fixing the SSAUpdater-based AllocaPromoter strategy in SROA to cope with · c17283b4

Chandler Carruth authored Aug 11, 2013

the more general set of patterns that are now handled by mem2reg and that we
can detect quickly while doing SROA's initial analysis. Notably, this allows it
to promote through no-op bitcast and GEP sequences. A core part of the
SSAUpdater approach is the ability to test whether a particular instruction is
part of the set being promoted. Testing this becomes significantly more complex
in the world where the operand to every load and store isn't the alloca itself.
I ended up using the approach of walking up the def-chain until we find the
alloca. I benchmarked this against keeping a set of pointer operands and
keeping a set of the loads and stores we care about, and this one seemed faster
although the difference was very small.

No test case yet because currently the rewriting always "fixes" the inputs to
not require this. The next patch which re-enables early promotion of easy cases
in SROA will include a test case that specifically exercises this aspect of the
alloca promoter.

llvm-svn: 188145

c17283b4
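The def-chain walk mentioned in r188145 can be sketched on a toy IR. The `Node` and `Kind` types and both function names are hypothetical, and the sketch looks through every bitcast and GEP, whereas the commit only talks about no-op sequences; it is an illustration of the idea, not the code in SROA.

```cpp
#include <cassert>

// Toy IR node; this is not LLVM's class hierarchy.
enum class Kind { Alloca, BitCast, GEP, Load, Store, Other };

struct Node {
  Kind K;
  const Node *PtrOperand; // pointer this node forwards or accesses, if any
};

// Walk up through pointer-forwarding nodes. Returns the alloca at the top of
// the chain, or nullptr if the chain ends at anything else.
const Node *findUnderlyingAlloca(const Node *Ptr) {
  while (Ptr && (Ptr->K == Kind::BitCast || Ptr->K == Kind::GEP))
    Ptr = Ptr->PtrOperand;
  return (Ptr && Ptr->K == Kind::Alloca) ? Ptr : nullptr;
}

// A load or store is part of the promoted set if its pointer chain ends at AI.
bool isInPromotedSet(const Node &Access, const Node &AI) {
  if (Access.K != Kind::Load && Access.K != Kind::Store)
    return false;
  return findUnderlyingAlloca(Access.PtrOperand) == &AI;
}
```

The walk needs no side table, which is the trade-off the message weighs against keeping a set of pointer operands or a set of loads and stores.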

Reformat some bits of AllocaPromoter and simplify the name and type of · 45b136f4

Chandler Carruth authored Aug 11, 2013

our visiting datastructures in the AllocaPromoter/SSAUpdater path of
SROA. Also shift the order of clears around to be more consistent.

No functionality changed here, this is just a cleanup.

llvm-svn: 188144

45b136f4

Aug 10, 2013

Revert r188119 "Kill some duplicated code for removing unreachable BBs." · 3dcdb89d

Arnold Schwaighofer authored Aug 10, 2013

It is breaking buildbots with libgmalloc enabled on Mac OS X.

$ cd llvm ; mkdir release ; cd release
$ ../configure --enable-optimized --prefix=$PWD/install
$ make
$ make check
$ Release+Asserts/bin/llvm-lit -v --param use_gmalloc=1 --param \
  gmalloc_path=/usr/lib/libgmalloc.dylib \
  ../test/Instrumentation/DataFlowSanitizer/args-unreachable-bb.ll

llvm-svn: 188142

3dcdb89d

Kill some duplicated code for removing unreachable BBs. · 32090aba

Peter Collingbourne authored Aug 09, 2013

This moves removeUnreachableBlocksFromFn from SimplifyCFGPass.cpp
to Utils/Local.cpp and uses it to replace the implementation of
llvm::removeUnreachableBlocks, which appears to do a strict subset
of what removeUnreachableBlocksFromFn does.

Differential Revision: http://llvm-reviews.chandlerc.com/D1334

llvm-svn: 188119

32090aba
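For readers unfamiliar with the underlying technique, unreachable-block removal is a reachability sweep from the entry block. The sketch below uses a toy adjacency-list CFG; `CFG`, `markReachable`, and `countUnreachable` are hypothetical names, and the real utilities also have to update instructions and PHI nodes, which this does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy CFG: block i lists the indices of its successors; block 0 is the entry.
using CFG = std::vector<std::vector<std::size_t>>;

// Mark every block reachable from the entry using an explicit worklist.
std::vector<bool> markReachable(const CFG &G) {
  std::vector<bool> Reachable(G.size(), false);
  if (G.empty())
    return Reachable;
  std::vector<std::size_t> Worklist{0};
  Reachable[0] = true;
  while (!Worklist.empty()) {
    std::size_t BB = Worklist.back();
    Worklist.pop_back();
    for (std::size_t Succ : G[BB]) {
      if (!Reachable[Succ]) {
        Reachable[Succ] = true;
        Worklist.push_back(Succ);
      }
    }
  }
  return Reachable;
}

// Count the blocks a removal pass would delete as unreachable.
std::size_t countUnreachable(const CFG &G) {
  std::size_t N = 0;
  for (bool R : markReachable(G))
    if (!R)
      ++N;
  return N;
}
```

Note that a block with predecessors can still be unreachable when all of those predecessors are themselves unreachable, which is why the sweep starts at the entry rather than checking predecessor counts.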

Aug 07, 2013

JumpThreading: Turn a select instruction into branching if it allows to thread... · 6a4976d3

Benjamin Kramer authored Aug 07, 2013

JumpThreading: Turn a select instruction into branching if it allows to thread one half of the select.

This is a common pattern coming out of simplifycfg generating gross code.

a:                                       ; preds = %entry
  %sel = select i1 %cmp1, double %add, double 0.000000e+00
  br label %b

b:
  %cond5 = phi double [ %sel, %a ], [ %sub, %entry ]
  %cmp6 = fcmp oeq double %cond5, 0.000000e+00
  br i1 %cmp6, label %if.then, label %if.end

becomes

a:
  br i1 %cmp1, label %b, label %if.then

b:
  %cond5 = phi double [ %sub, %entry ], [ %add, %a ]
  %cmp6 = fcmp oeq double %cond5, 0.000000e+00
  br i1 %cmp6, label %if.then, label %if.end

Skipping block b completely if possible.

llvm-svn: 187880

6a4976d3
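The IR rewrite in r187880 can be checked with a small model. The two functions below are a hypothetical C++ rendering of the "before" and "after" control flow, with `fromA` standing for "control came through block a rather than straight from entry"; each returns whether execution would reach `if.then`.

```cpp
#include <cassert>

// Before: block a computes a select, block b compares the merged value to 0.0.
bool reachesIfThenBefore(bool fromA, bool cmp1, double add, double sub) {
  double cond5;
  if (fromA)
    cond5 = cmp1 ? add : 0.0; // %sel
  else
    cond5 = sub;              // value arriving from %entry
  return cond5 == 0.0;        // %cmp6
}

// After: the select became a branch. When %cmp1 is false the select would
// have produced 0.0, so the comparison in b is known to succeed and control
// goes straight to if.then without visiting b.
bool reachesIfThenAfter(bool fromA, bool cmp1, double add, double sub) {
  if (fromA && !cmp1)
    return true;                    // a -> if.then directly
  double cond5 = fromA ? add : sub; // phi in b
  return cond5 == 0.0;
}
```

Both versions agree on every input; the second simply avoids evaluating the comparison on the path where its result is already known.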

Aug 06, 2013
- Adjust file to the coding standard. · 27da123d
  Jakub Staszak authored Aug 06, 2013
```
llvm-svn: 187808
```
  27da123d
- Factor FlattenCFG out from SimplifyCFG · aa664d9b
  Tom Stellard authored Aug 06, 2013
```
Patch by: Mei Ye

llvm-svn: 187764
```
  aa664d9b
Jul 29, 2013

Teach the AllocaPromoter which is wrapped around the SSAUpdater · cd7c8cdf

Chandler Carruth authored Jul 29, 2013

infrastructure to do promotion without a domtree the same smarts about
looking through GEPs, bitcasts, etc., that I just taught mem2reg about.
This way, if SROA chooses to promote an alloca which still has some
noisy instructions this code can cope with them.

I've not used as principled of an approach here for two reasons:
1) This code doesn't really need it as we were already set up to zip
   through the instructions used by the alloca.
2) I view the code here as more of a hack, and hopefully a temporary one.

The SSAUpdater path in SROA is a real sore point for me. It doesn't make
a lot of architectural sense for many reasons:
- We're likely to end up needing the domtree anyways in a subsequent
  pass, so why not compute it earlier and use it.
- In the future we'll likely end up needing the domtree for parts of the
  inliner itself.
- If we need to we could teach the inliner to preserve the domtree. Part
  of the re-work of the pass manager will allow this to be very powerful
  even in large SCCs with many functions.
- Ultimately, computing a domtree has gotten significantly faster since
  the original SSAUpdater-using code went into ScalarRepl. We no longer
  use domfrontiers, and much of domtree is lazily done based on queries
  rather than eagerly.
- At this point keeping the SSAUpdater-based promotion saves a total of
  0.7% on a build of the 'opt' tool for me. That's not a lot of
  performance given the complexity!

So I'm leaving this a bit ugly in the hope that eventually we just
remove all of this nonsense.

I can't even readily test this because this code isn't reachable except
through SROA. When I re-instate the patch that fast-tracks allocas
already suitable for promotion, I'll add a testcase there that failed
before this change. Before that, SROA will fix any test case I give it.

llvm-svn: 187347

cd7c8cdf

Jul 28, 2013

Temporarily revert r187323 until I update SSAUpdater to match mem2reg. · d31370e0
Chandler Carruth authored Jul 28, 2013
```
I forgot that we had two totally independent things here. :: sigh ::

llvm-svn: 187327
```
d31370e0

Now that mem2reg understands how to cope with a slightly wider set of · 9d96100f

Chandler Carruth authored Jul 28, 2013

uses of an alloca, we can pre-compute promotability while analyzing an
alloca for splitting in SROA. That lets us short-circuit the common case
of a bunch of trivially promotable allocas. This cuts 20% to 30% off the
run time of SROA for typical frontend-generated IR sequences I'm seeing.
It gets the new SROA to within 20% of ScalarRepl for such code. My
current benchmark for these numbers is PR15412, but it fits the general
pattern of IR emitted by Clang so it should be widely applicable.

llvm-svn: 187323

9d96100f

Thread DataLayout through the callers and into mem2reg. This will be · d5b806a2

Chandler Carruth authored Jul 28, 2013

useful in a subsequent patch, but causes an unfortunate amount of noise,
so I pulled it out into a separate patch.

llvm-svn: 187322

d5b806a2

Jul 27, 2013

Don't use all the #ifdefs to hide the stats counters and instead rely on · 8e3c4dc5

Chandler Carruth authored Jul 27, 2013

their being optimized out in debug mode. Realistically, this just isn't
going to be the slow part anyways. This also fixes unused variable
warnings that are breaking LLD build bots. =/ I didn't see these at
first, and kept losing track of the fact that they were broken.

llvm-svn: 187297

8e3c4dc5

Reimplement isPotentiallyReachable to make nocapture deduction much stronger. · 0b68245e

Nick Lewycky authored Jul 27, 2013

Adds unit tests for it too.

Split BasicBlockUtils into an analysis-half and a transforms-half, and put the
analysis bits into a new Analysis/CFG.{h,cpp}. Promote isPotentiallyReachable
into llvm::isPotentiallyReachable and move it into Analysis/CFG.

llvm-svn: 187283

0b68245e

SimplifyCFG: Use parallel-and and parallel-or mode to consolidate branch conditions · 8b1e021e

Tom Stellard authored Jul 27, 2013

Merge consecutive if-regions if they contain identical statements.
Both transformations reduce number of branches.  The transformation
is guarded by a target-hook, and is currently enabled only for +R600,
but the correctness has been tested on X86 target using a variety of
CPU benchmarks.

Patch by: Mei Ye

llvm-svn: 187278

8b1e021e
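As a rough source-level picture of merging adjacent if-regions with a parallel-or condition, consider the pair of functions below. The function names are hypothetical, and the sketch assumes the guarded statement is safe to execute once instead of twice and that evaluating the second condition has no side effects; the message itself does not spell out the legality checks.

```cpp
#include <cassert>

// Before: two adjacent if-regions guard the same statement, so the code
// contains two conditional branches.
int twoRegions(bool a, bool b, int x) {
  int r = 0;
  if (a)
    r = x;
  if (b)
    r = x;
  return r;
}

// After: both conditions are evaluated unconditionally and combined with a
// bitwise or, so a single branch guards a single copy of the statement.
int mergedRegion(bool a, bool b, int x) {
  int r = 0;
  if (a | b)
    r = x;
  return r;
}
```

The bitwise `|` is deliberate: unlike `||` it does not short-circuit, which is what lets the two conditions be computed in parallel ahead of one branch.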

Jul 24, 2013

TRE: Move class into anonymous namespace. · 328da33d
Benjamin Kramer authored Jul 24, 2013
```
While there shrink a dangerously large SmallPtrSet.

llvm-svn: 187050
```
328da33d

Fix a problem I introduced in r187029 where we would over-eagerly · 58e25d39

Chandler Carruth authored Jul 24, 2013

schedule an alloca for another iteration in SROA. This only showed up
with a mixture of promotable and unpromotable selects and phis. Added
a test case for this.

llvm-svn: 187031

58e25d39

Fix PR16687 where we were incorrectly promoting an alloca that had · 83ea195d

Chandler Carruth authored Jul 24, 2013

pending speculation for a phi node. The problem here is that we were
using growth of the speculation set as an indicator of whether
speculation would occur, and if the phi node is already in the set we
don't see it grow. This is a symptom of the fact that this signal is
a total hack.

Unfortunately, I couldn't really come up with a non-hacky way of
signaling that promotion remains valid *after* speculation occurs, such
that we only speculate when all else looks good for promotion. In the
end, I went with at least a much more explicit approach of doing the
work of queuing inside the phi and select processing and setting
a preposterously named flag to convey that we're in the special state of
requiring speculating before promotion.

Thanks to Richard Trieu and Nick Lewycky for the excellent work reducing
a testcase for this from a pretty giant, nasty assert in a big
application. =] The testcase was excellent.

llvm-svn: 187029

83ea195d
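The failure mode described for PR16687 is easy to reproduce in miniature. In the sketch below, `SpeculationSet`, `State`, and both functions are hypothetical, with integer ids standing in for phi nodes; it only illustrates why "did the set grow?" is an unreliable signal compared with an explicit flag.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical stand-in for the speculation set, keyed by an integer phi id.
using SpeculationSet = std::set<int>;

// Fragile signal: infer "speculation is pending" from whether the set grew.
// If PhiId was already queued, the size is unchanged and this returns false.
bool needsSpeculationBySize(SpeculationSet &S, int PhiId) {
  std::size_t Before = S.size();
  S.insert(PhiId);
  return S.size() > Before;
}

// Explicit signal: queuing a phi always records that speculation is required,
// regardless of whether the phi was already in the set.
struct State {
  SpeculationSet Pending;
  bool RequiresSpeculation = false;
};

void queuePhi(State &St, int PhiId) {
  St.Pending.insert(PhiId);
  St.RequiresSpeculation = true;
}
```

With the size-based check, visiting the same phi twice reports pending speculation only the first time, while the flag stays set on every visit.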

Jul 23, 2013
- Remove extraneous null statement. No functionality change! · 6ab9d936
  Nick Lewycky authored Jul 22, 2013
```
llvm-svn: 186893
```
  6ab9d936
- Use switch instead of if. No functionality change. · d4d94065
  Jakub Staszak authored Jul 22, 2013
```
llvm-svn: 186892
```
  d4d94065
- OldPtr is llvm::Instruction. Remove unneeded cast<>. · cb132fac
  Jakub Staszak authored Jul 22, 2013
```
llvm-svn: 186880
```
  cb132fac
Jul 22, 2013
- Change tabs to spaces. · 6b36db08
  Jakub Staszak authored Jul 22, 2013
```
llvm-svn: 186877
```
  6b36db08
- Fix spelling and grammar · fb183238
  Matt Arsenault authored Jul 22, 2013
```
llvm-svn: 186858
```
  fb183238
Jul 20, 2013
- SROA: Microoptimization: Remove dead entries first, then sort. · 08e5070b
  Benjamin Kramer authored Jul 20, 2013
```
While there replace an explicit struct with std::mem_fun.

llvm-svn: 186761
```
  08e5070b
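The ordering named in r186761 (prune, then sort) can be sketched as follows. The `Entry` type and `pruneThenSort` are hypothetical, since the log does not show the SROA data structure, and the sketch uses lambdas because `std::mem_fun` was deprecated in C++11 and removed in C++17.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical entry type for illustration only.
struct Entry {
  int Offset;
  bool Dead;
  bool isDead() const { return Dead; }
};

// Drop dead entries first so the sort only has to order the live ones.
void pruneThenSort(std::vector<Entry> &Entries) {
  Entries.erase(std::remove_if(Entries.begin(), Entries.end(),
                               [](const Entry &E) { return E.isDead(); }),
                Entries.end());
  std::sort(Entries.begin(), Entries.end(),
            [](const Entry &L, const Entry &R) { return L.Offset < R.Offset; });
}
```

Removing first is a linear pass, so the comparison-heavy sort then runs over a shorter range than it would if the order were reversed.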
Jul 19, 2013
- Cleanup the stats counters for the new implementation. These actually · 6c321c13
  Chandler Carruth authored Jul 19, 2013
```
count the right things and have the right names.

llvm-svn: 186667
```
  6c321c13