- Feb 25, 2014
-
-
Rafael Espindola authored
Instead, have a DataLayoutPass that holds one. This will allow parts of LLVM that don't handle passes to also use DataLayout. llvm-svn: 202168
-
Rafael Espindola authored
llvm-svn: 202157
-
Chandler Carruth authored
just "load". This helps avoid pointless de-duping with order-sensitive numbers as we already have unique names from the original load. It also makes the resulting IR quite a bit easier to read. llvm-svn: 202140
-
Chandler Carruth authored
the pointer adjustment code. This is the primary code path that creates totally new instructions in SROA, and being able to lump them based on the name of the pointer value for which they were created causes *significantly* fewer name collisions and less general noise in the debug output.

This is particularly significant because it was making it much harder to track down instability in the output of SROA: name de-duplication is a totally harmless form of instability that gets in the way of seeing real problems.

The new fancy naming scheme tries to dig out the root "pre-SROA" name for pointer values and associate that name all the way through the pointer-formation instructions. Digging out the root is important to prevent the multiple iterative rounds of SROA from just layering too much cruft on top of cruft here. We already track the layers of SROA's iteration in the alloca name prefix; we don't need to duplicate that here.

Should have no functionality change, and shouldn't have any really measurable impact on NDEBUG builds, as most of the complex logic is debug-only. llvm-svn: 202139
-
Chandler Carruth authored
PHI-pointer builder, just copy the builder and clobber the obvious fields. llvm-svn: 202136
-
Chandler Carruth authored
using OldPtr more heavily. Lots of this code was written before the rewriter had an OldPtr member set up ahead of time. There are already asserts in place that should ensure this doesn't change any functionality. llvm-svn: 202135
-
Chandler Carruth authored
llvm-svn: 202134
-
Chandler Carruth authored
the break statement, not just think it to yourself.... No idea how this worked at all, much less survived most bots, my bootstrap, and some bot bootstraps! The Polly one didn't survive, and this was filed as PR18959. I don't have a reduced test case and honestly I'm not seeing the need. What we probably need here are better asserts / debug-build behavior in SmallPtrSet so that this madness doesn't make it so far. llvm-svn: 202129
-
Alexey Samsonov authored
llvm-svn: 202119
-
Alp Toker authored
llvm-svn: 202107
-
Chandler Carruth authored
sorting it. This helps uncover latent reliance on the original ordering, which isn't guaranteed to be preserved by std::sort (though it often is), and which is based on use-def chain orderings that also aren't (technically) guaranteed. Only available in C++11 debug builds, and behind a flag to prevent noise at the moment, but this is generally useful so I figured I'd put it in the tree rather than keeping it out-of-tree. llvm-svn: 202106
-
Chandler Carruth authored
the destination operand or source operand of a memmove. It so happens that it was impossible for SROA to try to rewrite a self-memmove where the operands are *identical*, because either such a thing is volatile (and we don't rewrite it) or it is non-volatile and we don't even register it as a use of the alloca. However, making the 'IsDest' test *rely* on this subtle fact is... very confusing for the reader. We should use the direct and readily available test of the Use*, which gives us concrete information about which operand is being rewritten. No functionality changed, I hope! ;] llvm-svn: 202103
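For illustration, here is a minimal IR sketch (values invented, not from the commit) of the two cases the Use* distinguishes -- the alloca backing the destination operand versus the source operand of the intrinsic:

    declare void @llvm.memmove.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32, i1)

    %a = alloca [16 x i8]
    %p = getelementptr inbounds [16 x i8]* %a, i64 0, i64 0
    ; the alloca is the destination operand here (IsDest is true)...
    call void @llvm.memmove.p0i8.p0i8.i32(i8* %p, i8* %src, i32 16, i32 1, i1 false)
    ; ...and the source operand here (IsDest is false)
    call void @llvm.memmove.p0i8.p0i8.i32(i8* %dst, i8* %p, i32 16, i32 1, i1 false)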
-
Chandler Carruth authored
ordering. The fundamental problem we're hitting here is that the use-def chain ordering is *itself* not a stable thing to rely on in the rewriting for SROA. Further, we use a non-stable sort over the slices to arrange them based on the section of the alloca they're operating on. With a debugging STL implementation (or different implementations in stage2 and stage3) this can cause stage2 != stage3.

The specific aspect of this problem fixed in this commit deals with the rewriting and load-speculation around PHIs and selects. This, like many other aspects of the use-rewriting in SROA, is really part of the "strong SSA-formation" that is done by SROA, where it works very hard to canonicalize loads and stores in *just* the right way to satisfy the needs of mem2reg[1]. When we have a select (or a PHI) with two uses of the same alloca, we test twice whether loads downstream of the select are speculatable around it. If only one of the operands to the select needs to be rewritten, then if we get lucky we rewrite that one first and the select is immediately speculatable. This can cause the order of operand visitation, and thus the order in which slices are rewritten, to change an alloca from promotable to non-promotable and vice versa.

The fix is to defer all of the speculation until *after* the rewrite phase is done. Once we've rewritten everything, we can accurately test whether speculation will work (once, instead of twice!) and the order ceases to matter. This also happens to simplify the other subtlety of speculation -- we need to *not* speculate anything unless the result of speculating will make the alloca fully promotable by mem2reg. I had a previous attempt at simplifying this, but it was still pretty horrible.

There is actually already a *really* nice test case for this in basictest.ll, but on multiple STL implementations and inputs we just got "lucky". Fortunately, the test case is very small, and we can essentially build it in exactly the opposite way to get reasonable coverage in both directions even from normal STL implementations. llvm-svn: 202092
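As a rough sketch of the speculation in question (names invented, assuming the allocas are otherwise promotable): a load downstream of a select of two alloca pointers, such as

    %p = select i1 %c, i32* %a, i32* %b
    %v = load i32* %p

is speculated by loading both operands and selecting between the loaded values instead:

    %v.a = load i32* %a
    %v.b = load i32* %b
    %v = select i1 %c, i32 %v.a, i32 %v.b

The commit's point is that this test now runs once, after all slices have been rewritten, rather than per-operand during rewriting.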
-
Rafael Espindola authored
No functionality change. Just reduces the noise of an upcoming patch. llvm-svn: 202087
-
- Feb 22, 2014
-
-
Logan Chien authored
llvm-svn: 201930
-
Quentin Colombet authored
CodeGenPrepare makes extensive use of TargetLowering, which is part of libLLVMCodeGen. This is a layering violation which would eventually introduce a dependence on CodeGen in ScalarOpts. Move CodeGenPrepare into libLLVMCodeGen to avoid that. Follow-up of <rdar://problem/15519855> llvm-svn: 201912
-
- Feb 21, 2014
-
-
Rafael Espindola authored
llvm-svn: 201870
-
Rafael Espindola authored
llvm-svn: 201833
-
Rafael Espindola authored
I am really sorry for the noise, but the current state where some parts of the code use TD (from the old name: TargetData) and other parts use DL makes it hard to write a patch that changes where those variables come from and how they are passed along. llvm-svn: 201827
-
- Feb 19, 2014
-
-
Tim Northover authored
On x86, shifting a vector by a scalar is significantly cheaper than shifting a vector by another fully general vector. Unfortunately, because SelectionDAG operates on just one basic block at a time, the shufflevector instruction that reveals whether the right-hand side of a shift *is* really a scalar is often not visible to CodeGen when it's needed.

This adds another handler to CodeGenPrepare to sink any useful shufflevector instructions down to the basic block where they're used, predicated on a target hook (since on other architectures, doing so will often just introduce extra real work).

rdar://problem/16063505 llvm-svn: 201655
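A hedged sketch of the situation (function and value names invented): the splat that proves the shift amount is really a scalar is defined in one block while the shift sits in another, so the DAG for the second block cannot see it unless CodeGenPrepare sinks the shufflevector:

    define <4 x i32> @shift(<4 x i32> %x, i32 %amt, i1 %c) {
    entry:
      %ins = insertelement <4 x i32> undef, i32 %amt, i32 0
      %splat = shufflevector <4 x i32> %ins, <4 x i32> undef, <4 x i32> zeroinitializer
      br i1 %c, label %use, label %done
    use:
      %shl = shl <4 x i32> %x, %splat  ; the splat is invisible to this block's DAG
      ret <4 x i32> %shl
    done:
      ret <4 x i32> %x
    }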
-
- Feb 18, 2014
-
-
Tim Northover authored
It's rather odd to have the flag enabling and disabling this pass only affect a single target. llvm-svn: 201559
-
- Feb 14, 2014
-
-
Quentin Colombet authored
transformation does not bring any immediate benefits and introduces an illegal operation. llvm-svn: 201439
-
Rafael Espindola authored
Extracted while trying to understand http://llvm-reviews.chandlerc.com/D1764. Patch by Matt Arsenault. llvm-svn: 201425
-
- Feb 11, 2014
-
-
Chandler Carruth authored
Fixes PR18753 and PR18782. This is necessary for LICM to preserve LCSSA correctly and efficiently.

There is still some active discussion about whether we should be using LCSSA, but we can't just immediately stop using it, and we *need* LICM to preserve it while we are using it. We can restore the old SSAUpdater-driven code if and when there is a serious effort to remove the reliance on LCSSA from all of the loop passes.

However, this also serves as a great example of why LCSSA is very nice to have. This change significantly simplifies the process of sinking instructions for LICM, and makes it quite a bit less expensive. It wouldn't even be as complex as it is except that I had to start the process of removing the big recursive LCSSA-formation hammer in order to switch even this much of the re-forming code to asserting that LCSSA was preserved. I'll fully remove that next just to tidy things up until the LCSSA debate settles one way or the other. llvm-svn: 201148
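For context, a minimal sketch of the LCSSA form involved (names invented): any value defined in a loop and used outside it is routed through a single-entry PHI node in the exit block, which is what the sinking code must create and maintain:

    loop:
      %inv = mul i32 %a, %b                 ; loop-invariant, only used after the loop
      br i1 %cond, label %loop, label %exit
    exit:
      %inv.lcssa = phi i32 [ %inv, %loop ]  ; single-entry LCSSA phi
      ret i32 %inv.lcssa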
-
Quentin Colombet authored
At some point, the addressing mode matcher checks the profitability of folding an instruction into the addressing mode. When the instruction to be folded has several uses, it checks that the instruction can be folded in each use. To do so, it creates a new matcher for each use and checks whether the instruction is in the list of the matched instructions of this new matcher. The new matchers may promote some instructions, and this has to be undone to keep the state of the original matcher consistent. A test case will follow. <rdar://problem/16020230> llvm-svn: 201121
-
- Feb 10, 2014
-
-
Benjamin Kramer authored
llvm-svn: 201088
-
- Feb 08, 2014
-
-
Juergen Ributzka authored
The bitcast instruction during constant materialization was not placed correctly in the presence of phi nodes. This commit fixes the insertion point to be in the idom instead. This fixes PR18768. llvm-svn: 201009
-
Juergen Ributzka authored
This fix first traverses the whole use list of the constant expression and keeps track of the instructions that need to be updated, then performs the fixup afterwards. llvm-svn: 201008
-
- Feb 06, 2014
-
-
Quentin Colombet authored
mode. Basically the idea is to transform code like this:

    %idx = add nsw i32 %a, 1
    %sextidx = sext i32 %idx to i64
    %gep = gep i8* %myArray, i64 %sextidx
    load i8* %gep

into:

    %sexta = sext i32 %a to i64
    %idx = add nsw i64 %sexta, 1
    %gep = gep i8* %myArray, i64 %idx
    load i8* %gep

That way the computation can be folded into the addressing mode. This transformation is done as part of the addressing mode matcher. If the matching fails (not profitable, addressing mode not legal, etc.), the matcher will revert the related promotions. <rdar://problem/15519855> llvm-svn: 200947
-
Nick Lewycky authored
llvm-svn: 200907
-
Paul Robinson authored
Ideally only those transform passes that run at -O0 remain enabled; in reality we get as close as we reasonably can. Passes are responsible for disabling themselves; it's not the job of the pass manager to do it for them. llvm-svn: 200892
-
- Feb 04, 2014
-
-
Duncan P. N. Exon Smith authored
No functional change. Updated loops from:

    for (I = scc_begin(), E = scc_end(); I != E; ++I)

to:

    for (I = scc_begin(); !I.isAtEnd(); ++I)

for the win. llvm-svn: 200789
-
Nick Lewycky authored
The self-memcpy-elision and memcpy-of-constant-byte-to-memset transforms don't care how many bytes you were trying to transfer. Sink that safety test after those transforms. Noticed by inspection. llvm-svn: 200726
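As a sketch of the first of those transforms (operands hypothetical): a memcpy whose source and destination are identical can be elided no matter what the length argument turns out to be, so the length-based safety test doesn't need to run before it:

    ; a non-volatile self-memcpy is a no-op regardless of the value of %n
    call void @llvm.memcpy.p0i8.p0i8.i32(i8* %p, i8* %p, i32 %n, i32 1, i1 false)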
-
- Feb 01, 2014
-
-
Chandler Carruth authored
LCSSA when we promote to SSA registers inside of LICM. Currently, this is actually necessary: the promotion logic in LICM uses SSAUpdater, which doesn't understand how to place LCSSA PHI nodes. Teaching it to do so would be a very significant undertaking. It may be worthwhile, and I've left a FIXME about this in the code as well as starting a thread on llvmdev to try to figure out the right long-term solution.

For now, the PR needs to be fixed. Short of using the promotion SSAUpdater to place both the LCSSA PHI nodes and the promoted PHI nodes, I don't see a cleaner or cheaper way of achieving this. Fortunately, LCSSA is relatively lazy and sparse -- it should only update instructions which need it. We can also skip the recursive variant when we don't promote to SSA values. llvm-svn: 200612
-
- Jan 29, 2014
-
-
Chandler Carruth authored
preserve loop simplify of enclosing loops. The problem here starts with LoopRotation, which ends up cloning code out of the latch into the new preheader it is building. This can create a new edge from the preheader into the exit block of the loop, which breaks LoopSimplify form. The code tries to fix this by splitting the critical edge between the latch and the exit block to get a new exit block that only the latch dominates. This sadly isn't sufficient.

The exit block may be an exit block for multiple nested loops. When we clone an edge from the latch of the inner loop to the new preheader being built in the outer loop, we create an exiting edge from the outer loop to this exit block. Despite breaking the LoopSimplify form for the inner loop, this is fine for the outer loop. However, when we split the edge from the inner loop to the exit block, we create a new block which is in neither the inner nor the outer loop as the new exit block. This is a predecessor to the old exit block, and so the split itself takes the outer loop out of LoopSimplify form. We need to split every edge entering the exit block from inside a loop nested more deeply than the exit block in order to preserve all of the loop simplify constraints.

Once we try to do that, a problem with splitting critical edges surfaces. Previously, we used a very brute-force approach to updating LoopSimplify form by re-computing it for all exit blocks. We don't need to do this, and doing so will sometimes, but not always, overlap with the LoopRotate bug fix. Instead, the code needs to specifically handle the cases which can start to violate LoopSimplify -- they aren't that common. We need to see if the destination of the split edge was a loop exit block in simplified form for the loop of the source of the edge. For this to be true, all the predecessors need to be in the exact same loop as the source of the edge being split. If the dest block was originally in this form, we have to split all of the edges back into this loop to recover it. The old mechanism of doing this was conservatively correct because at least *one* of the exiting blocks it rewrote was the DestBB, and so the DestBB's predecessors were fixed. But this is a much more targeted way of doing it. Making it targeted is important, because ballooning the set of edges touched prevents LoopRotate from being able to split the edges *it* needs to split to preserve loop simplify in a coherent way -- the critical edge splitting would sometimes find some of the edges in need of splitting but not others.

Many, *many* thanks for help from Nick reducing these test cases mightily. And helping lots with the analysis here, as this one was quite tricky to track down. llvm-svn: 200393
-
Chandler Carruth authored
because of the inside-out run of LoopSimplify in the LoopPassManager and the fact that LoopSimplify couldn't be "preserved" across two independent LoopPassManagers.

Anyways, in that case, IndVars wasn't correctly preserving an LCSSA PHI node because it thought it was rewriting (via SCEV) the incoming value to a loop-invariant value. While it may well be invariant for the current loop, it may be rewritten in terms of an enclosing loop's values. This in and of itself is fine, as the LCSSA PHI node in the enclosing loop for the inner loop value we're rewriting will have its own LCSSA PHI node if used outside of the enclosing loop. With me so far?

Well, the current loop and the enclosing loop may share an exiting block and exit block, and when they do, they also share LCSSA PHI nodes. In this case, it's not valid to RAUW through the LCSSA PHI node. Expected crazy test included. llvm-svn: 200372
-
- Jan 28, 2014
-
-
Reid Kleckner authored
Summary:
I searched Transforms/ and Analysis/ for 'ByVal' and updated those call sites to check for inalloca if appropriate. I added tests for any change that would allow an optimization to fire on inalloca.

Reviewers: nlewycky

Differential Revision: http://llvm-reviews.chandlerc.com/D2449

llvm-svn: 200281
-
- Jan 27, 2014
-
-
Benjamin Kramer authored
Insert before the terminating instruction of the dominating block instead. llvm-svn: 200218
-
- Jan 25, 2014
-
-
Chandler Carruth authored
the loops in a function, and teach LICM to work in the presence of LCSSA.

Previously, LCSSA was a loop pass. That made passes requiring it also be loop passes, unable to depend on function analysis passes easily. It also caused outer loops to have a different "canonical" form from inner loops during analysis. Instead, we go into LCSSA form and preserve it through the loop pass manager run.

Note that this has the same problem as LoopSimplify that prevents enabling its verification -- loop passes which run at the end of the loop pass manager and don't preserve these are valid, but the subsequent loop pass runs of outer loops that do preserve this pass trigger too much verification and fail because the inner loop no longer verifies.

The other problem this exposed is that LICM was completely unable to handle LCSSA form. It didn't preserve it, and it would actually give up on moving instructions in many cases when they were used by an LCSSA PHI node. I've taught LICM to support detecting LCSSA-form PHI nodes and to hoist and sink around them. This may actually let LICM fire significantly more, because we put everything into LCSSA form to rotate the loop before running LICM. =/ Now LICM should handle that fine and preserve it correctly.

The down side is that LICM has to require LCSSA in order to preserve it. This is just a fact of life for LCSSA. It's entirely possible we should completely remove LCSSA from the optimizer.

The test updates are essentially accommodating LCSSA PHI nodes in the output of LICM, and the fact that we now completely sink every instruction in ashr-crash below the loop bodies prior to unrolling.

With this change, LCSSA is computed only three times in the pass pipeline. One of them could be removed (and potentially a SCEV run and a separate LoopPassManager entirely!) if we had a LoopPass variant of InstCombine that ran InstCombine on the loop body but refused to combine away LCSSA PHI nodes. Currently, this also prevents loop unrolling from being in the same loop pass manager as rotate, LICM, and unswitch.

There is one thing that I *really* don't like -- preserving LCSSA in LICM is quite expensive. We end up having to re-run LCSSA twice for some loops after LICM runs, because LICM can undo LCSSA both in the current loop and the parent loop. I don't really see good solutions to this other than to completely move away from LCSSA and use tools like SSAUpdater instead. llvm-svn: 200067
-
Juergen Ributzka authored
This reverts commit r200058 and adds the using directive for ARMTargetTransformInfo to silence two g++ overload warnings. llvm-svn: 200062
-