Commits · 31f7d460fab75a107fe8f7cbcdfe77582362f1d2 · Roger Ferrer / llvm-epi-0.8

Jan 29, 2011
- Unbreak the MSVC build. · 326e4a29
  Francois Pichet authored Jan 29, 2011
```
The DEBUG() call at line 606 demands to see raw_ostream's definition. I have no idea why this seems to only break MSVC.

llvm-svn: 124545
```
  326e4a29
- Add a test for TCE return duplication. · 73c29178
  Evan Cheng authored Jan 29, 2011
```
llvm-svn: 124527
```
  73c29178
- Re-apply r124518 with fix. Watch out for invalidated iterator. · d983eba7
  Evan Cheng authored Jan 29, 2011
```
llvm-svn: 124526
```
  d983eba7
- Revert r124518. It broke Linux self-host. · 65b8ccf6
  Evan Cheng authored Jan 29, 2011
```
llvm-svn: 124522
```
  65b8ccf6
- Re-commit r124462 with fixes. Tail recursion elim will now dup ret into... · d4eff314
  Evan Cheng authored Jan 29, 2011
```
Re-commit r124462 with fixes. Tail recursion elim will now dup ret into unconditional predecessor to enable TCE on demand.

llvm-svn: 124518
```
  d4eff314
Jan 26, 2011

Fix PR9039, a use-after-free in reassociate. The issue was that the · 69bdb585

Duncan Sands authored Jan 26, 2011

operand being factorized (and erased) could occur several times in Ops,
resulting in freed memory being used when the next occurrence in Ops was
analyzed.

llvm-svn: 124287

69bdb585

Jan 24, 2011

Give GetUnderlyingObject a TargetData, to keep it in sync · 0f124e19

Dan Gohman authored Jan 24, 2011

with BasicAA's DecomposeGEPExpression, which recently began
using a TargetData. This fixes PR8968, though the testcase
is awkward to reduce.

Also, update several off GetUnderlyingObject's users
which happen to have a TargetData handy to pass it in.

llvm-svn: 124134

0f124e19

enhance SRoA to promote allocas that are used by PHI nodes. This often · d83e7b0f

Chris Lattner authored Jan 24, 2011

occurs because instcombine sinks loads and inserts phis.  This kicks in 
on such apps as 175.vpr, eon, 403.gcc, xalancbmk and a bunch of times in
spec2006 in some app that uses std::deque.

This resolves the last of rdar://7339113.

llvm-svn: 124090

d83e7b0f

Jan 23, 2011

Enhance SRoA to promote allocas that are used by selects in some · a960725d

Chris Lattner authored Jan 23, 2011

common cases.  This triggers a surprising number of times in SPEC2K6
because min/max idioms end up doing this.  For example, code from the
STL ends up looking like this to SRoA:

  %202 = load i64* %__old_size, align 8, !tbaa !3
  %203 = load i64* %__old_size, align 8, !tbaa !3
  %204 = load i64* %__n, align 8, !tbaa !3
  %205 = icmp ult i64 %203, %204
  %storemerge.i = select i1 %205, i64* %__n, i64* %__old_size
  %206 = load i64* %storemerge.i, align 8, !tbaa !3

We can now promote both the __n and the __old_size allocas.

This addresses another chunk of rdar://7339113, poor codegen on
stringswitch.

llvm-svn: 124088

a960725d

Enhance SRoA to be more aggressive about scalarization of aggregate allocas · 9491dee2

Chris Lattner authored Jan 23, 2011

that have PHI or select uses of their element pointers.  This can often happen
when instcombine sinks two loads into a successor, inserting a phi or select.

With this patch, we can scalarize the alloca, but the pinned elements are not
yet promoted.  This is still a win for large aggregates where only one element
is used.  This fixes rdar://8904039 and part of rdar://7339113 (poor codegen
on stringswitch).

llvm-svn: 124070

9491dee2

have AllocaInfo store the alloca being inspected, simplifying callers. · 8acbb795
Chris Lattner authored Jan 23, 2011
```
No functionality change.

llvm-svn: 124067
```
8acbb795

Rearrange some code a bit. Change MarkUnsafe to · 3e56c290

Chris Lattner authored Jan 23, 2011

handle the "Transformation preventing inst" printing, 
so that -scalarrepl -debug will always print the rejected
instruction.  No functionality change.

llvm-svn: 124066

3e56c290

remove an old hack that avoided creating MMX datatypes. The · a587ab7b
Chris Lattner authored Jan 23, 2011
```
X86 backend has been fixed.

llvm-svn: 124064
```
a587ab7b

Jan 21, 2011
- Actually check memcpy lengths, instead of just commenting about · 19e30d5a
  Dan Gohman authored Jan 21, 2011
```
how they should be checked.

llvm-svn: 123999
```
  19e30d5a
- SCCP doesn't actually preserve the CFG. It will delete and insert terminator · ae0275e0
  Nick Lewycky authored Jan 21, 2011
```
instructions.

llvm-svn: 123973
```
  ae0275e0
Jan 18, 2011
- fix rdar://8878965, a regression I introduced with the recent · 86d56c65
  Chris Lattner authored Jan 18, 2011
```
llvm.objectsize changes.

llvm-svn: 123771
```
  86d56c65
- Remove code for updating dominance frontiers and some outdated references to · b703654e
  Cameron Zwarich authored Jan 18, 2011
```
dominance and post-dominance frontiers.

llvm-svn: 123725
```
  b703654e
- Remove outdated references to dominance frontiers. · 4694e695
  Cameron Zwarich authored Jan 18, 2011
```
llvm-svn: 123724
```
  4694e695
Jan 17, 2011

Remove dead code, that I apparently wrote a while back. We seem to be doing well enough · 459e0799

Owen Anderson authored Jan 17, 2011

without whatever this was trying to do.  When/if someone has the time to do some empirical
evaluations, it might be worth it to figure out what this code was trying to do and see if
it's worth resurrecting/fixing.

llvm-svn: 123684

459e0799

Roll r123609 back in with two changes that fix test failures with expensive · b410858a

Cameron Zwarich authored Jan 17, 2011

checks enabled:

1) Use '<' to compare integers in a comparison function rather than '<='.

2) Use the uniqued set DefBlocks rather than Info.DefiningBlocks to initialize
the priority queue.

The speedup of scalarrepl on test-suite + SPEC2000 + SPEC2006 is a bit less, at
just under 16% rather than 17%.

llvm-svn: 123662

b410858a

Roll out r123609 due to failures on the llvm-x86_64-linux-checks bot. · 67431d79
Cameron Zwarich authored Jan 17, 2011
```
llvm-svn: 123618
```
67431d79

Eliminate the use of dominance frontiers in PromoteMemToReg. In addition to · 814cd923

Cameron Zwarich authored Jan 17, 2011

eliminating a potentially quadratic data structure, this also gives a 17%
speedup when running -scalarrepl on test-suite + SPEC2000 + SPEC2006. My initial
experiment gave a greater speedup around 25%, but I moved the dominator tree
level computation from dominator tree construction to PromoteMemToReg.

Since this approach to computing IDFs has a much lower overhead than the old
code using precomputed DFs, it is worth looking at using this new code for the
second scalarrepl pass as well.

llvm-svn: 123609

814cd923

Jan 16, 2011

tidy up a comment, as suggested by duncan · 7c9f4c9c
Chris Lattner authored Jan 16, 2011
```
llvm-svn: 123590
```
7c9f4c9c
simplify a little · ed1fb92c
Chris Lattner authored Jan 16, 2011
```
llvm-svn: 123573
```
ed1fb92c

if an alloca is only ever accessed as a unit, and is accessed with load/store instructions, · 6fab2e94

Chris Lattner authored Jan 16, 2011

then don't try to decimate it into its individual pieces.  This will just make a mess of the
IR and is pointless if none of the elements are individually accessed.  This was generating
really terrible code for std::bitset (PR8980) because it happens to be lowered by clang
as an {[8 x i8]} structure instead of {i64}.

The testcase now is optimized to:

define i64 @test2(i64 %X) {
  br label %L2

L2:                                               ; preds = %0
  ret i64 %X
}

before we generated:

define i64 @test2(i64 %X) {
  %sroa.store.elt = lshr i64 %X, 56
  %1 = trunc i64 %sroa.store.elt to i8
  %sroa.store.elt8 = lshr i64 %X, 48
  %2 = trunc i64 %sroa.store.elt8 to i8
  %sroa.store.elt9 = lshr i64 %X, 40
  %3 = trunc i64 %sroa.store.elt9 to i8
  %sroa.store.elt10 = lshr i64 %X, 32
  %4 = trunc i64 %sroa.store.elt10 to i8
  %sroa.store.elt11 = lshr i64 %X, 24
  %5 = trunc i64 %sroa.store.elt11 to i8
  %sroa.store.elt12 = lshr i64 %X, 16
  %6 = trunc i64 %sroa.store.elt12 to i8
  %sroa.store.elt13 = lshr i64 %X, 8
  %7 = trunc i64 %sroa.store.elt13 to i8
  %8 = trunc i64 %X to i8
  br label %L2

L2:                                               ; preds = %0
  %9 = zext i8 %1 to i64
  %10 = shl i64 %9, 56
  %11 = zext i8 %2 to i64
  %12 = shl i64 %11, 48
  %13 = or i64 %12, %10
  %14 = zext i8 %3 to i64
  %15 = shl i64 %14, 40
  %16 = or i64 %15, %13
  %17 = zext i8 %4 to i64
  %18 = shl i64 %17, 32
  %19 = or i64 %18, %16
  %20 = zext i8 %5 to i64
  %21 = shl i64 %20, 24
  %22 = or i64 %21, %19
  %23 = zext i8 %6 to i64
  %24 = shl i64 %23, 16
  %25 = or i64 %24, %22
  %26 = zext i8 %7 to i64
  %27 = shl i64 %26, 8
  %28 = or i64 %27, %25
  %29 = zext i8 %8 to i64
  %30 = or i64 %29, %28
  ret i64 %30
}

In this case, instcombine was able to eliminate the nonsense, but in PR8980 enough
PHIs are in play that instcombine backs off.  It's better to not generate this stuff
in the first place.

llvm-svn: 123571

6fab2e94

Use an irbuilder to get some trivial constant folding when doing a store · 7cd8cf7d
Chris Lattner authored Jan 16, 2011
```
of a constant.

llvm-svn: 123570
```
7cd8cf7d

enhance FoldOpIntoPhi in instcombine to try harder when a phi has · d55581de

Chris Lattner authored Jan 16, 2011

multiple uses.  In some cases, all the uses are the same operation,
so instcombine can go ahead and promote the phi.  In the testcase
this pushes an add out of the loop.

llvm-svn: 123568

d55581de

Jan 15, 2011

temporarily revert r123526. While working on a follow-on patch I · af263907
Chris Lattner authored Jan 15, 2011
```
realize that ConstantFoldTerminator doesn't preserve dominfo.

llvm-svn: 123527
```
af263907

fix rdar://8785296 - -fcatch-undefined-behavior generates inefficient code · 8df83c4a

Chris Lattner authored Jan 15, 2011

The basic issue is that isel (very reasonably!) expects conditional branches
to be folded, so CGP leaving around a bunch dead computation feeding
conditional branches isn't such a good idea.  Just fold branches on constants
into unconditional branches.

llvm-svn: 123526

8df83c4a

simplify code, no functionality change. · ee588def
Chris Lattner authored Jan 15, 2011
```
llvm-svn: 123525
```
ee588def

Now that instruction optzns can update the iterator as they go, we can · 1b93be50

Chris Lattner authored Jan 15, 2011

have objectsize folding recursively simplify away their result when it
folds.  It is important to catch this here, because otherwise we won't
eliminate the cross-block values at isel and other times.

llvm-svn: 123524

1b93be50

make the current instruction iterator an ivar, allowing xforms that · 7a277144

Chris Lattner authored Jan 15, 2011

potentially invalidate it (like inline asm lowering) to be sunk into
their proper place, cleaning up a ton of code.

llvm-svn: 123523

7a277144

Generalize LoadAndStorePromoter a bit and switch LICM · b68ec5c3
Chris Lattner authored Jan 15, 2011
```
to use it.

llvm-svn: 123501
```
b68ec5c3

Jan 14, 2011

switch SRoA to use LoadAndStorePromoter instead of its own copy of the code. · b498f9af
Chris Lattner authored Jan 14, 2011
```
llvm-svn: 123457
```
b498f9af
split SROA into two passes: one that uses DomFrontiers (-scalarrepl) · 9987a6f4
Chris Lattner authored Jan 14, 2011
```
and one that uses SSAUpdater (-scalarrepl-ssa)

llvm-svn: 123436
```
9987a6f4

Implement full support for promoting allocas to registers using SSAUpdater · 543384ef

Chris Lattner authored Jan 14, 2011

instead of DomTree/DomFrontier.  This may be interesting for reducing compile 
time.  This is currently disabled, but seems to work just fine.

When this is enabled, we eliminate two runs of dominator frontier, one in the
"early per-function" optimizations and one in the "interlaced with inliner"
function passes.

llvm-svn: 123434

543384ef

Jan 13, 2011

Fix whitespace. · 328e91bb
Bob Wilson authored Jan 13, 2011
```
llvm-svn: 123396
```
328e91bb
Check for empty structs, and for consistency, zero-element arrays. · c8056a95
Bob Wilson authored Jan 13, 2011
```
llvm-svn: 123383
```
c8056a95

Extend SROA to handle arrays accessed as homogeneous structs and vice versa. · 08713d3c

Bob Wilson authored Jan 13, 2011

This is a minor extension of SROA to handle a special case that is
important for some ARM NEON operations. Some of the NEON intrinsics
return multiple values, which are handled as struct types containing
multiple elements of the same vector type. The corresponding return
types declared in the arm_neon.h header have equivalent arrays. We
need SROA to recognize that it can split up those arrays and structs
into separate vectors, even though they are not always accessed with
the same type. SROA already handles loads and stores of an entire
alloca by using insertvalue/extractvalue to access the individual
pieces, and that code works the same regardless of whether the type
is a struct or an array. So, all that needs to be done is to check
for compatible arrays and homogeneous structs.

llvm-svn: 123381

08713d3c

Make SROA more aggressive with allocas containing padding. · 12eec40c

Bob Wilson authored Jan 13, 2011

SROA only split up structs and arrays one level at a time, so padding can
only cause trouble if it is located in between the struct or array elements.

llvm-svn: 123380

12eec40c