Jan 16, 2011
    • Chris Lattner authored · 7c9f4c9c
      tidy up a comment, as suggested by duncan
      llvm-svn: 123590
    • Chris Lattner authored · 6fab2e94
      if an alloca is only ever accessed as a unit, and is accessed with load/store instructions,
      then don't try to decimate it into its individual pieces.  This will just make a mess of the
      IR and is pointless if none of the elements are individually accessed.  This was generating
      really terrible code for std::bitset (PR8980) because it happens to be lowered by clang
      as an {[8 x i8]} structure instead of {i64}.
      
      The testcase now is optimized to:
      
      define i64 @test2(i64 %X) {
        br label %L2
      
      L2:                                               ; preds = %0
        ret i64 %X
      }
      
      before we generated:
      
      define i64 @test2(i64 %X) {
        %sroa.store.elt = lshr i64 %X, 56
        %1 = trunc i64 %sroa.store.elt to i8
        %sroa.store.elt8 = lshr i64 %X, 48
        %2 = trunc i64 %sroa.store.elt8 to i8
        %sroa.store.elt9 = lshr i64 %X, 40
        %3 = trunc i64 %sroa.store.elt9 to i8
        %sroa.store.elt10 = lshr i64 %X, 32
        %4 = trunc i64 %sroa.store.elt10 to i8
        %sroa.store.elt11 = lshr i64 %X, 24
        %5 = trunc i64 %sroa.store.elt11 to i8
        %sroa.store.elt12 = lshr i64 %X, 16
        %6 = trunc i64 %sroa.store.elt12 to i8
        %sroa.store.elt13 = lshr i64 %X, 8
        %7 = trunc i64 %sroa.store.elt13 to i8
        %8 = trunc i64 %X to i8
        br label %L2
      
      L2:                                               ; preds = %0
        %9 = zext i8 %1 to i64
        %10 = shl i64 %9, 56
        %11 = zext i8 %2 to i64
        %12 = shl i64 %11, 48
        %13 = or i64 %12, %10
        %14 = zext i8 %3 to i64
        %15 = shl i64 %14, 40
        %16 = or i64 %15, %13
        %17 = zext i8 %4 to i64
        %18 = shl i64 %17, 32
        %19 = or i64 %18, %16
        %20 = zext i8 %5 to i64
        %21 = shl i64 %20, 24
        %22 = or i64 %21, %19
        %23 = zext i8 %6 to i64
        %24 = shl i64 %23, 16
        %25 = or i64 %24, %22
        %26 = zext i8 %7 to i64
        %27 = shl i64 %26, 8
        %28 = or i64 %27, %25
        %29 = zext i8 %8 to i64
        %30 = or i64 %29, %28
        ret i64 %30
      }
      
      In this case, instcombine was able to eliminate the nonsense, but in PR8980 enough
      PHIs are in play that instcombine backs off.  It's better to not generate this stuff
      in the first place.
      
      llvm-svn: 123571
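
      To make the condition concrete, here is a minimal hand-written sketch (hypothetical, not the
      PR8980 testcase; names and the era's typed-pointer IR syntax are illustrative) of an alloca
      that is only ever accessed as a unit. With this change, SROA leaves such an aggregate whole
      instead of splitting it into eight i8 pieces:

      define i64 @roundtrip(i64 %X) {
        ; the aggregate is written and read only as one 64-bit unit
        %agg = alloca { [8 x i8] }
        %as.i64 = bitcast { [8 x i8] }* %agg to i64*
        store i64 %X, i64* %as.i64
        %val = load i64* %as.i64
        ret i64 %val
      }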
    • Chris Lattner authored · 7cd8cf7d
      Use an irbuilder to get some trivial constant folding when doing a store
      of a constant.

      llvm-svn: 123570
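
      A hedged illustration of what this buys (hypothetical IR, not taken from the commit): when the
      value being stored element by element is a constant, building the per-element values through an
      IRBuilder folds the lshr/trunc at construction time, so the stores take immediates instead of a
      chain of shift/trunc instructions.

      define void @store_const(i8* %lo, i8* %hi) {
        ; without folding:  %h  = lshr i16 258, 8
        ;                    %h8 = trunc i16 %h to i8
        ;                    store i8 %h8, i8* %hi
        ; with the IRBuilder's constant folder the same calls collapse to:
        store i8 2, i8* %lo        ; low byte of 258 (0x0102)
        store i8 1, i8* %hi        ; high byte of 258
        ret void
      }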
    • Chris Lattner authored · d55581de
      enhance FoldOpIntoPhi in instcombine to try harder when a phi has
      multiple uses.  In some cases, all the uses are the same operation,
      so instcombine can go ahead and promote the phi.  In the testcase
      this pushes an add out of the loop.
      
      llvm-svn: 123568
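
      Roughly the kind of rewrite this enables, sketched by hand (hypothetical IR, not the commit's
      testcase): the phi has two uses, but both are the same add, so the add can be applied to each
      incoming value in the predecessors and the former uses then read the new phi directly.

      ; before: both uses of %p are the identical "add i32 %p, 1"
      define i32 @before(i1 %c, i32 %x) {
      entry:
        br i1 %c, label %other, label %merge
      other:
        br label %merge
      merge:
        %p = phi i32 [ 0, %entry ], [ %x, %other ]
        %u1 = add i32 %p, 1
        %u2 = add i32 %p, 1
        %r = mul i32 %u1, %u2
        ret i32 %r
      }

      ; after: the add is applied to each incoming value ("add i32 0, 1"
      ; folds to the constant 1), and both uses read the new phi
      define i32 @after(i1 %c, i32 %x) {
      entry:
        br i1 %c, label %other, label %merge
      other:
        %x.inc = add i32 %x, 1
        br label %merge
      merge:
        %p.inc = phi i32 [ 1, %entry ], [ %x.inc, %other ]
        %r = mul i32 %p.inc, %p.inc
        ret i32 %r
      }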