Commits · d83e7b0ff6bf27b18656ce343a41cf85520da955 · Roger Ferrer / llvm-epi-0.8

Jan 24, 2011

enhance SRoA to promote allocas that are used by PHI nodes. This often · d83e7b0f

Chris Lattner authored Jan 24, 2011

occurs because instcombine sinks loads and inserts phis.  This kicks in 
on such apps as 175.vpr, eon, 403.gcc, xalancbmk and a bunch of times in
spec2006 in some app that uses std::deque.

This resolves the last of rdar://7339113.

llvm-svn: 124090

d83e7b0f

Jan 23, 2011

Enhance SRoA to promote allocas that are used by selects in some · a960725d

Chris Lattner authored Jan 23, 2011

common cases.  This triggers a surprising number of times in SPEC2K6
because min/max idioms end up doing this.  For example, code from the
STL ends up looking like this to SRoA:

  %202 = load i64* %__old_size, align 8, !tbaa !3
  %203 = load i64* %__old_size, align 8, !tbaa !3
  %204 = load i64* %__n, align 8, !tbaa !3
  %205 = icmp ult i64 %203, %204
  %storemerge.i = select i1 %205, i64* %__n, i64* %__old_size
  %206 = load i64* %storemerge.i, align 8, !tbaa !3

We can now promote both the __n and the __old_size allocas.

This addresses another chunk of rdar://7339113, poor codegen on
stringswitch.

llvm-svn: 124088

a960725d
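
As a rough illustration of the pattern in the message above, the following C++ sketch shows a max idiom written two ways. The function names are hypothetical and are not taken from the commit. The first form selects between two addresses and then loads, which keeps both locals in memory; the second loads both values and selects between them, which is the shape that allows both locals to live in registers.

```cpp
#include <cassert>   // used by the checks in the usage note
#include <cstdint>

// Hypothetical illustration: select between two addresses, then load.
// Both parameters have their address taken, so each needs a stack slot.
std::uint64_t load_of_select(std::uint64_t old_size, std::uint64_t n) {
    const std::uint64_t *chosen = (old_size < n) ? &n : &old_size; // select of pointers
    return *chosen;                                                // single load
}

// Equivalent form: read both values first, then select between the values.
// No address is taken, so nothing forces the values into memory.
std::uint64_t select_of_loads(std::uint64_t old_size, std::uint64_t n) {
    return (old_size < n) ? n : old_size;
}
```

Both functions return the same value for every input, for example `assert(load_of_select(3, 7) == select_of_loads(3, 7));`.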

Enhance SRoA to be more aggressive about scalarization of aggregate allocas · 9491dee2

Chris Lattner authored Jan 23, 2011

that have PHI or select uses of their element pointers.  This can often happen
when instcombine sinks two loads into a successor, inserting a phi or select.

With this patch, we can scalarize the alloca, but the pinned elements are not
yet promoted.  This is still a win for large aggregates where only one element
is used.  This fixes rdar://8904039 and part of rdar://7339113 (poor codegen
on stringswitch).

llvm-svn: 124070

9491dee2

have AllocaInfo store the alloca being inspected, simplifying callers. · 8acbb795

Chris Lattner authored Jan 23, 2011

No functionality change.

llvm-svn: 124067

8acbb795

Rearrange some code a bit. Change MarkUnsafe to · 3e56c290

Chris Lattner authored Jan 23, 2011

handle the "Transformation preventing inst" printing, 
so that -scalarrepl -debug will always print the rejected
instruction.  No functionality change.

llvm-svn: 124066

3e56c290

remove an old hack that avoided creating MMX datatypes. The · a587ab7b

Chris Lattner authored Jan 23, 2011

X86 backend has been fixed.

llvm-svn: 124064

a587ab7b

Jan 18, 2011
- Remove outdated references to dominance frontiers. · 4694e695
  Cameron Zwarich authored Jan 18, 2011
  
  llvm-svn: 123724
  4694e695
Jan 17, 2011

Roll r123609 back in with two changes that fix test failures with expensive · b410858a

Cameron Zwarich authored Jan 17, 2011

checks enabled:

1) Use '<' to compare integers in a comparison function rather than '<='.

2) Use the uniqued set DefBlocks rather than Info.DefiningBlocks to initialize
the priority queue.

The speedup of scalarrepl on test-suite + SPEC2000 + SPEC2006 is a bit less, at
just under 16% rather than 17%.

llvm-svn: 123662

b410858a
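
The two fixes listed above can be sketched in isolation. The C++ standard library requires a comparison function to be a strict weak ordering, so `comp(a, a)` must be false; a `<=` comparison breaks that rule, which is likely what the expensive checks caught. The sketch below is a minimal illustration under that assumption. `LeveledBlock`, `CompareByLevel`, and `drain_deepest_first` are hypothetical names, not identifiers from the patch.

```cpp
#include <cassert>
#include <queue>
#include <set>
#include <utility>
#include <vector>

// (block id, dominator-tree level); an illustrative stand-in type.
using LeveledBlock = std::pair<int, unsigned>;

struct CompareByLevel {
    // Strict '<' is never true for equal levels, so comp(a, a) is false.
    bool operator()(const LeveledBlock &lhs, const LeveledBlock &rhs) const {
        return lhs.second < rhs.second;
    }
};

// Removes duplicate entries, then pops blocks from the deepest level upward.
std::vector<int> drain_deepest_first(const std::vector<LeveledBlock> &blocks) {
    // Uniquing first mirrors seeding the queue from a set rather than a list.
    std::set<LeveledBlock> unique_blocks(blocks.begin(), blocks.end());

    std::priority_queue<LeveledBlock, std::vector<LeveledBlock>, CompareByLevel> pq;
    for (const LeveledBlock &b : unique_blocks)
        pq.push(b);

    std::vector<int> order;
    while (!pq.empty()) {
        order.push_back(pq.top().first);
        pq.pop();
    }
    assert(order.size() == unique_blocks.size());
    return order;
}
```

The relative order of blocks on the same level is unspecified in this sketch.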

Roll out r123609 due to failures on the llvm-x86_64-linux-checks bot. · 67431d79

Cameron Zwarich authored Jan 17, 2011

llvm-svn: 123618

67431d79

Eliminate the use of dominance frontiers in PromoteMemToReg. In addition to · 814cd923

Cameron Zwarich authored Jan 17, 2011

eliminating a potentially quadratic data structure, this also gives a 17%
speedup when running -scalarrepl on test-suite + SPEC2000 + SPEC2006. My initial
experiment gave a greater speedup around 25%, but I moved the dominator tree
level computation from dominator tree construction to PromoteMemToReg.

Since this approach to computing IDFs has a much lower overhead than the old
code using precomputed DFs, it is worth looking at using this new code for the
second scalarrepl pass as well.

llvm-svn: 123609

814cd923
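
For readers unfamiliar with computing iterated dominance frontiers (IDFs) without precomputed dominance frontiers, here is a minimal sketch of one well-known approach: process definition blocks from the bottom of the dominator tree upward using a priority queue keyed on dominator tree level. It is written against a toy CFG representation; the `Cfg` struct and its field names are assumptions made for this example, and the code is not the code from the commit (it omits liveness pruning, for instance).

```cpp
#include <cassert>
#include <queue>
#include <set>
#include <utility>
#include <vector>

// Toy CFG description; every field name here is an assumption of this sketch.
struct Cfg {
    std::vector<std::vector<int> > succs;        // CFG successors of each block
    std::vector<int> idom;                       // immediate dominator, -1 for entry
    std::vector<std::vector<int> > dom_children; // dominator-tree children
    std::vector<unsigned> level;                 // depth in the dominator tree
};

// Returns the iterated dominance frontier of def_blocks.
std::set<int> iterated_dominance_frontier(const Cfg &g,
                                          const std::set<int> &def_blocks) {
    typedef std::pair<unsigned, int> Entry; // (level, block); max-heap pops deepest first
    std::priority_queue<Entry> pq;
    for (std::set<int>::const_iterator it = def_blocks.begin();
         it != def_blocks.end(); ++it)
        pq.push(Entry(g.level[*it], *it));

    std::set<int> idf;             // result; doubles as the "already queued" set
    std::set<int> visited_subtree; // blocks already walked in some subtree
    while (!pq.empty()) {
        const Entry top = pq.top();
        pq.pop();
        const unsigned root_level = top.first;

        std::vector<int> worklist(1, top.second);
        visited_subtree.insert(top.second);
        while (!worklist.empty()) {
            const int node = worklist.back();
            worklist.pop_back();
            for (int succ : g.succs[node]) {
                if (g.idom[succ] == node) continue;        // dominator-tree edge
                if (g.level[succ] > root_level) continue;  // too deep to be in the frontier
                if (!idf.insert(succ).second) continue;    // already found
                if (!def_blocks.count(succ))
                    pq.push(Entry(g.level[succ], succ));   // acts as a new definition
            }
            for (int child : g.dom_children[node])
                if (visited_subtree.insert(child).second)
                    worklist.push_back(child);
        }
    }
    return idf;
}
```

On a diamond-shaped CFG (block 0 branches to 1 and 2, both of which jump to 3) with a definition only in block 1, the result is the single join block 3.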

Jan 16, 2011

tidy up a comment, as suggested by duncan · 7c9f4c9c

Chris Lattner authored Jan 16, 2011

llvm-svn: 123590

7c9f4c9c

if an alloca is only ever accessed as a unit, and is accessed with load/store instructions, · 6fab2e94

Chris Lattner authored Jan 16, 2011

then don't try to decimate it into its individual pieces.  This will just make a mess of the
IR and is pointless if none of the elements are individually accessed.  This was generating
really terrible code for std::bitset (PR8980) because it happens to be lowered by clang
as an {[8 x i8]} structure instead of {i64}.

The testcase now is optimized to:

define i64 @test2(i64 %X) {
  br label %L2

L2:                                               ; preds = %0
  ret i64 %X
}

before we generated:

define i64 @test2(i64 %X) {
  %sroa.store.elt = lshr i64 %X, 56
  %1 = trunc i64 %sroa.store.elt to i8
  %sroa.store.elt8 = lshr i64 %X, 48
  %2 = trunc i64 %sroa.store.elt8 to i8
  %sroa.store.elt9 = lshr i64 %X, 40
  %3 = trunc i64 %sroa.store.elt9 to i8
  %sroa.store.elt10 = lshr i64 %X, 32
  %4 = trunc i64 %sroa.store.elt10 to i8
  %sroa.store.elt11 = lshr i64 %X, 24
  %5 = trunc i64 %sroa.store.elt11 to i8
  %sroa.store.elt12 = lshr i64 %X, 16
  %6 = trunc i64 %sroa.store.elt12 to i8
  %sroa.store.elt13 = lshr i64 %X, 8
  %7 = trunc i64 %sroa.store.elt13 to i8
  %8 = trunc i64 %X to i8
  br label %L2

L2:                                               ; preds = %0
  %9 = zext i8 %1 to i64
  %10 = shl i64 %9, 56
  %11 = zext i8 %2 to i64
  %12 = shl i64 %11, 48
  %13 = or i64 %12, %10
  %14 = zext i8 %3 to i64
  %15 = shl i64 %14, 40
  %16 = or i64 %15, %13
  %17 = zext i8 %4 to i64
  %18 = shl i64 %17, 32
  %19 = or i64 %18, %16
  %20 = zext i8 %5 to i64
  %21 = shl i64 %20, 24
  %22 = or i64 %21, %19
  %23 = zext i8 %6 to i64
  %24 = shl i64 %23, 16
  %25 = or i64 %24, %22
  %26 = zext i8 %7 to i64
  %27 = shl i64 %26, 8
  %28 = or i64 %27, %25
  %29 = zext i8 %8 to i64
  %30 = or i64 %29, %28
  ret i64 %30
}

In this case, instcombine was able to eliminate the nonsense, but in PR8980 enough
PHIs are in play that instcombine backs off.  It's better to not generate this stuff
in the first place.

llvm-svn: 123571

6fab2e94
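
The "before" IR above amounts to splitting a 64-bit value into eight bytes and then gluing them back together. A small C++ rendering of the same computation, with a hypothetical function name, makes it easier to see that the whole sequence is an identity on the input:

```cpp
#include <cassert>   // used by the check in the usage note
#include <cstdint>

// Hypothetical illustration of the shift/truncate/extend/or chain.
std::uint64_t split_and_reassemble(std::uint64_t x) {
    std::uint8_t bytes[8];
    // Split: one shift and one truncation per element.
    for (int i = 0; i < 8; ++i)
        bytes[i] = static_cast<std::uint8_t>(x >> (8 * i));

    // Reassemble: one extension, one shift, and one 'or' per element.
    std::uint64_t result = 0;
    for (int i = 0; i < 8; ++i)
        result |= static_cast<std::uint64_t>(bytes[i]) << (8 * i);
    return result;
}
```

For any input the function returns its argument unchanged, for example `assert(split_and_reassemble(42) == 42);`.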

Use an irbuilder to get some trivial constant folding when doing a store · 7cd8cf7d

Chris Lattner authored Jan 16, 2011

of a constant.

llvm-svn: 123570

7cd8cf7d

enhance FoldOpIntoPhi in instcombine to try harder when a phi has · d55581de

Chris Lattner authored Jan 16, 2011

multiple uses.  In some cases, all the uses are the same operation,
so instcombine can go ahead and promote the phi.  In the testcase
this pushes an add out of the loop.

llvm-svn: 123568

d55581de

Jan 15, 2011
- Generalize LoadAndStorePromoter a bit and switch LICM · b68ec5c3
  Chris Lattner authored Jan 15, 2011
  
  to use it.

  llvm-svn: 123501
  b68ec5c3
Jan 14, 2011

switch SRoA to use LoadAndStorePromoter instead of its own copy of the code. · b498f9af

Chris Lattner authored Jan 14, 2011

llvm-svn: 123457

b498f9af

split SROA into two passes: one that uses DomFrontiers (-scalarrepl) · 9987a6f4

Chris Lattner authored Jan 14, 2011

and one that uses SSAUpdater (-scalarrepl-ssa)

llvm-svn: 123436

9987a6f4

Implement full support for promoting allocas to registers using SSAUpdater · 543384ef

Chris Lattner authored Jan 14, 2011

instead of DomTree/DomFrontier.  This may be interesting for reducing compile 
time.  This is currently disabled, but seems to work just fine.

When this is enabled, we eliminate two runs of dominator frontier, one in the
"early per-function" optimizations and one in the "interlaced with inliner"
function passes.

llvm-svn: 123434

543384ef

Jan 13, 2011

Fix whitespace. · 328e91bb

Bob Wilson authored Jan 13, 2011

llvm-svn: 123396

328e91bb

Check for empty structs, and for consistency, zero-element arrays. · c8056a95

Bob Wilson authored Jan 13, 2011

llvm-svn: 123383

c8056a95

Extend SROA to handle arrays accessed as homogeneous structs and vice versa. · 08713d3c

Bob Wilson authored Jan 13, 2011

This is a minor extension of SROA to handle a special case that is
important for some ARM NEON operations. Some of the NEON intrinsics
return multiple values, which are handled as struct types containing
multiple elements of the same vector type. The corresponding return
types declared in the arm_neon.h header have equivalent arrays. We
need SROA to recognize that it can split up those arrays and structs
into separate vectors, even though they are not always accessed with
the same type. SROA already handles loads and stores of an entire
alloca by using insertvalue/extractvalue to access the individual
pieces, and that code works the same regardless of whether the type
is a struct or an array. So, all that needs to be done is to check
for compatible arrays and homogeneous structs.

llvm-svn: 123381

08713d3c

Make SROA more aggressive with allocas containing padding. · 12eec40c

Bob Wilson authored Jan 13, 2011

SROA only splits up structs and arrays one level at a time, so padding can
only cause trouble if it is located in between the struct or array elements.

llvm-svn: 123380

12eec40c

Jan 02, 2011
- split dom frontier handling stuff out to its own DominanceFrontier header, · bf0aa927
  Chris Lattner authored Jan 02, 2011
  
  so that Dominators.h is *just* domtree. Also prune #includes a bit.

  llvm-svn: 122714
  bf0aa927
Dec 26, 2010
- start using irbuilder to make mem intrinsics in a few passes. · 6cf8d6cc
  Chris Lattner authored Dec 26, 2010
  
  llvm-svn: 122572
  6cf8d6cc
Dec 23, 2010
- Preserve the address space when generating bitcasts for MemTransferInst in ConvertToScalarInfo · 18b762a9
  Mon P Wang authored Dec 23, 2010
  
  llvm-svn: 122462
  18b762a9
Dec 15, 2010
- Move Value::getUnderlyingObject to be a standalone · a4fcd241
  Dan Gohman authored Dec 15, 2010
  
  function so that it can live in Analysis instead of VMCore.

  llvm-svn: 121885
  a4fcd241
Nov 24, 2010
- Treat a call of function pointer like a load of the pointer when considering · b8de00ee
  Nick Lewycky authored Nov 24, 2010
  
  whether the pointer can be replaced with the global variable it is a copy of. Fixes PR8680.

  llvm-svn: 120126
  b8de00ee
Nov 20, 2010
- Simplify code. No change in functionality. · ddd1b7b8
  Benjamin Kramer authored Nov 20, 2010
  
  llvm-svn: 119908
  ddd1b7b8
Nov 18, 2010

finish a thought. · 1e37bbaf

Chris Lattner authored Nov 18, 2010

llvm-svn: 119690

1e37bbaf

allow eliminating an alloca that is just copied from a constant global · ac570131

Chris Lattner authored Nov 18, 2010

if it is passed as a byval argument.  The byval argument will just be a
read, so it is safe to read from the original global instead.  This allows
us to promote away the %agg.tmp alloca in PR8582

llvm-svn: 119686

ac570131

enhance the "alloca is just a memcpy from constant global" · f183d5c4

Chris Lattner authored Nov 18, 2010

to ignore calls that obviously can't modify the alloca
because they are readonly/readnone.

llvm-svn: 119683

f183d5c4

fix a small oversight in the "eliminate memcpy from constant global" · 7aeae25c

Chris Lattner authored Nov 18, 2010

optimization.  If the alloca that is "memcpy'd from constant" also has
a memcpy from *it*, ignore it: it is a load.  We now optimize the testcase to:

define void @test2() {
  %B = alloca %T
  %a = bitcast %T* @G to i8*
  %b = bitcast %T* %B to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %b, i8* %a, i64 124, i32 4, i1 false)
  call void @bar(i8* %b)
  ret void
}

previously we would generate:

define void @test() {
  %B = alloca %T
  %b = bitcast %T* %B to i8*
  %G.0 = getelementptr inbounds %T* @G, i32 0, i32 0
  %tmp3 = load i8* %G.0, align 4
  %G.1 = getelementptr inbounds %T* @G, i32 0, i32 1
  %G.15 = bitcast [123 x i8]* %G.1 to i8*
  %1 = bitcast [123 x i8]* %G.1 to i984*
  %srcval = load i984* %1, align 1
  %B.0 = getelementptr inbounds %T* %B, i32 0, i32 0
  store i8 %tmp3, i8* %B.0, align 4
  %B.1 = getelementptr inbounds %T* %B, i32 0, i32 1
  %B.12 = bitcast [123 x i8]* %B.1 to i8*
  %2 = bitcast [123 x i8]* %B.1 to i984*
  store i984 %srcval, i984* %2, align 1
  call void @bar(i8* %b)
  ret void
}

llvm-svn: 119682

7aeae25c

Oct 19, 2010

Get rid of static constructors for pass registration. Instead, every pass... · 6c18d1aa

Owen Anderson authored Oct 19, 2010

Get rid of static constructors for pass registration. Instead, every pass exposes an initializeMyPassFunction(), which
must be called in the pass's constructor. This function uses static dependency declarations to recursively initialize
the pass's dependencies.

Clients that only create passes through the createFooPass() APIs will require no changes. Clients that want to use the
CommandLine options for passes will need to manually call the appropriate initialization functions in PassInitialization.h
before parsing commandline arguments.

I have tested this with all standard configurations of clang and llvm-gcc on Darwin. It is possible that there are problems
with the static dependencies that will only be visible with non-standard options. If you encounter any crash in pass
registration/creation, please send the testcase to me directly.

llvm-svn: 116820

6c18d1aa
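
The registration scheme described above can be pictured with a toy model. The sketch below is not LLVM's API: `ToyRegistry`, `initializeDomTree`, `initializeScalarRepl`, and `ScalarReplPass` are all hypothetical names invented for illustration. It only shows the general shape, in which each initializer is idempotent, registers its dependencies before itself, and is called from the pass constructor instead of from a static constructor.

```cpp
#include <algorithm>
#include <cassert>   // used by the checks in the usage note
#include <string>
#include <vector>

// Toy stand-in for a pass registry; this is not LLVM's PassRegistry.
struct ToyRegistry {
    std::vector<std::string> registered;

    bool contains(const std::string &name) const {
        return std::find(registered.begin(), registered.end(), name) !=
               registered.end();
    }
    void add(const std::string &name) { registered.push_back(name); }
};

// Hypothetical initializer for a dependency; safe to call repeatedly.
void initializeDomTree(ToyRegistry &registry) {
    if (registry.contains("domtree")) return;
    registry.add("domtree");
}

// Hypothetical initializer for a pass: dependencies first, then itself.
void initializeScalarRepl(ToyRegistry &registry) {
    if (registry.contains("scalarrepl")) return;
    initializeDomTree(registry);
    registry.add("scalarrepl");
}

// The pass constructor triggers registration; no static constructor is needed.
struct ScalarReplPass {
    explicit ScalarReplPass(ToyRegistry &registry) {
        initializeScalarRepl(registry);
    }
};
```

Constructing `ScalarReplPass` twice against the same registry leaves exactly two entries, with the dependency registered first.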

Oct 16, 2010
- Eliminate some calls to Value::getNameStr. · 1dc34b48
  Benjamin Kramer authored Oct 16, 2010
  
  llvm-svn: 116670
  1dc34b48
Oct 12, 2010

Begin adding static dependence information to passes, which will allow us to · 8ac477ff

Owen Anderson authored Oct 12, 2010

perform initialization without static constructors AND without explicit initialization
by the client.  For the moment, passes are required to initialize both their
(potential) dependencies and any passes they preserve.  I hope to be able to relax
the latter requirement in the future.

llvm-svn: 116334

8ac477ff

Oct 08, 2010
- Now with fewer extraneous semicolons! · df7a4f25
  Owen Anderson authored Oct 07, 2010
  
  llvm-svn: 115996
  df7a4f25
Oct 01, 2010

Massive rewrite of MMX: · dd224d23

Dale Johannesen authored Sep 30, 2010

The x86_mmx type is used for MMX intrinsics, parameters and
return values where these use MMX registers, and is also
supported in load, store, and bitcast.

Only the above operations generate MMX instructions, and optimizations
do not operate on or produce MMX intrinsics. 

MMX-sized vectors <2 x i32> etc. are lowered to XMM or split into
smaller pieces.  Optimizations may occur on these forms and the
result cast back to x86_mmx, provided the result feeds into a
previously existing x86_mmx operation.

The point of all this is to prevent optimizations from introducing
MMX operations, which is unsafe due to the EMMS problem.

llvm-svn: 115243

dd224d23

Sep 02, 2010
- deepen my MMX/SRoA hack to avoid hurting non-x86 codegen. · 8af45a88
  Chris Lattner authored Sep 01, 2010
  
  llvm-svn: 112763
  8af45a88
Sep 01, 2010

add a gross hack to work around a problem that Argiris reported · 34e5361e

Chris Lattner authored Sep 01, 2010

on llvmdev: SRoA is introducing MMX datatypes like <1 x i64>,
which then cause random problems because the X86 backend is
producing mmx stuff without inserting proper emms calls.

In the short term, force off MMX datatypes.  In the long term,
the X86 backend should not select generic vector types to MMX
registers.  This is being worked on, but won't be done in time
for 2.8.  rdar://8380055

llvm-svn: 112696

34e5361e

Aug 18, 2010
- remove dead prototype. · 6aabb661
  Chris Lattner authored Aug 18, 2010
  
  llvm-svn: 111342
  6aabb661