- Jan 08, 2011
Chris Lattner authored
memset into a single larger memset. llvm-svn: 123086
Chris Lattner authored
llvm-svn: 123082
Chris Lattner authored
to be foldable into an uncond branch. When this happens, we can make a much simpler CFG for the loop, which is important for nested loop cases where we want the outer loop to be aggressively optimized. Handle this case more aggressively. For example, previously on phi-duplicate.ll we would get this:

define void @test(i32 %N, double* %G) nounwind ssp {
entry:
  %cmp1 = icmp slt i64 1, 1000
  br i1 %cmp1, label %bb.nph, label %for.end
bb.nph:                                           ; preds = %entry
  br label %for.body
for.body:                                         ; preds = %bb.nph, %for.cond
  %j.02 = phi i64 [ 1, %bb.nph ], [ %inc, %for.cond ]
  %arrayidx = getelementptr inbounds double* %G, i64 %j.02
  %tmp3 = load double* %arrayidx
  %sub = sub i64 %j.02, 1
  %arrayidx6 = getelementptr inbounds double* %G, i64 %sub
  %tmp7 = load double* %arrayidx6
  %add = fadd double %tmp3, %tmp7
  %arrayidx10 = getelementptr inbounds double* %G, i64 %j.02
  store double %add, double* %arrayidx10
  %inc = add nsw i64 %j.02, 1
  br label %for.cond
for.cond:                                         ; preds = %for.body
  %cmp = icmp slt i64 %inc, 1000
  br i1 %cmp, label %for.body, label %for.cond.for.end_crit_edge
for.cond.for.end_crit_edge:                       ; preds = %for.cond
  br label %for.end
for.end:                                          ; preds = %for.cond.for.end_crit_edge, %entry
  ret void
}

Now we get the much nicer:

define void @test(i32 %N, double* %G) nounwind ssp {
entry:
  br label %for.body
for.body:                                         ; preds = %entry, %for.body
  %j.01 = phi i64 [ 1, %entry ], [ %inc, %for.body ]
  %arrayidx = getelementptr inbounds double* %G, i64 %j.01
  %tmp3 = load double* %arrayidx
  %sub = sub i64 %j.01, 1
  %arrayidx6 = getelementptr inbounds double* %G, i64 %sub
  %tmp7 = load double* %arrayidx6
  %add = fadd double %tmp3, %tmp7
  %arrayidx10 = getelementptr inbounds double* %G, i64 %j.01
  store double %add, double* %arrayidx10
  %inc = add nsw i64 %j.01, 1
  %cmp = icmp slt i64 %inc, 1000
  br i1 %cmp, label %for.body, label %for.end
for.end:                                          ; preds = %for.body
  ret void
}

With all of these recent changes, we are now able to compile:

void foo(char *X) {
  for (int i = 0; i != 100; ++i)
    for (int j = 0; j != 100; ++j)
      X[j+i*100] = 0;
}

into a single memset of 10000 bytes. This series of changes should also be helpful for other nested loop scenarios as well.

llvm-svn: 123079
Chris Lattner authored
1. Rip out LoopRotate's domfrontier updating code. It isn't needed now that LICM doesn't use DF, and it is super complex and gross.
2. Make DomTree updating code a lot simpler and faster. The old loop over all the blocks was just to find a block??
3. Change the code that inserts the new preheader to just use SplitCriticalEdge instead of doing an overcomplex reimplementation of it.

No behavior change, except for the name of the inserted preheader.

llvm-svn: 123072
Frits van Bommel authored
llvm-svn: 123061
Chris Lattner authored
them into the loop preheader, eliminating silly instructions like "icmp i32 0, 100" in fixed tripcount loops. This also better exposes the bigger problem with loop rotate that I'd like to fix: once this has been folded, the duplicated conditional branch *often* turns into an uncond branch. Not aggressively handling this is pessimizing later loop optimizations somethin' fierce by making "dominates all exit blocks" checks fail. llvm-svn: 123060
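For illustration only (not from the commit), a minimal C++ loop of the kind this targets; the function name is made up. After rotation, the entry guard is exactly the constant compare quoted above and folds away:

void zero_fixed(int *a) {
  // After loop rotation, the entry guard becomes "icmp slt i32 0, 100",
  // which folds to true, so the guard branch disappears entirely.
  for (int i = 0; i < 100; ++i)
    a[i] = 0;
}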
- Jan 07, 2011
Tobias Grosser authored
X = sext x; x >s c ? X : C+1 --> X = sext x; X <s C+1 ? C+1 : X
X = sext x; x <s c ? X : C-1 --> X = sext x; X >s C-1 ? C-1 : X
X = zext x; x >u c ? X : C+1 --> X = zext x; X <u C+1 ? C+1 : X
X = zext x; x <u c ? X : C-1 --> X = zext x; X >u C-1 ? C-1 : X
X = sext x; x >u c ? X : C+1 --> X = sext x; X <u C+1 ? C+1 : X
X = sext x; x <u c ? X : C-1 --> X = sext x; X >u C-1 ? C-1 : X

Instead of calculating this with mixed types, promote all to the larger type. This enables scalar evolution to analyze this expression. PR8866

llvm-svn: 123034
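A hedged C++ rendering of the first pattern (the function name and the constant 10 are invented for illustration):

long select_smax(int x) {
  long X = (long)x;        // X = sext x
  // x >s 10 ? X : 11 is rewritten as X <s 11 ? 11 : X, i.e. a signed
  // max computed entirely in the wider type, which SCEV can analyze.
  return x > 10 ? X : 11;
}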
Benjamin Kramer authored
llvm-svn: 123030
- Jan 06, 2011
Benjamin Kramer authored
This happens when we take the (non-constant) length from a malloc. llvm-svn: 122961
Benjamin Kramer authored
InstCombine: If we call llvm.objectsize on a malloc call, we can replace it with the size passed to malloc. llvm-svn: 122959
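A minimal sketch of what this enables, assuming clang lowers __builtin_object_size to llvm.objectsize (the function name is illustrative):

#include <cstdlib>

std::size_t known_size() {
  char *p = (char *)std::malloc(32);
  std::size_t n = __builtin_object_size(p, 0); // now folds to 32
  std::free(p);
  return n;
}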
Benjamin Kramer authored
llvm-svn: 122958
Chris Lattner authored
  ret i64 ptrtoint (i8* getelementptr ([1000 x i8]* @X, i64 1, i64 sub (i64 0, i64 ptrtoint ([1000 x i8]* @X to i64))) to i64)

to "ret i64 1000". This allows us to correctly compute the trip count on a loop in PR8883, which occurs with std::fill on a char array. This allows us to transform it into a memset with a constant size.

llvm-svn: 122950
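For context, a hedged C++ reduction of the PR8883 shape (global and function names are illustrative): std::fill's end-minus-begin pointer arithmetic is exactly the ptrtoint/getelementptr expression above.

#include <algorithm>

char X[1000];

void clear_array() {
  // (X + 1000) - X now constant-folds to 1000, so the trip count is
  // known and the fill loop can become a memset of constant size.
  std::fill(X, X + 1000, 0);
}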
- Jan 04, 2011
Chris Lattner authored
ashr's with huge shift amounts, PR8896 llvm-svn: 122814
Chris Lattner authored
when safe. The testcase is basically this nested loop:

void foo(char *X) {
  for (int i = 0; i != 100; ++i)
    for (int j = 0; j != 100; ++j)
      X[j+i*100] = 0;
}

which gets turned into a single memset now. clang -O3 doesn't optimize this yet though, due to a phase ordering issue I haven't analyzed yet.

llvm-svn: 122806
Chris Lattner authored
invalidated by stores, so they can be handled as 'simple' operations. llvm-svn: 122785
- Jan 03, 2011
Chris Lattner authored
elimination as well. This deletes 60 stores in 176.gcc that largely come from bitfield code. llvm-svn: 122736
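A hedged sketch of the pattern (names illustrative): the first store is dead because it is overwritten with no intervening read, which is what bitfield read-modify-write sequences tend to produce.

void overwrite(int *p) {
  *p = 1; // dead: no load of *p before the next store
  *p = 2;
}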
Chris Lattner authored
store->load forwarding. This allows EarlyCSE to zap 600 more loads from 176.gcc. llvm-svn: 122732
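A minimal illustration of store-to-load forwarding (names made up):

int forward(int *p, int v) {
  *p = v;
  return *p; // EarlyCSE forwards v here; the load is deleted
}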
Chris Lattner authored
llvm-svn: 122730
Chris Lattner authored
On 176.gcc, this catches 13090 loads and calls, and increases the number of simple instructions CSE'd from 29658 to 36208. llvm-svn: 122727
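As a hedged example of the redundant-load case (names illustrative):

int reload(int *p) {
  int a = *p;
  int b = *p; // no intervening store: CSE'd to a
  return a + b;
}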
Chris Lattner authored
Teach it to CSE the rest of the non-side-effecting instructions. llvm-svn: 122716
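A minimal sketch of the plain-expression case (names illustrative):

int repeat(int a, int b) {
  int x = a + b;
  int y = a + b; // identical and side-effect-free: CSE'd to x
  return x * y;
}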
Chris Lattner authored
Add a testcase. llvm-svn: 122715
- Jan 02, 2011
Chris Lattner authored
sure that the loop we're promoting into a memcpy doesn't mutate the input of the memcpy. Before we were just checking that the dest of the memcpy wasn't mod/ref'd by the loop. llvm-svn: 122712
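A hedged sketch of the hazard the new check catches (names made up): the loop stores into the region it also reads, so forming a memcpy would be incorrect even though the destination itself is only written.

void shift_down(char *A, long n) {
  // The loads read A[i+1], a region the loop itself stores into, so
  // the source of the would-be memcpy is mutated and the transform
  // must be skipped (overlapping memcpy is undefined).
  for (long i = 0; i < n - 1; ++i)
    A[i] = A[i + 1];
}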
Chris Lattner authored
mess with it. We'd rather peel/unroll it than convert all of its stores into memsets. llvm-svn: 122711
Chris Lattner authored
blocks in a loop, instead of just the header block. This makes it more aggressive, able to handle Duncan's Ada examples. llvm-svn: 122704
Duncan Sands authored
in the PR, the pass could break LCSSA form when inserting preheaders. It probably would be easy enough to fix this, but since currently we always go into LCSSA form after running this pass, doing so is not urgent. llvm-svn: 122695
Chris Lattner authored
header for now for memset/memcpy opportunities. It turns out that loop-rotate is successfully rotating loops, but *DOESN'T MERGE THE BLOCKS*, turning "for loops" into 2 basic block loops that loop-idiom was ignoring. With this fix, we form many, many more memcpys and memsets than before, including on the "history" loops in the viterbi benchmark, which look like this:

for (j=0; j<MAX_history; ++j) {
  history_new[i][j+1] = history[2*i][j];
}

Transforming these loops into memcpys speeds up the viterbi benchmark from 11.98s to 3.55s on my machine. Woo.

llvm-svn: 122685
Chris Lattner authored
llvm-svn: 122678
- Jan 01, 2011
Chris Lattner authored
loop idiom pass exposed. llvm-svn: 122674
Chris Lattner authored
new testcase. llvm-svn: 122662
Duncan Sands authored
is the wrong hammer for this nail, and is probably right. llvm-svn: 122661
Chris Lattner authored
aggressively. In practice, this doesn't help anything though, see the todo. llvm-svn: 122660
Chris Lattner authored
should be correct now. llvm-svn: 122659
Duncan Sands authored
numbering, in which it considers (for example) "%a = add i32 %x, %y" and "%b = add i32 %x, %y" to be equal because the operands are equal and the result of the instructions only depends on the values of the operands. This has almost no effect (it removes 4 instructions from gcc-as-one-file), and perhaps slows down compilation: I measured a 0.4% slowdown on the large gcc-as-one-file testcase, but it wasn't statistically significant. llvm-svn: 122654
- Dec 29, 2010
NAKAMURA Takumi authored
llvm-svn: 122620
- Dec 27, 2010
Chris Lattner authored
memsets. This is still missing one important validity check, but this is enough to compile stuff like this:

void test0(std::vector<char> &X) {
  for (std::vector<char>::iterator I = X.begin(), E = X.end(); I != E; ++I)
    *I = 0;
}

void test1(std::vector<int> &X) {
  for (long i = 0, e = X.size(); i != e; ++i)
    X[i] = 0x01010101;
}

With:

$ clang t.cpp -S -o - -O2 -emit-llvm | opt -loop-idiom | opt -O3 | llc

to:

__Z5test0RSt6vectorIcSaIcEE:            ## @_Z5test0RSt6vectorIcSaIcEE
## BB#0:                                ## %entry
  subq $8, %rsp
  movq (%rdi), %rax
  movq 8(%rdi), %rsi
  cmpq %rsi, %rax
  je LBB0_2
## BB#1:                                ## %bb.nph
  subq %rax, %rsi
  movq %rax, %rdi
  callq ___bzero
LBB0_2:                                 ## %for.end
  addq $8, %rsp
  ret

...

__Z5test1RSt6vectorIiSaIiEE:            ## @_Z5test1RSt6vectorIiSaIiEE
## BB#0:                                ## %entry
  subq $8, %rsp
  movq (%rdi), %rax
  movq 8(%rdi), %rdx
  subq %rax, %rdx
  cmpq $4, %rdx
  jb LBB1_2
## BB#1:                                ## %for.body.preheader
  andq $-4, %rdx
  movl $1, %esi
  movq %rax, %rdi
  callq _memset
LBB1_2:                                 ## %for.end
  addq $8, %rsp
  ret

llvm-svn: 122573
- Dec 26, 2010
Chris Lattner authored
llvm-svn: 122572
- Dec 24, 2010
Benjamin Kramer authored
This allows us to compile "int cst[] = {-1, -1, -1};" into:

  movl $-1, 16(%rsp)
  movq $-1, 8(%rsp)

instead of:

  movl _cst+8(%rip), %eax
  movl %eax, 16(%rsp)
  movq _cst(%rip), %rax
  movq %rax, 8(%rsp)

llvm-svn: 122548
Owen Anderson authored
are not the low bits of x, but the bits that WILL be the low bits after the operation completes. llvm-svn: 122529
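A hedged, loose illustration of the general demanded-bits idea described here (function name and constants invented): when a mask keeps only the low bits of a shift's result, the bits demanded from x are the ones that will land in those low positions.

unsigned low_nibble_after(unsigned x) {
  // Only bits 4..7 of x are demanded here: after the shift they
  // become the low nibble that the mask keeps.
  return (x >> 4) & 0xFu;
}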
- Dec 23, 2010
Benjamin Kramer authored
llvm-svn: 122453
- Dec 22, 2010
Duncan Sands authored
the original instruction, half the cases were missed (making it not wrong but suboptimal). Also correct a typo (A <-> B) in the second chunk. llvm-svn: 122414