Commits · 4dc1fd938f005a66f57b5b59450af31a70f06329 · Roger Ferrer / llvm-epi-0.8

Jan 08, 2011

enhance memcpyopt to merge a store and a subsequent · 4dc1fd93
Chris Lattner authored Jan 08, 2011
```
memset into a single larger memset.

llvm-svn: 123086
```
4dc1fd93
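
As a rough C-level illustration of the pattern (the function names and the 16-byte size are made up for this sketch; the pass itself works on LLVM IR stores and memset calls), a one-byte store followed by a memset of the adjacent bytes with the same value has the same effect as one larger memset:

```c
#include <string.h>

/* Hypothetical "before" shape: a one-byte store of zero followed by a
   memset of the 15 bytes immediately after it. */
void init_split(char *p) {
    p[0] = 0;
    memset(p + 1, 0, 15);
}

/* Hypothetical "after" shape: the store is absorbed into one memset. */
void init_merged(char *p) {
    memset(p, 0, 16);
}
```

Both functions leave the same 16 bytes zeroed, which is what makes the merge legal in this simple case.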
merge two tests and filecheckify · 9dbbc49f
Chris Lattner authored Jan 08, 2011
```
llvm-svn: 123082
```
9dbbc49f

When loop rotation happens, it is *very* common for the duplicated condbr · 59c82f85

Chris Lattner authored Jan 08, 2011

to be foldable into an uncond branch.  When this happens, we can make a
much simpler CFG for the loop, which is important for nested loop cases
where we want the outer loop to be aggressively optimized.

Handle this case more aggressively.  For example, previously on
phi-duplicate.ll we would get this:


define void @test(i32 %N, double* %G) nounwind ssp {
entry:
  %cmp1 = icmp slt i64 1, 1000
  br i1 %cmp1, label %bb.nph, label %for.end

bb.nph:                                           ; preds = %entry
  br label %for.body

for.body:                                         ; preds = %bb.nph, %for.cond
  %j.02 = phi i64 [ 1, %bb.nph ], [ %inc, %for.cond ]
  %arrayidx = getelementptr inbounds double* %G, i64 %j.02
  %tmp3 = load double* %arrayidx
  %sub = sub i64 %j.02, 1
  %arrayidx6 = getelementptr inbounds double* %G, i64 %sub
  %tmp7 = load double* %arrayidx6
  %add = fadd double %tmp3, %tmp7
  %arrayidx10 = getelementptr inbounds double* %G, i64 %j.02
  store double %add, double* %arrayidx10
  %inc = add nsw i64 %j.02, 1
  br label %for.cond

for.cond:                                         ; preds = %for.body
  %cmp = icmp slt i64 %inc, 1000
  br i1 %cmp, label %for.body, label %for.cond.for.end_crit_edge

for.cond.for.end_crit_edge:                       ; preds = %for.cond
  br label %for.end

for.end:                                          ; preds = %for.cond.for.end_crit_edge, %entry
  ret void
}

Now we get the much nicer:

define void @test(i32 %N, double* %G) nounwind ssp {
entry:
  br label %for.body

for.body:                                         ; preds = %entry, %for.body
  %j.01 = phi i64 [ 1, %entry ], [ %inc, %for.body ]
  %arrayidx = getelementptr inbounds double* %G, i64 %j.01
  %tmp3 = load double* %arrayidx
  %sub = sub i64 %j.01, 1
  %arrayidx6 = getelementptr inbounds double* %G, i64 %sub
  %tmp7 = load double* %arrayidx6
  %add = fadd double %tmp3, %tmp7
  %arrayidx10 = getelementptr inbounds double* %G, i64 %j.01
  store double %add, double* %arrayidx10
  %inc = add nsw i64 %j.01, 1
  %cmp = icmp slt i64 %inc, 1000
  br i1 %cmp, label %for.body, label %for.end

for.end:                                          ; preds = %for.body
  ret void
}

With all of these recent changes, we are now able to compile:

void foo(char *X) {
 for (int i = 0; i != 100; ++i) 
   for (int j = 0; j != 100; ++j)
     X[j+i*100] = 0;
}

into a single memset of 10000 bytes.  This series of changes
should be helpful for other nested loop scenarios as well.

llvm-svn: 123079

59c82f85
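
The two IR listings in this commit can be read, very loosely, as the following C shapes. This is only a sketch: the function names are invented here, and the real transformation happens on the CFG rather than on source code. The first function keeps the duplicated guard that loop rotation introduces; the second shows the loop once that constant guard has been folded into an unconditional branch.

```c
/* Rotated loop with its duplicated guard still present. The guard compares
   two constants (like "icmp slt i64 1, 1000"), so it is always true. */
void scan_guarded(double *G) {
    if (1 < 1000) {
        long j = 1;
        do {
            G[j] = G[j] + G[j - 1];
            j = j + 1;
        } while (j < 1000);
    }
}

/* Same loop after the guard is folded away: entry falls straight into
   a single-block loop body. */
void scan_folded(double *G) {
    long j = 1;
    do {
        G[j] = G[j] + G[j - 1];
        j = j + 1;
    } while (j < 1000);
}
```

Both functions expect `G` to point at 1000 or more doubles and compute the same result; the folded form simply has fewer blocks for later loop passes to reason about.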

Three major changes: · 063dca0f

Chris Lattner authored Jan 08, 2011

1. Rip out LoopRotate's domfrontier updating code.  It isn't
   needed now that LICM doesn't use DF and it is super complex
   and gross.
2. Make DomTree updating code a lot simpler and faster.  The 
   old loop over all the blocks was just to find a block??
3. Change the code that inserts the new preheader to just use
   SplitCriticalEdge instead of doing an overcomplex 
   reimplementation of it.

No behavior change, except for the name of the inserted preheader.

llvm-svn: 123072

063dca0f

First step in fixing PR8927: · 45e6c195

Rafael Espindola authored Jan 08, 2011

Add an unnamed_addr bit to global variables and functions. This will be used
to indicate that the address is not significant and therefore the constant
or function can be merged with others.

If an optimization pass can show that an address is not used, it can set this.

Examples of things that can have this set by the FE are globals created to
hold string literals and C++ constructors.

Adding unnamed_addr to a non-const global should have no effect unless
an optimization can transform that global into a constant.

Aliases are not allowed to have unnamed_addr since I couldn't figure
out any use for it.

llvm-svn: 123063

45e6c195

Fix a bug in r123034 (trying to sext/zext non-integers) and clean up a little. · 6a1fb8f2
Frits van Bommel authored Jan 08, 2011
```
llvm-svn: 123061
```
6a1fb8f2

Have loop-rotate simplify instructions (yay instsimplify!) as it clones · 8c5defd0

Chris Lattner authored Jan 08, 2011

them into the loop preheader, eliminating silly instructions like
"icmp i32 0, 100" in fixed tripcount loops. This also better exposes the
bigger problem with loop rotate that I'd like to fix: once this has been
folded, the duplicated conditional branch *often* turns into an uncond branch.

Not aggressively handling this is pessimizing later loop optimizations
somethin' fierce by making "dominates all exit blocks" checks fail.

llvm-svn: 123060

8c5defd0

Recognize inline asm 'rev $0, $1' as a bswap intrinsic call. · 078b0b09
Evan Cheng authored Jan 08, 2011
```
llvm-svn: 123048
```
078b0b09
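
For reference, a 32-bit byte swap reverses the order of the four bytes in a word, which is also what the ARM `rev` instruction does. A minimal portable sketch (the name `bswap32` is chosen for this example and is not taken from the commit):

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
uint32_t bswap32(uint32_t x) {
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}
```

For example, `bswap32(0x11223344u)` yields `0x44332211u`, and applying it twice returns the original value.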

Do not model all INLINEASM instructions as having unmodelled side effects. · 6eb516db

Evan Cheng authored Jan 07, 2011

Instead encode llvm IR level property "HasSideEffects" in an operand (shared
with IsAlignStack). Added MachineInstr::hasUnmodeledSideEffects() to check
the operand when the instruction is an INLINEASM.

This allows memory instructions to be moved around INLINEASM instructions.

llvm-svn: 123044

6eb516db

Jan 07, 2011

Speculatively revert r123032. · acbee0b0
Devang Patel authored Jan 07, 2011
```
llvm-svn: 123039
```
acbee0b0
Lower some BUILD_VECTORS using VEXT+shuffle. · 6f2b8966
Bob Wilson authored Jan 07, 2011
```
Patch by Tim Northover.

llvm-svn: 123035
```
6f2b8966

InstCombine: Match min/max hidden by sext/zext · fc3d7f66

Tobias Grosser authored Jan 07, 2011

X = sext x; x >s c ? X : C+1 --> X = sext x; X <s C+1 ? C+1 : X
X = sext x; x <s c ? X : C-1 --> X = sext x; X >s C-1 ? C-1 : X
X = zext x; x >u c ? X : C+1 --> X = zext x; X <u C+1 ? C+1 : X
X = zext x; x <u c ? X : C-1 --> X = zext x; X >u C-1 ? C-1 : X
X = sext x; x >u c ? X : C+1 --> X = sext x; X <u C+1 ? C+1 : X
X = sext x; x <u c ? X : C-1 --> X = sext x; X >u C-1 ? C-1 : X

Instead of calculating this with mixed types promote all to the
larger type. This enables scalar evolution to analyze this
expression. PR8866

llvm-svn: 123034

fc3d7f66
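
The first rewrite rule can be checked with a small C model. This is a sketch under assumptions: `int8_t` stands in for the narrow type and `int32_t` for the wide one, `C` is taken to be the sign-extended `c`, and both function names are invented for the example.

```c
#include <stdint.h>

/* Original shape: compare in the narrow type, select in the wide type. */
int32_t select_mixed(int8_t x, int8_t c) {
    int32_t X = x;                        /* X = sext x */
    return (x > c) ? X : (int32_t)c + 1;  /* x >s c ? X : C+1 */
}

/* Rewritten shape: compare and select in the wide type (a signed max). */
int32_t select_wide(int8_t x, int8_t c) {
    int32_t X = x;                        /* X = sext x */
    int32_t C1 = (int32_t)c + 1;          /* C+1, computed in the wide type */
    return (X < C1) ? C1 : X;             /* X <s C+1 ? C+1 : X */
}
```

The two agree because `x > c` holds exactly when `X >= C+1`, so the select is a plain max of `X` and `C+1` computed entirely in the wide type.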

Appropriately truncate debug info range in dwarf output. · 6381e158
Devang Patel authored Jan 07, 2011
```
Enable live debug variables pass.

llvm-svn: 123032
```
6381e158
Revert 122959, it needs more thought. Add it back to README.txt with additional notes. · 134cde91
Benjamin Kramer authored Jan 07, 2011
```
llvm-svn: 123030
```
134cde91

Revert r122955. It seems using movups to lower memcpy can cause massive... · a048c83f

Evan Cheng authored Jan 07, 2011

Revert r122955. It seems using movups to lower memcpy can cause massive regression (even on Nehalem) in edge cases. I also didn't see any real performance benefit.

llvm-svn: 123015

a048c83f

· 2f7cf7fc

David Greene authored Jan 07, 2011

Rename lisp-like functions as suggested by Gabor Greif a loooong time
ago.  This is both easier to learn and easier to read.

llvm-svn: 123001

2f7cf7fc

Try to unbreak the arm buildbot. · 1ec7ecce
Benjamin Kramer authored Jan 07, 2011
```
llvm-svn: 122999
```
1ec7ecce
Add testcases for PR8411 (vget_low and vget_high implemented as shuffles). · 99da75c1
Bob Wilson authored Jan 07, 2011
```
llvm-svn: 122997
```
99da75c1

Add ARM patterns to match EXTRACT_SUBVECTOR nodes. · 8265d566

Bob Wilson authored Jan 07, 2011

Also fix an off-by-one in SelectionDAGBuilder that was preventing shuffle
vectors from being translated to EXTRACT_SUBVECTOR.
Patch by Tim Northover.

The test changes are needed to keep those spill-q tests from testing aligned
spills and restores.  If the only aligned stack objects are spill slots, we
no longer realign the stack frame.  Prior to this patch, an EXTRACT_SUBVECTOR
was legalized by loading from the stack, which created an aligned frame index.
Now, however, there is nothing except the spill slot in the stack frame, so
I added an aligned alloca.

llvm-svn: 122995

8265d566

Fix the other problem reported in PR8582. Testcase and patch by · 61c5708b
Duncan Sands authored Jan 06, 2011
```
Nadav Rotem.

llvm-svn: 122983
```
61c5708b
Add a testcase for PR8582, which mysteriously fixed itself, in case the problem · 64b75da0
Duncan Sands authored Jan 06, 2011
```
comes back some day.

llvm-svn: 122982
```
64b75da0

Jan 06, 2011
- PR8921: LDM/POP do not support interworking prior to v5t. · 914df82a
  Bob Wilson authored Jan 06, 2011
```
llvm-svn: 122970
```
  914df82a
- Correctly disassemble truncated asm. · 9f9a1069
  Rafael Espindola authored Jan 06, 2011
```
Patch by Richard Smith.

llvm-svn: 122962
```
  9f9a1069
- InstCombine: Turn _chk functions into the "unsafe" variant if length and max length are equal. · ae67cc13
  Benjamin Kramer authored Jan 06, 2011
```
This happens when we take the (non-constant) length from a malloc.

llvm-svn: 122961
```
  ae67cc13
- InstCombine: If we call llvm.objectsize on a malloc call we can replace it... · 799b0112
  Benjamin Kramer authored Jan 06, 2011
```
InstCombine: If we call llvm.objectsize on a malloc call we can replace it with the size passed to malloc.

llvm-svn: 122959
```
  799b0112
- InstCombine: Teach llvm.objectsize folding to look through GEPs. · a76cc117
  Benjamin Kramer authored Jan 06, 2011
```
llvm-svn: 122958
```
  a76cc117
- Use movups to lower memcpy and memset even if it's not fast (like corei7). · 7998b1d6
  Evan Cheng authored Jan 06, 2011
```
The theory is it's still faster than a pair of movq / a quad of movl. This
will probably hurt older chips like P4 but should run faster on current
and future Intel processors. rdar://8817010

llvm-svn: 122955
```
  7998b1d6
- Re-implement r122936 with proper target hooks. Now getMaxStoresPerMemcpy · 3ae2b79a
  Evan Cheng authored Jan 06, 2011
```
etc. take an OptSize option. If OptSize is true, they return
the inline limit for functions with attribute OptSize.

llvm-svn: 122952
```
  3ae2b79a
- implement constant folding support for an exotic constant expr: · 5858e091
  Chris Lattner authored Jan 06, 2011
```
  ret i64 ptrtoint (i8* getelementptr ([1000 x i8]* @X, i64 1, i64 sub (i64 0, i64 ptrtoint ([1000 x i8]* @X to i64))) to i64)

to "ret i64 1000".  This allows us to correctly compute the trip count
on a loop in PR8883, which occurs with std::fill on a char array.  This
allows us to transform it into a memset with a constant size.

llvm-svn: 122950
```
  5858e091
- Revert r122936. I'll re-implement the change. · c052ba7f
  Evan Cheng authored Jan 06, 2011
```
llvm-svn: 122949
```
  c052ba7f
- Fix test to coincide with r122934 change from PR8919. · 2b898548
  Bill Wendling authored Jan 06, 2011
```
llvm-svn: 122937
```
  2b898548
- r105228 reduced the memcpy / memset inline limit to 4 with -Os to avoid blowing · 06536e71
  Evan Cheng authored Jan 06, 2011
```
up the FreeBSD bootloader. However, this doesn't make much sense for Darwin, whose
-Os is meant to optimize for size only if it doesn't hurt performance.
rdar://8821501

llvm-svn: 122936
```
  06536e71
- Avoid zero-extending bit test operands to pointer type if all the masks fit in · ac730dd2
  Evan Cheng authored Jan 06, 2011
```
the original type of the switch statement key.
rdar://8781238

llvm-svn: 122935
```
  ac730dd2
- Optimize: · 260acf32
  Evan Cheng authored Jan 05, 2011
```
  r1025 = s/zext r1024, 4
  r1026 = extract_subreg r1025, 4
to:
  r1026 = copy r1024

llvm-svn: 122925
```
  260acf32
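
Among the Jan 06 commits above, the constant-folding change (r122950) is easiest to see as plain address arithmetic. The sketch below is an illustration only, with an invented function name. It spells out the expression: the address of `X`, plus one whole `[1000 x i8]` array, plus a byte index of `0 - address of X`. The two address terms cancel, leaving 1000.

```c
#include <stddef.h>
#include <stdint.h>

static char X[1000];

/* Mirrors the folded constant expression using unsigned arithmetic,
   which wraps modulo 2^64 so the address terms cancel exactly. */
uint64_t folded_size(void) {
    uint64_t base = (uint64_t)(uintptr_t)X;  /* ptrtoint @X */
    uint64_t byte_index = 0 - base;          /* sub (i64 0, ptrtoint @X) */
    return base + 1 * (uint64_t)sizeof(X) + byte_index;
}
```

Whatever address `X` ends up at, `folded_size()` evaluates to 1000, which is why the expression can be folded to a constant at compile time.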
Jan 05, 2011
- fix PR8900, a shuffle miscompilation. Patch by Nadav Rotem! · 872908fd
  Chris Lattner authored Jan 05, 2011
```
llvm-svn: 122921
```
  872908fd
- Fix lit for people whose LLVM path contains 'opt', which is a common directory... · c7d65b42
  Frits van Bommel authored Jan 05, 2011
```
Fix lit for people whose LLVM path contains 'opt', which is a common directory name on Unix-like systems.

llvm-svn: 122873
```
  c7d65b42
Jan 04, 2011

fix an off-by-one bug that caused a crash analyzing · c86e67e1
Chris Lattner authored Jan 04, 2011
```
ashr's with huge shift amounts, PR8896

llvm-svn: 122814
```
c86e67e1
Include llvm-gcc dir before llvm_tools_dir · 4ccb9238
Tobias Grosser authored Jan 04, 2011
```
This ensures that the recently compiled tools are always picked for testing.

llvm-svn: 122810
```
4ccb9238

Teach loop-idiom to turn a loop containing a memset into a larger memset · 8643810e

Chris Lattner authored Jan 04, 2011

when safe.

The testcase is basically this nested loop:
void foo(char *X) {
  for (int i = 0; i != 100; ++i) 
    for (int j = 0; j != 100; ++j)
      X[j+i*100] = 0;
}

which gets turned into a single memset now.  clang -O3 doesn't optimize
this yet though due to a phase ordering issue I haven't analyzed yet.

llvm-svn: 122806

8643810e
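
To make the claim concrete, here is a small sketch comparing the nested loop with the single call it collapses to. The function names are invented for the example; the 10000-byte length is just 100 × 100 from the loop bounds.

```c
#include <string.h>

/* The nested loop from the testcase: writes X[0] .. X[9999] in order. */
void clear_nested(char *X) {
    for (int i = 0; i != 100; ++i)
        for (int j = 0; j != 100; ++j)
            X[j + i * 100] = 0;
}

/* The equivalent single memset over the same 100 * 100 bytes. */
void clear_memset(char *X) {
    memset(X, 0, 10000);
}
```

The inner loop covers 100 contiguous bytes and the outer loop advances by exactly 100 bytes each iteration, so the written ranges tile the buffer with no gaps or overlap.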

· 4b71405b

David Greene authored Jan 04, 2011

Don't pattern match "/clang" so we don't mangle directory names.  Some
tests use absolute paths to clang.

llvm-svn: 122796

4b71405b