Commits · 6ff70ad356a1474176f8e39b5a93acb4ebfdc5c4 · Roger Ferrer / llvm-epi-0.8

Jan 09, 2011
- Fix a MachineVerifier loop that probably didn't mean to skip the last two · 6ff70ad3
  Jakob Stoklund Olesen authored Jan 08, 2011
```
virtual registers.

llvm-svn: 123100
```
  6ff70ad3
- Don't document exactly how virtual registers are represented as integers. Code · d3438eb2
  Jakob Stoklund Olesen authored Jan 08, 2011
```
shouldn't depend directly on that.

Give an example of how to iterate over all virtual registers in a function
without depending on the representation.

llvm-svn: 123099
```
  d3438eb2
- Use an IndexedMap for LiveVariables::VirtRegInfo. · 28d76692
  Jakob Stoklund Olesen authored Jan 08, 2011
```
Provide MRI::getNumVirtRegs() and TRI::index2VirtReg() functions to allow
iteration over virtual registers without depending on the representation of
virtual register numbers.

llvm-svn: 123098
```
  28d76692
- Do not talk about TargetRegisterInfo::FirstVirtualRegister. · a1e03cfb
  Jakob Stoklund Olesen authored Jan 08, 2011
```
llvm-svn: 123097
```
  a1e03cfb
- Use an IndexedMap for LiveOutRegInfo to hide its dependence on... · 793d7b76
  Jakob Stoklund Olesen authored Jan 08, 2011
```
Use an IndexedMap for LiveOutRegInfo to hide its dependence on TargetRegisterInfo::FirstVirtualRegister.

llvm-svn: 123096
```
  793d7b76
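The `getNumVirtRegs()` / `index2VirtReg()` pattern from 28d76692 can be sketched outside LLVM roughly as follows. This is a minimal standalone model, not LLVM's code: the `sketch` namespace, the `RegInfo` struct, the `virtReg2Index()` helper, and the value `1024` are all assumptions made for illustration. The point is that the loop only touches dense indices, so it visits every virtual register regardless of how register numbers are encoded.

```cpp
#include <cassert>
#include <vector>

namespace sketch {
// Assumed encoding, for illustration only. Client code should not rely on it.
constexpr unsigned FirstVirtualRegister = 1024;

// Map a dense index 0..N-1 to a virtual register number, and back.
inline unsigned index2VirtReg(unsigned Index) {
  return Index + FirstVirtualRegister;
}
inline unsigned virtReg2Index(unsigned Reg) {
  assert(Reg >= FirstVirtualRegister && "not a virtual register");
  return Reg - FirstVirtualRegister;
}

// Hypothetical per-function register info: one entry per virtual register.
struct RegInfo {
  std::vector<int> DefCount; // indexed by virtual register index
  unsigned getNumVirtRegs() const {
    return static_cast<unsigned>(DefCount.size());
  }
};

// Collect every virtual register using only the index-based helpers.
inline std::vector<unsigned> allVirtRegs(const RegInfo &MRI) {
  std::vector<unsigned> Regs;
  for (unsigned i = 0, e = MRI.getNumVirtRegs(); i != e; ++i)
    Regs.push_back(index2VirtReg(i));
  return Regs;
}
} // namespace sketch
```

Because the loop bound is the register count rather than a raw register number, there is no arithmetic on the encoding that could accidentally stop short of the last registers.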
Jan 08, 2011

Fix coding style. · 0939bc37
Cameron Zwarich authored Jan 08, 2011
```
llvm-svn: 123093
```
0939bc37
fix a latent bug in memcpyoptimizer that my recent patches exposed: it wasn't · 7d6433ae
Chris Lattner authored Jan 08, 2011
```
updating memdep when fusing stores together.  This fixes the crash optimizing
the bullet benchmark.

llvm-svn: 123091
```
7d6433ae
tryMergingIntoMemset can only handle constant length memsets. · ff6ed2ac
Chris Lattner authored Jan 08, 2011
```
llvm-svn: 123090
```
ff6ed2ac

Merge memsets followed by neighboring memsets and other stores into · 9a1d63ba

Chris Lattner authored Jan 08, 2011

larger memsets.  Among other things, this fixes rdar://8760394 and
allows us to handle "Example 2" from http://blog.regehr.org/archives/320,
compiling it into a single 4096-byte memset:

_mad_synth_mute:                        ## @mad_synth_mute
## BB#0:                                ## %entry
	pushq	%rax
	movl	$4096, %esi             ## imm = 0x1000
	callq	___bzero
	popq	%rax
	ret

llvm-svn: 123089

9a1d63ba
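As a source-level illustration of the kind of pattern this commit targets, the sketch below shows a memset followed by stores to the neighboring bytes, next to the single larger memset that has the same effect. The function names and sizes are hypothetical and chosen only for the example; the actual transformation works on LLVM IR, not on C++ source.

```cpp
#include <cassert>
#include <cstring>

// Before: a 16-byte memset followed by four stores to the adjacent bytes.
inline void clearSeparately(unsigned char *P) {
  std::memset(P, 0, 16);
  P[16] = 0;
  P[17] = 0;
  P[18] = 0;
  P[19] = 0;
}

// After: one memset covering the same 20 contiguous bytes.
inline void clearMerged(unsigned char *P) { std::memset(P, 0, 20); }

// Check that both forms leave a buffer in the same state.
inline void checkClearEquivalence() {
  unsigned char A[24], B[24];
  std::memset(A, 0xFF, sizeof A);
  std::memset(B, 0xFF, sizeof B);
  clearSeparately(A);
  clearMerged(B);
  assert(std::memcmp(A, B, sizeof A) == 0);
  assert(A[20] == 0xFF); // bytes past the merged range are untouched
}
```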

fix an issue in IsPointerOffset that prevented us from recognizing that · 5120ebf1
Chris Lattner authored Jan 08, 2011
```
P and P+1 are relative to the same base pointer.

llvm-svn: 123087
```
5120ebf1
enhance memcpyopt to merge a store and a subsequent · 4dc1fd93
Chris Lattner authored Jan 08, 2011
```
memset into a single larger memset.

llvm-svn: 123086
```
4dc1fd93
fit in 80 cols · 2f2c3351
Chris Lattner authored Jan 08, 2011
```
llvm-svn: 123085
```
2f2c3351
merge two tests and filecheckify · 9dbbc49f
Chris Lattner authored Jan 08, 2011
```
llvm-svn: 123082
```
9dbbc49f

constify TargetData references. · c638147e

Chris Lattner authored Jan 08, 2011

Split memset formation logic out into its own
"tryMergingIntoMemset" helper function.

llvm-svn: 123081

c638147e

When loop rotation happens, it is *very* common for the duplicated condbr · 59c82f85

Chris Lattner authored Jan 08, 2011

to be foldable into an uncond branch.  When this happens, we can make a
much simpler CFG for the loop, which is important for nested loop cases
where we want the outer loop to be aggressively optimized.

Handle this case more aggressively.  For example, previously on
phi-duplicate.ll we would get this:


define void @test(i32 %N, double* %G) nounwind ssp {
entry:
  %cmp1 = icmp slt i64 1, 1000
  br i1 %cmp1, label %bb.nph, label %for.end

bb.nph:                                           ; preds = %entry
  br label %for.body

for.body:                                         ; preds = %bb.nph, %for.cond
  %j.02 = phi i64 [ 1, %bb.nph ], [ %inc, %for.cond ]
  %arrayidx = getelementptr inbounds double* %G, i64 %j.02
  %tmp3 = load double* %arrayidx
  %sub = sub i64 %j.02, 1
  %arrayidx6 = getelementptr inbounds double* %G, i64 %sub
  %tmp7 = load double* %arrayidx6
  %add = fadd double %tmp3, %tmp7
  %arrayidx10 = getelementptr inbounds double* %G, i64 %j.02
  store double %add, double* %arrayidx10
  %inc = add nsw i64 %j.02, 1
  br label %for.cond

for.cond:                                         ; preds = %for.body
  %cmp = icmp slt i64 %inc, 1000
  br i1 %cmp, label %for.body, label %for.cond.for.end_crit_edge

for.cond.for.end_crit_edge:                       ; preds = %for.cond
  br label %for.end

for.end:                                          ; preds = %for.cond.for.end_crit_edge, %entry
  ret void
}

Now we get the much nicer:

define void @test(i32 %N, double* %G) nounwind ssp {
entry:
  br label %for.body

for.body:                                         ; preds = %entry, %for.body
  %j.01 = phi i64 [ 1, %entry ], [ %inc, %for.body ]
  %arrayidx = getelementptr inbounds double* %G, i64 %j.01
  %tmp3 = load double* %arrayidx
  %sub = sub i64 %j.01, 1
  %arrayidx6 = getelementptr inbounds double* %G, i64 %sub
  %tmp7 = load double* %arrayidx6
  %add = fadd double %tmp3, %tmp7
  %arrayidx10 = getelementptr inbounds double* %G, i64 %j.01
  store double %add, double* %arrayidx10
  %inc = add nsw i64 %j.01, 1
  %cmp = icmp slt i64 %inc, 1000
  br i1 %cmp, label %for.body, label %for.end

for.end:                                          ; preds = %for.body
  ret void
}

With all of these recent changes, we are now able to compile:

void foo(char *X) {
 for (int i = 0; i != 100; ++i) 
   for (int j = 0; j != 100; ++j)
     X[j+i*100] = 0;
}

into a single memset of 10000 bytes.  This series of changes
should be helpful for other nested loop scenarios as well.

llvm-svn: 123079

59c82f85
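To see why the nested loop in `foo` is equivalent to one memset: iteration `i` of the outer loop zeroes the bytes at offsets `i*100` through `i*100 + 99`, and those 100 ranges tile `[0, 10000)` with no gaps or overlap. The sketch below checks that equivalence at the source level; the function names are hypothetical and the test harness is only an illustration.

```cpp
#include <cassert>
#include <cstring>

// The loop nest from the commit message.
inline void zeroWithLoops(char *X) {
  for (int i = 0; i != 100; ++i)
    for (int j = 0; j != 100; ++j)
      X[j + i * 100] = 0;
}

// The single call it can be compiled into.
inline void zeroWithMemset(char *X) { std::memset(X, 0, 10000); }

// Both forms must zero exactly the first 10000 bytes and nothing more.
inline void checkZeroEquivalence() {
  static char A[10001], B[10001];
  std::memset(A, 1, sizeof A);
  std::memset(B, 1, sizeof B);
  zeroWithLoops(A);
  zeroWithMemset(B);
  assert(std::memcmp(A, B, sizeof A) == 0);
  assert(A[9999] == 0 && A[10000] == 1);
}
```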

make domtree verification print something useful on failure. · 5f7734c4
Chris Lattner authored Jan 08, 2011
```
llvm-svn: 123078
```
5f7734c4
split ssa updating code out to its own helper function. Don't bother · 30f318e5
Chris Lattner authored Jan 08, 2011
```
moving the OrigHeader block anymore: we just merge it away anyway so
its code layout doesn't matter.

llvm-svn: 123077
```
30f318e5

Implement a TODO: Enhance loopinfo to merge away the unconditional branch · 2615130e

Chris Lattner authored Jan 08, 2011

that it was leaving in loops after rotation (between the original latch
block and the original header).

With this change, it is possible for rotated loops to have just a single
basic block, which is useful.

llvm-svn: 123075

2615130e

various code cleanups, enhance MergeBlockIntoPredecessor to preserve · 930b716e
Chris Lattner authored Jan 08, 2011
```
loop info.

llvm-svn: 123074
```
930b716e
inline preserveCanonicalLoopForm now that it is simple. · fee37c5f
Chris Lattner authored Jan 08, 2011
```
llvm-svn: 123073
```
fee37c5f

Three major changes: · 063dca0f

Chris Lattner authored Jan 08, 2011

1. Rip out LoopRotate's domfrontier updating code.  It isn't
   needed now that LICM doesn't use DF and it is super complex
   and gross.
2. Make DomTree updating code a lot simpler and faster.  The 
   old loop over all the blocks was just to find a block??
3. Change the code that inserts the new preheader to just use
   SplitCriticalEdge instead of doing an overcomplex 
   reimplementation of it.

No behavior change, except for the name of the inserted preheader.

llvm-svn: 123072

063dca0f

reduce nesting. · 30d95f9f
Chris Lattner authored Jan 08, 2011
```
llvm-svn: 123071
```
30d95f9f

On Windows, replace each occurrence of '\' by '\\' on the replacement string.... · 7c9eab8f

Francois Pichet authored Jan 08, 2011

On Windows, replace each occurrence of '\' with '\\' in the replacement string. This is necessary to prevent re.sub from interpreting escape sequences that occur in the path.

For example:

llvm\tools\clang\test
was replaced by
llvm <tab> ools\clang <tab> est

llvm-svn: 123070

7c9eab8f

LoopRotate requires canonical loop form, so it always has preheaders · 7fab23bc
Chris Lattner authored Jan 08, 2011
```
and latch blocks.  Reorder entry conditions to make the pass faster
and more logical.

llvm-svn: 123069
```
7fab23bc
use the LI ivar. · d62691f4
Chris Lattner authored Jan 08, 2011
```
llvm-svn: 123068
```
d62691f4
some cleanups: remove dead arguments and eliminate ivars · 385f2ec6
Chris Lattner authored Jan 08, 2011
```
that are just passed to one function.

llvm-svn: 123067
```
385f2ec6
fix an issue duncan pointed out, which could cause loop rotate · 25ba40a0
Chris Lattner authored Jan 08, 2011
```
to violate LCSSA form

llvm-svn: 123066
```
25ba40a0
Fix coding style issues. · b4ab257b
Cameron Zwarich authored Jan 08, 2011
```
llvm-svn: 123065
```
b4ab257b

Make more passes preserve dominators (or state that they preserve dominators if · 84986b29

Cameron Zwarich authored Jan 08, 2011

they already do). This removes two dominator recomputations prior to isel,
which is a 1% improvement in total llc time for 403.gcc.

The only potentially suspect thing is making GCStrategy recompute dominators if
it used a custom lowering strategy.

llvm-svn: 123064

84986b29

First step in fixing PR8927: · 45e6c195

Rafael Espindola authored Jan 08, 2011

Add an unnamed_addr bit to global variables and functions. This will be used
to indicate that the address is not significant and therefore the constant
or function can be merged with others.

If an optimization pass can show that an address is not used, it can set this.

Examples of things that can have this set by the FE are globals created to
hold string literals and C++ constructors.

Adding unnamed_addr to a non-const global should have no effect unless
an optimization can transform that global into a constant.

Aliases are not allowed to have unnamed_addr since I couldn't figure
out any use for it.

llvm-svn: 123063

45e6c195
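A small source-level illustration of what "the address is significant" means: distinct objects must have distinct addresses, so a program can tell two identical constants apart by comparing pointers. Merging them is only safe when nothing observes the address, which is the property the new bit is meant to record. The names below are hypothetical and the sketch says nothing about how a front end actually sets the bit.

```cpp
#include <cassert>
#include <cstring>

// Two distinct constant objects with identical contents.
static const char GreetingA[] = "hello";
static const char GreetingB[] = "hello";

// Observes contents only: merging the two arrays would not change this.
inline bool sameContents() {
  return std::strcmp(GreetingA, GreetingB) == 0;
}

// Observes addresses: distinct objects compare unequal, so merging the
// arrays would change the result of this function.
inline bool sameAddress() {
  return static_cast<const void *>(GreetingA) ==
         static_cast<const void *>(GreetingB);
}

inline void checkAddressSignificance() {
  assert(sameContents());
  assert(!sameAddress());
}
```

If the program contained only `sameContents()`, the two arrays could share storage without any visible difference.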

Contract subloop bodies. However, it is still important to visit the phis at the · 80bd9af7
Cameron Zwarich authored Jan 08, 2011
```
top of subloop headers, as the phi uses logically occur outside of the subloop.

llvm-svn: 123062
```
80bd9af7
Fix a bug in r123034 (trying to sext/zext non-integers) and clean up a little. · 6a1fb8f2
Frits van Bommel authored Jan 08, 2011
```
llvm-svn: 123061
```
6a1fb8f2

Have loop-rotate simplify instructions (yay instsimplify!) as it clones · 8c5defd0

Chris Lattner authored Jan 08, 2011

them into the loop preheader, eliminating silly instructions like
"icmp i32 0, 100" in fixed tripcount loops. This also better exposes the
bigger problem with loop rotate that I'd like to fix: once this has been
folded, the duplicated conditional branch *often* turns into an uncond branch.

Not aggressively handling this is pessimizing later loop optimizations
somethin' fierce by making "dominates all exit blocks" checks fail.

llvm-svn: 123060

8c5defd0
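The effect described here can be pictured at the source level. Loop rotation turns a top-tested loop into a guarded bottom-tested loop by duplicating the condition in front of it. With a fixed trip count the duplicated guard compares two constants, so it folds to true and the guard branch becomes unconditional. The functions below are a hand-written sketch of those three shapes, with hypothetical names; they are not output of the pass.

```cpp
#include <cassert>

// Original shape: condition tested at the top of the loop.
inline int sumWhile(const int *A, int N) {
  int Sum = 0;
  int i = 0;
  while (i < N) {
    Sum += A[i];
    ++i;
  }
  return Sum;
}

// Rotated shape: the condition is duplicated as a guard before the loop,
// and the loop itself tests at the bottom.
inline int sumRotated(const int *A, int N) {
  int Sum = 0;
  int i = 0;
  if (i < N) {
    do {
      Sum += A[i];
      ++i;
    } while (i < N);
  }
  return Sum;
}

// Fixed trip count of 100: the guard would be `0 < 100`, which is always
// true, so it is dropped and only the bottom-tested loop remains.
inline int sumFixed100(const int *A) {
  int Sum = 0;
  int i = 0;
  do {
    Sum += A[i];
    ++i;
  } while (i < 100);
  return Sum;
}

inline void checkRotationEquivalence() {
  int A[100];
  for (int i = 0; i != 100; ++i)
    A[i] = i + 1;
  assert(sumWhile(A, 100) == sumRotated(A, 100));
  assert(sumRotated(A, 100) == sumFixed100(A));
  assert(sumRotated(A, 0) == 0); // the guard still matters when N can be 0
}
```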

make this file properly self contained. · 75c82cb5
Chris Lattner authored Jan 08, 2011
```
llvm-svn: 123059
```
75c82cb5

Revamp the ValueMapper interfaces in a couple ways: · 43f8d164

Chris Lattner authored Jan 08, 2011

1. Take a flags argument instead of a bool.  This makes
   it more clear to the reader what it is used for.
2. Add a flag that says that "remapping a value not in the
   map is ok".
3. Reimplement MapValue to share a bunch of code and be a lot
   more efficient.  For lookup failures, don't drop null values
   into the map.
4. Using the new flag a bunch of code can vaporize in LinkModules
   and LoopUnswitch, kill it.

No functionality change.

llvm-svn: 123058

43f8d164
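Points 1 to 3 can be sketched in miniature. This is a simplified standalone model, not the ValueMapper code: values are plain strings, and the `sketch` namespace, the flag names, and both function names are assumptions made for the example. It shows a named flag replacing a bare bool, a flag that tolerates missing entries, and a lookup that uses `find()` so a failed lookup does not insert an empty entry into the map.

```cpp
#include <cassert>
#include <map>
#include <string>

namespace sketch {
// Hypothetical flag set; a call site reads better than with `true`/`false`.
enum RemapFlags : unsigned {
  RF_None = 0,
  RF_IgnoreMissingEntries = 1 // remapping a value not in the map is ok
};

using ValueMap = std::map<std::string, std::string>;

// Return the mapped value, or nullptr when there is none. find() is used
// instead of operator[] so that a miss leaves the map unchanged.
inline const std::string *mapValue(const ValueMap &VM, const std::string &V) {
  auto It = VM.find(V);
  return It == VM.end() ? nullptr : &It->second;
}

// Rewrite one operand in place according to the map.
inline void remapOperand(std::string &Op, const ValueMap &VM,
                         unsigned Flags = RF_None) {
  if (const std::string *New = mapValue(VM, Op)) {
    Op = *New;
    return;
  }
  // A miss is only acceptable when the caller asked for it.
  assert((Flags & RF_IgnoreMissingEntries) && "operand not in value map");
  (void)Flags;
}
} // namespace sketch
```

A caller that expects unmapped operands passes `sketch::RF_IgnoreMissingEntries` and the operand is left as it was.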

two minor changes: switch to the standard ValueToValueMapTy · 2b3f20e6

Chris Lattner authored Jan 08, 2011

map from ValueMapper.h (giving us access to its utilities)
and add a fastpath in the loop rotation code, avoiding expensive
ssa updator manipulation for values with nothing to update.

llvm-svn: 123057

2b3f20e6

I don't think I could find a 10.2.x box if I tried. · 46779e19
Eric Christopher authored Jan 08, 2011
```
llvm-svn: 123051
```
46779e19
Recognize inline asm 'rev $0, $1' as a bswap intrinsic call. · 078b0b09
Evan Cheng authored Jan 08, 2011
```
llvm-svn: 123048
```
078b0b09
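For readers unfamiliar with the operation: the ARM `rev` instruction reverses the byte order of a 32-bit register, which is what a 32-bit bswap computes. A portable sketch of that operation is shown below; the function name is hypothetical and this is not the code added by the commit.

```cpp
#include <cassert>
#include <cstdint>

// Reverse the four bytes of a 32-bit value.
inline std::uint32_t byteSwap32(std::uint32_t V) {
  return ((V & 0x000000FFu) << 24) | ((V & 0x0000FF00u) << 8) |
         ((V & 0x00FF0000u) >> 8) | ((V & 0xFF000000u) >> 24);
}

inline void checkByteSwap() {
  assert(byteSwap32(0x11223344u) == 0x44332211u);
  // Swapping twice restores the original value.
  assert(byteSwap32(byteSwap32(0xDEADBEEFu)) == 0xDEADBEEFu);
}
```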

Do not model all INLINEASM instructions as having unmodelled side effects. · 6eb516db

Evan Cheng authored Jan 07, 2011

Instead encode llvm IR level property "HasSideEffects" in an operand (shared
with IsAlignStack). Added MachineInstrs::hasUnmodeledSideEffects() to check
the operand when the instruction is an INLINEASM.

This allows memory instructions to be moved around INLINEASM instructions.

llvm-svn: 123044

6eb516db

Use __builtin_shufflevector to implement vget_low and vget_high intrinsics. · 006089b7
Bob Wilson authored Jan 07, 2011
```
This was suggested by Edmund Grimley Evans in pr8411.

llvm-svn: 123043
```
006089b7