- Apr 28, 2007

Chris Lattner authored
llvm-svn: 36527

Chris Lattner authored
previous clients. This fixes MallocBench/gs
llvm-svn: 36525

Chris Lattner authored
llvm-svn: 36523

Chris Lattner authored
llvm-svn: 36521

- Apr 27, 2007

Chris Lattner authored
for unrelated casts.
llvm-svn: 36511

- Apr 26, 2007

Zhou Sheng authored
llvm-svn: 36475

- Apr 25, 2007

Devang Patel authored
llvm-svn: 36444

Devang Patel authored
llvm-svn: 36441

Chris Lattner authored
copies from a constant global, then we can change the reads to read from
the global instead of from the alloca. This eliminates the alloca and the
memcpy, and promotes secondary optimizations (because the loads are now
loads from a constant global).

This is important for a common C idiom:

    void foo() {
      int A[] = {1,2,3,4,5,6,7,8,9...};
      ... only reads of A ...
    }

For some reason, people forget to mark the array static or const.

This triggers on these multisource benchmarks:

    JM/ldecode: block_pos, [3 x [4 x [4 x i32]]]
    FreeBench/mason: m, [18 x i32], inlined 4 times
    MiBench/office-stringsearch: search_strings, [1332 x i8*]
    MiBench/office-stringsearch: find_strings, [1333 x i8*]
    Prolangs-C++/city: dirs, [9 x i8*], inlined 4 places

and these spec benchmarks:

    177.mesa: message, [8 x [32 x i8]]
    186.crafty: bias_rl45, [64 x i32]
    186.crafty: diag_sq, [64 x i32]
    186.crafty: empty, [9 x i8]
    186.crafty: xlate, [15 x i8]
    186.crafty: status, [13 x i8]
    186.crafty: bdinfo, [25 x i8]
    445.gobmk: routines, [16 x i8*]
    458.sjeng: piece_rep, [14 x i8*]
    458.sjeng: t, [13 x i32], inlined 4 places.
    464.h264ref: block8x8_idx, [3 x [4 x [4 x i32]]]
    464.h264ref: block_pos, [3 x [4 x [4 x i32]]]
    464.h264ref: j_off_tab, [12 x i32]

This implements Transforms/ScalarRepl/memcpy-from-global.ll

llvm-svn: 36429
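In source terms, the rewrite the pass performs corresponds to the change a
programmer would make by hand, as in this minimal sketch (function and
variable names here are illustrative, not from the commit):

    // Before: A is a per-call local array, so the front-end emits an
    // alloca plus a memcpy from a hidden constant global on every call.
    int sum_copied(void) {
      int A[] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
      int s = 0;
      for (int i = 0; i < 9; ++i)
        s += A[i];              // only reads of A
      return s;
    }

    // After, in effect: the reads go straight to the constant data and
    // the copy disappears, as if the array had been marked static const.
    int sum_shared(void) {
      static const int A[] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
      int s = 0;
      for (int i = 0; i < 9; ++i)
        s += A[i];
      return s;
    }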

Chris Lattner authored
llvm-svn: 36426

Owen Anderson authored
this approach is no longer appropriate.
llvm-svn: 36421

Devang Patel authored
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070423/048376.html
llvm-svn: 36417

- Apr 24, 2007

Owen Anderson authored
my approach to this, so hopefully I'll find a way to do this without making this slower.
llvm-svn: 36392

Devang Patel authored
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070423/048333.html
llvm-svn: 36380

- Apr 21, 2007

Owen Anderson authored
llvm-svn: 36300

Owen Anderson authored
llvm-svn: 36299

- Apr 20, 2007

Owen Anderson authored
llvm-svn: 36271

- Apr 19, 2007

Zhou Sheng authored
llvm-svn: 36261

Zhou Sheng authored
llvm-svn: 36260

- Apr 18, 2007

Owen Anderson authored
llvm-svn: 36255

Owen Anderson authored
llvm-svn: 36254

Owen Anderson authored
llvm-svn: 36252

Owen Anderson authored
llvm-svn: 36249

- Apr 17, 2007

Dan Gohman authored
gets called.
llvm-svn: 36208

Devang Patel authored
Fix http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070416/047888.html
llvm-svn: 36182

- Apr 16, 2007

Anton Korobeynikov authored
target for tabs checking.
llvm-svn: 36146

- Apr 15, 2007

Owen Anderson authored
Remove ImmediateDominator analysis. The same information can be obtained
from DomTree. A lot of code for constructing ImmediateDominator is now
folded into DomTree construction.

This is part of the ongoing work for PR217.

llvm-svn: 36063
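For reference, the immediate-dominator query now goes through DominatorTree.
A minimal sketch against the present-day LLVM C++ API (which has evolved
since 2007, so treat the exact names as current-API assumptions rather than
what this commit used):

    #include "llvm/IR/Dominators.h"
    #include "llvm/IR/Function.h"

    using namespace llvm;

    // Look up the immediate dominator of BB via DominatorTree; this is
    // the information ImmediateDominator used to provide directly.
    BasicBlock *getImmediateDominator(Function &F, BasicBlock *BB) {
      DominatorTree DT(F);               // build dominator info for F
      DomTreeNode *Node = DT.getNode(BB);
      if (!Node || !Node->getIDom())
        return nullptr;                  // the entry block has no idom
      return Node->getIDom()->getBlock();
    }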

Chris Lattner authored
This sinks the two stores in this example into a single store in
cond_next. In this case, it allows elimination of the load as well:

      store double 0.000000e+00, double* @s.3060
      %tmp3 = fcmp ogt double %tmp1, 5.000000e-01  ; <i1> [#uses=1]
      br i1 %tmp3, label %cond_true, label %cond_next
  cond_true:    ; preds = %entry
      store double 1.000000e+00, double* @s.3060
      br label %cond_next
  cond_next:    ; preds = %entry, %cond_true
      %tmp6 = load double* @s.3060  ; <double> [#uses=1]

This implements Transforms/InstCombine/store-merge.ll:test2

llvm-svn: 36040
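A plausible source-level origin for the IR above (a reconstruction, not
taken from the commit; the global name stands in for @s.3060):

    double s;  // stands in for @s.3060

    void set_flag(double t) {
      s = 0.0;      // store in the entry block
      if (t > 0.5)
        s = 1.0;    // store in cond_true
      // After the transform, both stores sink to the join point as a
      // single store of (t > 0.5 ? 1.0 : 0.0), and a later load of s
      // can be eliminated.
    }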

Chris Lattner authored
llvm-svn: 36037

Chris Lattner authored
llvm-svn: 36031

Chris Lattner authored
define i32 @test(float %f) {
  %tmp7 = insertelement <4 x float> undef, float %f, i32 0
  %tmp17 = bitcast <4 x float> %tmp7 to <4 x i32>
  %tmp19 = extractelement <4 x i32> %tmp17, i32 0
  ret i32 %tmp19
}

into:

define i32 @test(float %f) {
  %tmp19 = bitcast float %f to i32  ; <i32> [#uses=1]
  ret i32 %tmp19
}

On PPC, this is the difference between:

_test:
    mfspr r2, 256
    oris r3, r2, 8192
    mtspr 256, r3
    stfs f1, -16(r1)
    addi r3, r1, -16
    addi r4, r1, -32
    lvx v2, 0, r3
    stvx v2, 0, r4
    lwz r3, -32(r1)
    mtspr 256, r2
    blr

and:

_test:
    stfs f1, -4(r1)
    nop
    nop
    nop
    lwz r3, -4(r1)
    blr

llvm-svn: 36025

Chris Lattner authored
unsigned test(float f) {
    return _mm_cvtsi128_si32( (__m128i) _mm_set_ss( f*f ));
}

into:

_test:
    movss 4(%esp), %xmm0
    mulss %xmm0, %xmm0
    movd %xmm0, %eax
    ret

instead of:

_test:
    movss 4(%esp), %xmm0
    mulss %xmm0, %xmm0
    xorps %xmm1, %xmm1
    movss %xmm0, %xmm1
    movd %xmm1, %eax
    ret

GCC gets:

_test:
    subl $28, %esp
    movss 32(%esp), %xmm0
    mulss %xmm0, %xmm0
    xorps %xmm1, %xmm1
    movss %xmm0, %xmm1
    movaps %xmm1, %xmm0
    movd %xmm0, 12(%esp)
    movl 12(%esp), %eax
    addl $28, %esp
    ret

llvm-svn: 36020

- Apr 14, 2007

Chris Lattner authored
llvm-svn: 35981

Chris Lattner authored
llvm-svn: 35979

- Apr 13, 2007

Chris Lattner authored
out to do! :)

This fixes a problem where LSR would insert a bunch of code into each MBB
that uses a particular subexpression (e.g. IV+base+C). The problem is that
this code cannot be CSE'd back together if inserted into different blocks.

This patch changes LSR to attempt to insert a single copy of this code and
share it, allowing codegenprepare to duplicate the code if it can be sunk
into various addressing modes. On CodeGen/ARM/lsr-code-insertion.ll, for
example, this gives us code like:

      add r8, r0, r5
      str r6, [r8, #+4]
      ..
      ble LBB1_4 @cond_next
  LBB1_3: @cond_true
      str r10, [r8, #+4]
  LBB1_4: @cond_next
      ...
  LBB1_5: @cond_true55
      ldr r6, LCPI1_1
      str r6, [r8, #+4]

instead of:

      add r10, r0, r6
      str r8, [r10, #+4]
      ...
      ble LBB1_4 @cond_next
  LBB1_3: @cond_true
      add r8, r0, r6
      str r10, [r8, #+4]
  LBB1_4: @cond_next
      ...
  LBB1_5: @cond_true55
      add r8, r0, r6
      ldr r10, LCPI1_1
      str r10, [r8, #+4]

Besides being smaller and more efficient, this makes it immediately
obvious that it is profitable to predicate LBB1_3 now :)

llvm-svn: 35972
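In source terms the situation is roughly the following (a sketch supplied
here for illustration, not from the commit): several blocks inside a loop
address the same IV+base+constant location, e.g. recs[i].val below, which
is base + i*8 + 4 on a target with 4-byte ints:

    struct Rec { int tag; int val; };

    // Each branch arm stores to recs[i].val, the same IV+base+C address.
    // Pre-patch LSR could materialize that address computation in every
    // block that uses it; now it tries to insert one copy and share it.
    void update(struct Rec *recs, int n, int a, int b, int c) {
      for (int i = 0; i < n; ++i) {
        recs[i].val = a;          // first use of the address
        if (recs[i].tag > 0)
          recs[i].val = b;        // same address, different block
        else if (recs[i].tag < 0)
          recs[i].val = c;        // and again
      }
    }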

Chris Lattner authored
this fixes problems where codegenprepare would sink expressions into
load/stores that are not valid, and fixes cases where it would miss
important valid ones.

This fixes several serious codesize and perf issues, particularly on
targets with complex addressing modes like arm and x86. For example, now
we compile CodeGen/X86/isel-sink.ll to:

_test:
    movl 8(%esp), %eax
    movl 4(%esp), %ecx
    cmpl $1233, %eax
    ja LBB1_2   #F
LBB1_1: #T
    movl $4, (%ecx,%eax,4)
    movl $141, %eax
    ret
LBB1_2: #F
    movl (%ecx,%eax,4), %eax
    ret

instead of:

_test:
    movl 8(%esp), %eax
    leal (,%eax,4), %ecx
    addl 4(%esp), %ecx
    cmpl $1233, %eax
    ja LBB1_2   #F
LBB1_1: #T
    movl $4, (%ecx)
    movl $141, %eax
    ret
LBB1_2: #F
    movl (%ecx), %eax
    ret

llvm-svn: 35970
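A source-level rendering of the isel-sink.ll test, reconstructed from the
assembly above (the exact signature is an assumption made here for
illustration):

    // The address p + i is computed once but used on both sides of the
    // branch; codegenprepare must sink it into each user so it can fold
    // into the (%ecx,%eax,4) addressing mode, and must not sink it into
    // places where that folding would be invalid.
    int test(int *p, unsigned i) {
      int *addr = p + i;   // the shared address expression
      if (i > 1233)
        return *addr;      // load folds into the addressing mode
      *addr = 4;           // store folds into the addressing mode
      return 141;
    }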

- Apr 11, 2007

Chris Lattner authored
llvm-svn: 35906

Chris Lattner authored
icmp slt i32 %X, 0   ; <i1>:0 [#uses=1]
sext i1 %0 to i32    ; <i32>:1 [#uses=1]

into:

%X.lobit = ashr i32 %X, 31  ; <i32> [#uses=1]

This implements InstCombine/icmp.ll:test[34]

llvm-svn: 35891
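In source terms the fold turns a compare-plus-sign-extend into an
arithmetic shift that smears the sign bit across the word (example
supplied here, not from the commit):

    // Both return -1 when x is negative and 0 otherwise; the combine
    // rewrites the first form into the second. The shift relies on the
    // usual arithmetic right shift of negative values.
    int is_neg_cmp(int x) { return -(x < 0); }  // icmp slt + sext
    int is_neg_shr(int x) { return x >> 31; }   // ashr i32 %X, 31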

Chris Lattner authored
Transforms/InstCombine/icmp.ll
llvm-svn: 35890

Chris Lattner authored
llvm-svn: 35886