Commits · bd77fac034cb6ec4bf30d5e1067da29f610c9629 · Roger Ferrer / llvm-epi-0.8

Oct 24, 2005
- Make sure that anything using the ADCE pass pulls in the UnifyFunctionExitNodes · bd77fac0
  Chris Lattner authored Oct 24, 2005
```
code

llvm-svn: 23931
```
  bd77fac0
Oct 23, 2005

When a function takes a variable number of pointer arguments, with a zero · 11e26b52

Jeff Cohen authored Oct 23, 2005

pointer marking the end of the list, the zero *must* be cast to the pointer
type.  An un-cast zero is a 32-bit int, and at least on x86_64, gcc will
not extend the zero to 64 bits, thus allowing the upper 32 bits to be
random junk.

The new END_WITH_NULL macro may be used to annotate such a function
so that GCC (version 4 or newer) will detect the use of an un-cast zero
at compile time.

llvm-svn: 23888

11e26b52
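A minimal sketch of the problem described in the commit above. The macro name `SKETCH_END_WITH_NULL` and the function `count_strings` are hypothetical, and mapping the macro to GCC's `sentinel` attribute is an assumption about how such an annotation could be defined, not a copy of the LLVM header.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Assumption: GCC 4+ (and compatible compilers) accept the sentinel
   attribute, which warns when the last argument is not a null pointer. */
#if defined(__GNUC__) && __GNUC__ >= 4
#define SKETCH_END_WITH_NULL __attribute__((sentinel))
#else
#define SKETCH_END_WITH_NULL
#endif

/* Counts the strings that precede the terminating null pointer. */
size_t count_strings(const char *first, ...) SKETCH_END_WITH_NULL;

size_t count_strings(const char *first, ...) {
    size_t n = 0;
    va_list ap;
    va_start(ap, first);
    for (const char *s = first; s != NULL; s = va_arg(ap, const char *))
        n++;
    va_end(ap);
    return n;
}

/* Correct call: the terminator is cast, so a full-width null pointer
   is passed through the variable argument list. */
void sentinel_demo(void) {
    assert(count_strings("a", "b", (const char *)0) == 2);
}
```

Writing the terminator as a bare `0` would pass an `int`, which is the case the annotation is meant to flag at compile time.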

Oct 21, 2005
- My previous patch was too conservative. Reject FP and void types, but do · 5df0e36e
  Chris Lattner authored Oct 21, 2005
```
allow pointer types.

llvm-svn: 23859
```
  5df0e36e
Oct 20, 2005

Do NOT touch FP ops with LSR. This fixes a testcase Nate sent me from an · 0c0b38bb

Chris Lattner authored Oct 20, 2005

inner loop like this:

LBB_RateConvertMono8AltiVec_2:  ; no_exit
        lis r2, ha16(.CPI_RateConvertMono8AltiVec_0)
        lfs f3, lo16(.CPI_RateConvertMono8AltiVec_0)(r2)
        fmr f3, f3
        fadd f0, f2, f0
        fadd f3, f0, f3
        fcmpu cr0, f3, f1
        bge cr0, LBB_RateConvertMono8AltiVec_2  ; no_exit

to an inner loop like this:

LBB_RateConvertMono8AltiVec_1:  ; no_exit
        fsub f2, f2, f1
        fcmpu cr0, f2, f1
        fmr f0, f2
        bge cr0, LBB_RateConvertMono8AltiVec_1  ; no_exit

Doh! good catch!

llvm-svn: 23838

0c0b38bb

Oct 17, 2005
- Make this work for FP constantexprs · da1b152c
  Chris Lattner authored Oct 17, 2005
```
llvm-svn: 23773
```
  da1b152c
- Oops, X+0.0 isn't foldable, but X+-0.0 is. · 7fde91e3
  Chris Lattner authored Oct 17, 2005
```
llvm-svn: 23772
```
  7fde91e3
- relax this a bit, as we only support the default rounding mode · 32979336
  Chris Lattner authored Oct 17, 2005
```
llvm-svn: 23771
```
  32979336
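The two folding commits above rest on how IEEE-754 treats signed zeros. The sketch below assumes a 64-bit IEEE-754 `double`, the default round-to-nearest mode, and a compiler that is not run with fast-math style flags. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns the raw bit pattern so that +0.0 and -0.0 can be told apart. */
uint64_t bits_of(double d) {
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

double add_pos_zero(double x) { return x + 0.0; }
double add_neg_zero(double x) { return x + -0.0; }

void signed_zero_demo(void) {
    /* -0.0 + 0.0 yields +0.0, so X + 0.0 is not always X. */
    assert(bits_of(add_pos_zero(-0.0)) != bits_of(-0.0));
    /* X + -0.0 returns X for both zeros and for ordinary values. */
    assert(bits_of(add_neg_zero(-0.0)) == bits_of(-0.0));
    assert(bits_of(add_neg_zero(0.0)) == bits_of(0.0));
    assert(bits_of(add_neg_zero(1.5)) == bits_of(1.5));
}
```

Under round-toward-negative, `+0.0 + -0.0` is `-0.0`, so the identity for `X + -0.0` holds only because the default rounding mode is assumed.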
Oct 11, 2005
- Fix (hopefully the last) issue where LSR is nondeterministic. When pulling · 192cd18f
  Chris Lattner authored Oct 11, 2005
```
out CSE's of base expressions it could build a result whose order was
nondet.

llvm-svn: 23698
```
  192cd18f
- Fix another problem where LSR was being nondeterministic. Also remove elements · 5c9d63da
  Chris Lattner authored Oct 11, 2005
```
from the end of a vector instead of the beginning

llvm-svn: 23697
```
  5c9d63da
- Fix another lsr-is-nondeterministic case · b7a3894e
  Chris Lattner authored Oct 11, 2005
```
llvm-svn: 23695
```
  b7a3894e
Oct 10, 2005
- Make MaskedValueIsZero a bit more aggressive · 03b9eb50
  Chris Lattner authored Oct 09, 2005
```
llvm-svn: 23677
```
  03b9eb50
Oct 09, 2005
- Fix funky xcode indentation · 62010c45
  Chris Lattner authored Oct 09, 2005
```
llvm-svn: 23674
```
  62010c45
- Hrm, you didn't see this. · eb4be8b9
  Chris Lattner authored Oct 09, 2005
```
llvm-svn: 23673
```
  eb4be8b9
- Fix a source of non-determinism in the backend: the order of processing · 4ea0a3ea
  Chris Lattner authored Oct 09, 2005
```
IV strides depended on the pointer order of the strides in memory.
Non-determinism is bad.

llvm-svn: 23672
```
  4ea0a3ea
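One common way to avoid this kind of pointer-order dependence is to iterate in insertion order instead of address order. The sketch below illustrates that idea only. The type `InsertionOrderedSet` and its fixed capacity are hypothetical, and it is not presented as the data structure the commit introduced.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_KEYS 16

/* Keys are visited in the order they were first inserted, which does
   not change when the allocator hands out different addresses. */
typedef struct {
    const void *keys[MAX_KEYS];
    size_t count;
} InsertionOrderedSet;

/* Returns 1 if the key was newly added, 0 if it was already present
   or the set is full. */
int ordered_set_insert(InsertionOrderedSet *set, const void *key) {
    for (size_t i = 0; i < set->count; i++)
        if (set->keys[i] == key)
            return 0;
    if (set->count == MAX_KEYS)
        return 0;
    set->keys[set->count++] = key;
    return 1;
}

void ordered_set_demo(void) {
    int a = 0, b = 0;
    InsertionOrderedSet s = { {0}, 0 };
    ordered_set_insert(&s, &b);
    ordered_set_insert(&s, &a);
    /* b comes first because it was inserted first, whatever &a and &b are. */
    assert(s.keys[0] == &b && s.keys[1] == &a);
}
```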
Oct 07, 2005
- Remove useless variable. · 572910c9
  Jeff Cohen authored Oct 07, 2005
```
llvm-svn: 23656
```
  572910c9
Oct 03, 2005

Make IVUseShouldUsePostIncValue more aggressive when the use is a PHI. In · f07a587c

Chris Lattner authored Oct 03, 2005

particular, it should realize that phi's use their values in the pred block
not the phi block itself.  This change turns our em3d loop from this:

_test:
        cmpwi cr0, r4, 0
        bgt cr0, LBB_test_2     ; entry.no_exit_crit_edge
LBB_test_1:     ; entry.loopexit_crit_edge
        li r2, 0
        b LBB_test_6    ; loopexit
LBB_test_2:     ; entry.no_exit_crit_edge
        li r6, 0
LBB_test_3:     ; no_exit
        or r2, r6, r6
        lwz r6, 0(r3)
        cmpw cr0, r6, r5
        beq cr0, LBB_test_6     ; loopexit
LBB_test_4:     ; endif
        addi r3, r3, 4
        addi r6, r2, 1
        cmpw cr0, r6, r4
        blt cr0, LBB_test_3     ; no_exit
LBB_test_5:     ; endif.loopexit.loopexit_crit_edge
        addi r3, r2, 1
        blr
LBB_test_6:     ; loopexit
        or r3, r2, r2
        blr

into:

_test:
        cmpwi cr0, r4, 0
        bgt cr0, LBB_test_2     ; entry.no_exit_crit_edge
LBB_test_1:     ; entry.loopexit_crit_edge
        li r2, 0
        b LBB_test_5    ; loopexit
LBB_test_2:     ; entry.no_exit_crit_edge
        li r6, 0
LBB_test_3:     ; no_exit
        lwz r2, 0(r3)
        cmpw cr0, r2, r5
        or r2, r6, r6
        beq cr0, LBB_test_5     ; loopexit
LBB_test_4:     ; endif
        addi r3, r3, 4
        addi r6, r6, 1
        cmpw cr0, r6, r4
        or r2, r6, r6
        blt cr0, LBB_test_3     ; no_exit
LBB_test_5:     ; loopexit
        or r3, r2, r2
        blr


Unfortunately, this is actually worse code, because the register coalescer
is getting confused somehow.  If it were doing its job right, it could turn the
code into this:

_test:
        cmpwi cr0, r4, 0
        bgt cr0, LBB_test_2     ; entry.no_exit_crit_edge
LBB_test_1:     ; entry.loopexit_crit_edge
        li r6, 0
        b LBB_test_5    ; loopexit
LBB_test_2:     ; entry.no_exit_crit_edge
        li r6, 0
LBB_test_3:     ; no_exit
        lwz r2, 0(r3)
        cmpw cr0, r2, r5
        beq cr0, LBB_test_5     ; loopexit
LBB_test_4:     ; endif
        addi r3, r3, 4
        addi r6, r6, 1
        cmpw cr0, r6, r4
        blt cr0, LBB_test_3     ; no_exit
LBB_test_5:     ; loopexit
        or r3, r6, r6
        blr

... which I'll work on next. :)

llvm-svn: 23604

f07a587c

Refactor some code into a function · e4ed42a4
Chris Lattner authored Oct 03, 2005
```
llvm-svn: 23603
```
e4ed42a4

This break is bogus and I have no idea why it was there. Basically it prevents · 360928db

Chris Lattner authored Oct 03, 2005

memoizing code when IV's are used by phinodes outside of loops.  In a simple
example, we were getting this code before (note that r6 and r7 are isomorphic
IV's):

        li r6, 0
        or r7, r6, r6
LBB_test_3:     ; no_exit
        lwz r2, 0(r3)
        cmpw cr0, r2, r5
        or r2, r7, r7
        beq cr0, LBB_test_5     ; loopexit
LBB_test_4:     ; endif
        addi r2, r7, 1
        addi r7, r7, 1
        addi r3, r3, 4
        addi r6, r6, 1
        cmpw cr0, r6, r4
        blt cr0, LBB_test_3     ; no_exit

Now we get:

        li r6, 0
LBB_test_3:     ; no_exit
        or r2, r6, r6
        lwz r6, 0(r3)
        cmpw cr0, r6, r5
        beq cr0, LBB_test_6     ; loopexit
LBB_test_4:     ; endif
        addi r3, r3, 4
        addi r6, r2, 1
        cmpw cr0, r6, r4
        blt cr0, LBB_test_3     ; no_exit

this was noticed in em3d.

llvm-svn: 23602

360928db

when checking if we should move a split edge block outside of a loop, · 8fcce170

Chris Lattner authored Oct 03, 2005

check the presplit pred, not the post-split pred.  This was causing us
to make the wrong decision in some cases, leaving the critical edge block
in the loop.

llvm-svn: 23601

8fcce170

Oct 01, 2005
- Fix VC++ warnings. · f8a5e5ae
  Jeff Cohen authored Oct 01, 2005
```
llvm-svn: 23579
```
  f8a5e5ae
Sep 29, 2005
- Insert stores after phi nodes in the normal dest. This fixes · a554c947
  Chris Lattner authored Sep 29, 2005
```
LowerInvoke/2005-08-03-InvokeWithPHI.ll

llvm-svn: 23525
```
  a554c947
Sep 28, 2005
- add a note about a way to improve this code further, that I won't be getting · 3b63bb37
  Chris Lattner authored Sep 27, 2005
```
to right now.

llvm-svn: 23485
```
  3b63bb37
Sep 27, 2005

Avoid spilling stack slots... to stack slots. · e285f5ed
Chris Lattner authored Sep 27, 2005
```
llvm-svn: 23478
```
e285f5ed

Completely rewrite 'correct' eh support. This changes how setjmp insertion · 87eb2493

Chris Lattner authored Sep 27, 2005

is performed so it is only at most once per function that contains an invoke
instead of once per invoke in the function.  This patch has the following perks:

1. It fixes PR631, which complains about slowness.
2. It fixes PR240, which complains about non-volatile vars being live across
   setjmp/longjmps.
3. It improves (but does not fix) the jmpbuf alignment issue on itanium by not
   forcing the jmpbufs to always be 8-bytes off the alignment of the structure.
4. It speeds up 253.perlbmk from 338s to 13.70s (a 25x improvement!), making us
   now about 4% faster than GCC.

Further improvements are also possible.

llvm-svn: 23477

87eb2493
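The "one setjmp per function" idea can be illustrated in plain C. This is a hand-written sketch under stated assumptions, not the output of the pass: `active_handler`, `may_unwind`, and `run_two_calls` are hypothetical names, and recording a call number before each call is just one way to tell the unwinding calls apart.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical runtime state: the handler an unwind should jump to. */
static jmp_buf *active_handler;

/* Stand-in for a callee that may unwind. */
static void may_unwind(int should_unwind) {
    if (should_unwind)
        longjmp(*active_handler, 1);
}

/* Returns 0 on normal completion, or the number of the call that unwound. */
int run_two_calls(int unwind_first, int unwind_second) {
    jmp_buf handler;
    jmp_buf *outer = active_handler;
    /* volatile so the value is well defined after longjmp returns here */
    volatile int call_number = 0;

    active_handler = &handler;
    if (setjmp(handler) != 0) {      /* the only setjmp in this function */
        active_handler = outer;
        return call_number;          /* dispatch on the recorded call */
    }

    call_number = 1;
    may_unwind(unwind_first);
    call_number = 2;
    may_unwind(unwind_second);

    active_handler = outer;
    return 0;
}
```

The `volatile` qualifier on `call_number` reflects the same concern as item 2 in the list above: a non-volatile local modified between `setjmp` and `longjmp` has an indeterminate value afterwards.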

Make the pass name simpler · 92233d21
Chris Lattner authored Sep 27, 2005
```
llvm-svn: 23476
```
92233d21

Sep 26, 2005
- Eliminate GetGEPGlobalInitializer in favor of the more powerful · 02ae21e1
  Chris Lattner authored Sep 26, 2005
```
ConstantFoldLoadThroughGEPConstantExpr function in the utils lib.

llvm-svn: 23446
```
  02ae21e1
- Factor the GetGEPGlobalInitializer out of this pass and into Transforms/Utils · 0b011ec8
  Chris Lattner authored Sep 26, 2005
```
as ConstantFoldLoadThroughGEPConstantExpr.

llvm-svn: 23445
```
  0b011ec8
Sep 25, 2005

Move MaskedValueIsZero up. · 0b3557f5

Chris Lattner authored Sep 24, 2005

Match a bunch of idioms for sign extensions, implementing InstCombine/signext.ll

llvm-svn: 23428

0b3557f5

Sep 18, 2005

Refactor this code a bit and make it more general. This now compiles: · b4b2530a

Chris Lattner authored Sep 18, 2005

struct S { unsigned int i : 6, j : 11, k : 15; } b;
void plus2 (unsigned int x) { b.j += x; }

To:

_plus2:
        lis r2, ha16(L_b$non_lazy_ptr)
        lwz r2, lo16(L_b$non_lazy_ptr)(r2)
        lwz r4, 0(r2)
        slwi r3, r3, 6
        add r3, r4, r3
        rlwimi r3, r4, 0, 26, 14
        stw r3, 0(r2)
        blr


instead of:

_plus2:
        lis r2, ha16(L_b$non_lazy_ptr)
        lwz r2, lo16(L_b$non_lazy_ptr)(r2)
        lwz r4, 0(r2)
        rlwinm r5, r4, 26, 21, 31
        add r3, r5, r3
        rlwimi r4, r3, 6, 15, 25
        stw r4, 0(r2)
        blr

by eliminating an 'and'.

I'm pretty sure this is as small as we can go :)

llvm-svn: 23386

b4b2530a

Compile · 797dee77

Chris Lattner authored Sep 18, 2005

struct S { unsigned int i : 6, j : 11, k : 15; } b;
void plus2 (unsigned int x) {
  b.j += x;
}

to:

plus2:
        mov %EAX, DWORD PTR [b]
        mov %ECX, %EAX
        and %ECX, 131008
        mov %EDX, DWORD PTR [%ESP + 4]
        shl %EDX, 6
        add %EDX, %ECX
        and %EDX, 131008
        and %EAX, -131009
        or %EDX, %EAX
        mov DWORD PTR [b], %EDX
        ret

instead of:

plus2:
        mov %EAX, DWORD PTR [b]
        mov %ECX, %EAX
        shr %ECX, 6
        and %ECX, 2047
        add %ECX, DWORD PTR [%ESP + 4]
        shl %ECX, 6
        and %ECX, 131008
        and %EAX, -131009
        or %ECX, %EAX
        mov DWORD PTR [b], %ECX
        ret

llvm-svn: 23385

797dee77
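The two `plus2` commits above can be checked with explicit masks. The sketch assumes the layout implied by the constants in the x86 listing (`j` in bits 6 to 16, mask 131008) and works on a plain `uint32_t` rather than a real bit-field, whose layout is implementation-defined. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define J_SHIFT 6u
#define J_MASK  0x0001FFC0u   /* 131008: bits 6..16 */

/* Older shape: extract j, add, shift back, re-insert. */
uint32_t add_j_extract(uint32_t word, uint32_t x) {
    uint32_t j = (word >> J_SHIFT) & 0x7FFu;
    uint32_t sum = (j + x) << J_SHIFT;
    return (sum & J_MASK) | (word & ~J_MASK);
}

/* Mask j in place and add the pre-shifted operand. */
uint32_t add_j_masked(uint32_t word, uint32_t x) {
    uint32_t sum = (word & J_MASK) + (x << J_SHIFT);
    return (sum & J_MASK) | (word & ~J_MASK);
}

/* Drop the first 'and': the low 6 bits of (x << 6) are zero, so no
   carry can enter bit 6, and bits above 16 are masked off afterwards. */
uint32_t add_j_unmasked(uint32_t word, uint32_t x) {
    uint32_t sum = word + (x << J_SHIFT);
    return (sum & J_MASK) | (word & ~J_MASK);
}

void check_j(uint32_t word, uint32_t x) {
    assert(add_j_extract(word, x) == add_j_masked(word, x));
    assert(add_j_masked(word, x) == add_j_unmasked(word, x));
}
```

Calling `check_j` with a range of words and operands, including values where `j` wraps around, is a quick way to convince oneself that the three forms agree.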

Generalize this transform, using MaskedValueIsZero, allowing us to compile: · 01f56c68

Chris Lattner authored Sep 18, 2005

struct S { unsigned int i : 6, j : 11, k : 15; } b;
void plus3 (unsigned int x) { b.k += x; }

To:

plus3:
        mov %EAX, DWORD PTR [%ESP + 4]
        shl %EAX, 17
        add DWORD PTR [b], %EAX
        ret

instead of:

plus3:
        mov %EAX, DWORD PTR [%ESP + 4]
        shl %EAX, 17
        mov %ECX, DWORD PTR [b]
        add %EAX, %ECX
        and %EAX, -131072
        and %ECX, 131071
        or %ECX, %EAX
        mov DWORD PTR [b], %ECX
        ret

llvm-svn: 23384

01f56c68
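For `plus3` the field occupies the top 15 bits, so the masks disappear entirely. The sketch below uses the constants from the listing (-131072 and 131071) on a plain `uint32_t`. The function names are hypothetical, and the comment about known-zero bits describes the reasoning in general terms rather than the actual `MaskedValueIsZero` code.

```c
#include <assert.h>
#include <stdint.h>

#define K_SHIFT   17u
#define K_MASK    0xFFFE0000u   /* -131072: bits 17..31 */
#define LOW_MASK  0x0001FFFFu   /*  131071: bits 0..16  */

/* Shape of the older code: add, then re-merge the untouched low bits. */
uint32_t add_k_masked(uint32_t word, uint32_t x) {
    uint32_t sum = word + (x << K_SHIFT);
    return (sum & K_MASK) | (word & LOW_MASK);
}

/* Shape of the new code: a single add. The low 17 bits of (x << 17)
   are known to be zero, so the low 17 bits of the sum equal those of
   word, and overflow out of bit 31 is simply discarded. */
uint32_t add_k_plain(uint32_t word, uint32_t x) {
    return word + (x << K_SHIFT);
}

void check_k(uint32_t word, uint32_t x) {
    assert(add_k_masked(word, x) == add_k_plain(word, x));
}
```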

fix typeo · 4ebc8ab4
Chris Lattner authored Sep 18, 2005
```
llvm-svn: 23383
```
4ebc8ab4
Remove unintentionally committed code · e5b23a6d
Chris Lattner authored Sep 18, 2005
```
llvm-svn: 23382
```
e5b23a6d

implement shift.ll:test25. This compiles: · 27cb9dbd

Chris Lattner authored Sep 18, 2005

struct S { unsigned int i : 6, j : 11, k : 15; } b;
void plus3 (unsigned int x) {
  b.k += x;
}

to:

_plus3:
        lis r2, ha16(L_b$non_lazy_ptr)
        lwz r2, lo16(L_b$non_lazy_ptr)(r2)
        lwz r3, 0(r2)
        rlwinm r4, r3, 0, 0, 14
        add r4, r4, r3
        rlwimi r4, r3, 0, 15, 31
        stw r4, 0(r2)
        blr

instead of:

_plus3:
        lis r2, ha16(L_b$non_lazy_ptr)
        lwz r2, lo16(L_b$non_lazy_ptr)(r2)
        lwz r4, 0(r2)
        srwi r5, r4, 17
        add r3, r5, r3
        slwi r3, r3, 17
        rlwimi r3, r4, 0, 15, 31
        stw r3, 0(r2)
        blr

llvm-svn: 23381

27cb9dbd

Implement add.ll:test29. Codegening: · af517574

Chris Lattner authored Sep 18, 2005

struct S { unsigned int i : 6, j : 11, k : 15; } b;
void plus1 (unsigned int x) {
  b.i += x;
}

as:
_plus1:
        lis r2, ha16(L_b$non_lazy_ptr)
        lwz r2, lo16(L_b$non_lazy_ptr)(r2)
        lwz r4, 0(r2)
        add r3, r4, r3
        rlwimi r3, r4, 0, 0, 25
        stw r3, 0(r2)
        blr

instead of:

_plus1:
        lis r2, ha16(L_b$non_lazy_ptr)
        lwz r2, lo16(L_b$non_lazy_ptr)(r2)
        lwz r4, 0(r2)
        rlwinm r5, r4, 0, 26, 31
        add r3, r5, r3
        rlwimi r3, r4, 0, 0, 25
        stw r3, 0(r2)
        blr

llvm-svn: 23379

af517574
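The `plus1` change drops the mask applied before the add. A sketch of why that is safe, assuming `i` sits in the low 6 bits of a 32-bit word (function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

#define I_MASK 0x3Fu   /* bits 0..5 */

/* Older shape: isolate i, add, then merge with the other fields. */
uint32_t add_i_masked(uint32_t word, uint32_t x) {
    uint32_t sum = (word & I_MASK) + x;
    return (sum & I_MASK) | (word & ~I_MASK);
}

/* Newer shape: add to the whole word. The low 6 bits of a sum depend
   only on the low 6 bits of its operands, and the upper bits are
   overwritten with the original ones afterwards. */
uint32_t add_i_direct(uint32_t word, uint32_t x) {
    uint32_t sum = word + x;
    return (sum & I_MASK) | (word & ~I_MASK);
}

void check_i(uint32_t word, uint32_t x) {
    assert(add_i_masked(word, x) == add_i_direct(word, x));
}
```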

remove debug output · 027eaf01
Chris Lattner authored Sep 18, 2005
```
llvm-svn: 23377
```
027eaf01

Implement or.ll:test21. This teaches instcombine to be able to turn this: · 15212989

Chris Lattner authored Sep 18, 2005

struct {
   unsigned int bit0:1;
   unsigned int ubyte:31;
} sdata;

void foo() {
  sdata.ubyte++;
}

into this:

foo:
        add DWORD PTR [sdata], 2
        ret

instead of this:

foo:
        mov %EAX, DWORD PTR [sdata]
        mov %ECX, %EAX
        add %ECX, 2
        and %ECX, -2
        and %EAX, 1
        or %EAX, %ECX
        mov DWORD PTR [sdata], %EAX
        ret

llvm-svn: 23376

15212989
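The same reasoning explains the single `add ..., 2`. The sketch assumes `bit0` is bit 0 and `ubyte` occupies bits 1 to 31, as the `and -2` and `and 1` constants in the listing suggest. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Shape of the older code: add 2, clear bit 0, merge the original bit 0. */
uint32_t inc_ubyte_masked(uint32_t word) {
    uint32_t sum = (word + 2u) & ~1u;
    return sum | (word & 1u);
}

/* Shape of the new code: adding 2 never changes bit 0, so the masking
   and merging are redundant. */
uint32_t inc_ubyte_plain(uint32_t word) {
    return word + 2u;
}

void check_ubyte(uint32_t word) {
    assert(inc_ubyte_masked(word) == inc_ubyte_plain(word));
}
```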

Sep 14, 2005
- Fix the regression last night compiling povray · a393e4d4
  Chris Lattner authored Sep 14, 2005
```
llvm-svn: 23348
```
  a393e4d4
Sep 13, 2005

Add a simple xform to simplify array accesses with casts in the way. · 2a893296

Chris Lattner authored Sep 13, 2005

This is useful for 178.galgel where resolution of dope vectors (by the
optimizer) causes the scales to become apparent.

llvm-svn: 23328

2a893296

Fix an issue where LSR would miss rewriting a use of an IV expression by a PHI... · fd018c8d

Chris Lattner authored Sep 13, 2005

Fix an issue where LSR would miss rewriting a use of an IV expression by a PHI node that is not the original PHI.

This fixes up a dot-product loop in galgel, speeding it up from 18.47s to
16.13s.

llvm-svn: 23327

fd018c8d