- Apr 18, 2006
-
Chris Lattner authored
llvm-svn: 27795
-
Chris Lattner authored
llvm-svn: 27794
-
Chris Lattner authored
even/odd halves. Thanks to Nate telling me what's what.
llvm-svn: 27793
-
Chris Lattner authored
        vmuloub v5, v3, v2
        vmuleub v2, v3, v2
        vperm v2, v2, v5, v4
This implements CodeGen/PowerPC/vec_mul.ll. With this, v16i8 multiplies are 6.79x faster than before. Overall, UnitTests/Vector/multiplies.c is now 2.45x faster with LLVM than with GCC. Remove the 'integer multiplies' todo from the README file.
llvm-svn: 27792
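The even/odd trick above is easiest to see in scalar form. Below is a minimal C model of what the three instructions compute (an illustration, not LLVM's lowering code): vmuleub/vmuloub form 16-bit products of the even/odd byte lanes, and the vperm merges the low byte of each product back into one vector.

    #include <stdint.h>

    /* Scalar model of the AltiVec sequence: a byte multiply modulo 256
       is the low byte of the 16-bit product, so the even/odd products
       can be merged back together with a permute. */
    void mul_v16i8(const uint8_t a[16], const uint8_t b[16], uint8_t out[16]) {
        uint16_t even[8], odd[8];
        for (int i = 0; i < 8; i++) {
            even[i] = (uint16_t)(a[2*i]   * b[2*i]);     /* vmuleub */
            odd[i]  = (uint16_t)(a[2*i+1] * b[2*i+1]);   /* vmuloub */
        }
        for (int i = 0; i < 8; i++) {
            out[2*i]   = (uint8_t)even[i];  /* vperm: low byte of each */
            out[2*i+1] = (uint8_t)odd[i];   /* 16-bit product          */
        }
    }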
-
Chris Lattner authored
llvm-svn: 27791
-
Evan Cheng authored
llvm-svn: 27790
-
Chris Lattner authored
        li r5, lo16(LCPI1_0)
        lis r6, ha16(LCPI1_0)
        lvx v4, r6, r5
        vmulouh v5, v3, v2
        vmuleuh v2, v3, v2
        vperm v2, v2, v5, v4
where v4 is:
LCPI1_0:        ; <16 x ubyte>
        .byte 2, 3, 18, 19
        .byte 6, 7, 22, 23
        .byte 10, 11, 26, 27
        .byte 14, 15, 30, 31
This is 5.07x faster on the G5 (measured) than lowering to scalar code + loads/stores.
llvm-svn: 27789
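Decoding the mask (an annotation, not part of the commit): vperm byte indices 0-15 select from the first source (the even products) and 16-31 from the second (the odd products), and each big-endian 32-bit product keeps its low 16 bits at byte offsets 2-3, hence the 2,3 / 18,19 pattern. In scalar C terms:

    #include <stdint.h>

    /* Scalar model: vmuleuh/vmulouh give 32-bit products of the even/odd
       halfword lanes; the mask takes the low 16 bits of each product,
       alternating between the even-product and odd-product vectors. */
    void mul_v8i16(const uint16_t a[8], const uint16_t b[8], uint16_t out[8]) {
        uint32_t even[4], odd[4];
        for (int i = 0; i < 4; i++) {
            even[i] = (uint32_t)a[2*i]   * b[2*i];     /* vmuleuh */
            odd[i]  = (uint32_t)a[2*i+1] * b[2*i+1];   /* vmulouh */
        }
        for (int i = 0; i < 4; i++) {
            out[2*i]   = (uint16_t)even[i];  /* mask bytes 4i+2, 4i+3   */
            out[2*i+1] = (uint16_t)odd[i];   /* mask bytes 4i+18, 4i+19 */
        }
    }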
-
Chris Lattner authored
scalarize the sequence into 4 mullw's and a bunch of load/store traffic. This speeds up v4i32 multiplies 4.1x (measured) on a G5. This implements PowerPC/vec_mul.ll.
llvm-svn: 27788
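For reference, a C sketch of the scalarized fallback this commit avoids (an illustration only): legalize would bounce both vectors through memory and do one 32-bit multiply per lane.

    #include <stdint.h>

    /* The old lowering: one mullw per element, with the vectors
       spilled to and reloaded from the stack around it. */
    void mul_v4i32_scalarized(const uint32_t a[4], const uint32_t b[4],
                              uint32_t out[4]) {
        for (int i = 0; i < 4; i++)
            out[i] = a[i] * b[i];   /* mullw, plus load/store traffic */
    }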
-
Chris Lattner authored
llvm-svn: 27787
-
Evan Cheng authored
llvm-svn: 27786
-
Chris Lattner authored
llvm-svn: 27785
-
Evan Cheng authored
llvm-svn: 27784
-
Evan Cheng authored
llvm-svn: 27782
-
Chris Lattner authored
if the pointer is known aligned.
llvm-svn: 27781
-
Chris Lattner authored
the code in GCC PR26546.
llvm-svn: 27780
-
Evan Cheng authored
llvm-svn: 27779
-
- Apr 17, 2006
-
Chris Lattner authored
llvm-svn: 27778
-
Chris Lattner authored
allows us to codegen functions as:
_test_rol:
        vspltisw v2, -12
        vrlw v2, v2, v2
        blr
instead of:
_test_rol:
        mfvrsave r2
        mr r3, r2
        mtvrsave r3
        vspltisw v2, -12
        vrlw v2, v2, v2
        mtvrsave r2
        blr
Testcase here: CodeGen/PowerPC/vec_vrsave.ll
llvm-svn: 27777
-
Chris Lattner authored
llvm-svn: 27776
-
Chris Lattner authored
llvm-svn: 27775
-
Chris Lattner authored
llvm-svn: 27774
-
Evan Cheng authored
llvm-svn: 27773
-
Chris Lattner authored
the vrsave register for the caller. This allows us to codegen a function as:
_test_rol:
        mfspr r2, 256
        mr r3, r2
        mtspr 256, r3
        vspltisw v2, -12
        vrlw v2, v2, v2
        mtspr 256, r2
        blr
instead of:
_test_rol:
        mfspr r2, 256
        oris r3, r2, 40960
        mtspr 256, r3
        vspltisw v0, -12
        vrlw v2, v0, v0
        mtspr 256, r2
        blr
llvm-svn: 27772
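A side note on the constant in the old sequence (my decoding, not from the commit): oris ORs 40960 = 0xA000 into the high halfword, so VRSAVE becomes 0xA0000000; in AltiVec's MSB-first bit numbering that marks v0 and v2 live, exactly the two registers the old code touched.

    #include <stdio.h>
    #include <stdint.h>

    /* Decode oris r3, r2, 40960: OR 40960 << 16 into VRSAVE and list
       which vector registers it marks live (bit 0 = MSB = v0). */
    int main(void) {
        uint32_t vrsave = 40960u << 16;          /* 0xA0000000 */
        for (int v = 0; v < 32; v++)
            if (vrsave & (0x80000000u >> v))
                printf("v%d\n", v);              /* prints v0, v2 */
        return 0;
    }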
-
Chris Lattner authored
        vspltisw v2, -12
        vrlw v2, v2, v2
instead of:
        vspltisw v0, -12
        vrlw v2, v0, v0
when a function is returning a value.
llvm-svn: 27771
-
Chris Lattner authored
llvm-svn: 27770
-
Chris Lattner authored
llvm-svn: 27769
-
Evan Cheng authored
llvm-svn: 27768
-
Chris Lattner authored
llvm-svn: 27767
-
Chris Lattner authored
llvm-svn: 27766
-
Chris Lattner authored
and a shuffle. For this:

void %test2(<4 x float>* %F, float %f) {
        %tmp = load <4 x float>* %F             ; <<4 x float>> [#uses=2]
        %tmp3 = add <4 x float> %tmp, %tmp      ; <<4 x float>> [#uses=1]
        %tmp2 = insertelement <4 x float> %tmp3, float %f, uint 2       ; <<4 x float>> [#uses=2]
        %tmp6 = add <4 x float> %tmp2, %tmp2    ; <<4 x float>> [#uses=1]
        store <4 x float> %tmp6, <4 x float>* %F
        ret void
}

we now get this on X86 (which will get better):

_test2:
        movl 4(%esp), %eax
        movaps (%eax), %xmm0
        addps %xmm0, %xmm0
        movaps %xmm0, %xmm1
        shufps $3, %xmm1, %xmm1
        movaps %xmm0, %xmm2
        shufps $1, %xmm2, %xmm2
        unpcklps %xmm1, %xmm2
        movss 8(%esp), %xmm1
        unpcklps %xmm1, %xmm0
        unpcklps %xmm2, %xmm0
        addps %xmm0, %xmm0
        movaps %xmm0, (%eax)
        ret

instead of:

_test2:
        subl $28, %esp
        movl 32(%esp), %eax
        movaps (%eax), %xmm0
        addps %xmm0, %xmm0
        movaps %xmm0, (%esp)
        movss 36(%esp), %xmm0
        movss %xmm0, 8(%esp)
        movaps (%esp), %xmm0
        addps %xmm0, %xmm0
        movaps %xmm0, (%eax)
        addl $28, %esp
        ret

llvm-svn: 27765
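The same in-register idea (expand the scalar to a vector, then blend with a shuffle instead of going through the stack) can be written with today's compiler vector extensions. A sketch assuming clang's vector_size and __builtin_shufflevector extensions, not the code this commit added:

    /* Insert a scalar into lane 2 without touching memory: broadcast
       the scalar, then shuffle it in (lanes 0,1,3 from v; lane 2 from
       s, whose lanes are numbered 4-7 in the mask). */
    typedef float v4f __attribute__((vector_size(16)));

    v4f insert_lane2(v4f v, float f) {
        v4f s = {f, f, f, f};                             /* scalar expanded */
        return __builtin_shufflevector(v, s, 0, 1, 4, 3); /* the shuffle */
    }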
-
Chris Lattner authored
being a bit more clever, add support for odd splats from -31 to -17.
llvm-svn: 27764
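The message doesn't spell out the identity it uses, but one arithmetic that reaches every odd value in [-31, -17] is the sum of two legal vsplti immediates (each must fit in [-16, 15]), e.g. -16 plus an odd addend in [-15, -1], combined with a vector add. The check below is an illustration of that arithmetic only:

    #include <assert.h>

    /* Every odd target in [-31, -17] splits into -16 (a legal vsplti
       immediate) plus an odd addend in [-15, -1] (also legal), which
       an element-wise add of two splats can realize. */
    int main(void) {
        for (int target = -31; target <= -17; target += 2) {
            int a = -16;
            int b = target - a;            /* -15 .. -1 */
            assert(b >= -16 && b <= 15);
            assert(a + b == target);
        }
        return 0;
    }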
-
Evan Cheng authored
llvm-svn: 27763
-
Evan Cheng authored
llvm-svn: 27762
-
Jeff Cohen authored
llvm-svn: 27761
-
Chris Lattner authored
This implements vec_constants.ll:test_vsldoi and test_rol.
llvm-svn: 27760
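Worked arithmetic for the test_rol pattern seen in the entries above (vspltisw v2, -12 then vrlw v2, v2, v2); this is my calculation, not anything stated in the commit: each word holds 0xFFFFFFF4, and vrlw rotates every word left by its own low 5 bits (0x14 = 20), yielding a splat of 0xFF4FFFFF.

    #include <stdio.h>
    #include <stdint.h>

    static uint32_t rotl32(uint32_t x, unsigned n) {
        return (x << n) | (x >> (32 - n));   /* n is 20 here, never 0 */
    }

    int main(void) {
        uint32_t splat = (uint32_t)-12;             /* vspltisw v2, -12 */
        uint32_t word  = rotl32(splat, splat & 31); /* vrlw v2, v2, v2  */
        printf("0x%08X\n", word);                   /* 0xFF4FFFFF */
        return 0;
    }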
-
Chris Lattner authored
llvm-svn: 27759
-
Chris Lattner authored
llvm-svn: 27758
-
Evan Cheng authored
llvm-svn: 27755
-
Chris Lattner authored
new patterns.
llvm-svn: 27754
-
Chris Lattner authored
llvm-svn: 27753
-