  1. Apr 18, 2006
      Implement an important entry from README_ALTIVEC: · 9754d142
      Chris Lattner authored
      If an AltiVec predicate compare is used directly by a branch, don't
      use a (serializing) MFCR instruction to read the CR6 register, which then
      requires a compare to get the result back into a CR field.  Instead, just branch on CR6 directly. :)
      
      For example, for:
      void foo2(vector float *A, vector float *B) {
        if (!vec_any_eq(*A, *B))
          *B = (vector float){0,0,0,0};
      }
      
      We now generate:
      
      _foo2:
              mfspr r2, 256
              oris r5, r2, 12288
              mtspr 256, r5
              lvx v2, 0, r4
              lvx v3, 0, r3
              vcmpeqfp. v2, v3, v2
              bne cr6, LBB1_2 ; UnifiedReturnBlock
      LBB1_1: ; cond_true
              vxor v2, v2, v2
              stvx v2, 0, r4
              mtspr 256, r2
              blr
      LBB1_2: ; UnifiedReturnBlock
              mtspr 256, r2
              blr
      
      instead of:
      
      _foo2:
              mfspr r2, 256
              oris r5, r2, 12288
              mtspr 256, r5
              lvx v2, 0, r4
              lvx v3, 0, r3
              vcmpeqfp. v2, v3, v2
              mfcr r3, 2
              rlwinm r3, r3, 27, 31, 31
              cmpwi cr0, r3, 0
              beq cr0, LBB1_2 ; UnifiedReturnBlock
      LBB1_1: ; cond_true
              vxor v2, v2, v2
              stvx v2, 0, r4
              mtspr 256, r2
              blr
      LBB1_2: ; UnifiedReturnBlock
              mtspr 256, r2
              blr
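
      For reference, the same trick should apply to the other CR6-based predicates; a minimal
      companion sketch (hypothetical example, assuming the lowering also covers vec_all_*):

      /* requires <altivec.h> and -maltivec; illustrative only */
      void foo3(vector float *A, vector float *B) {
        /* vec_all_eq also reports through CR6 (the dot-form vcmpeqfp. sets it),
           so the guarding branch can test CR6 directly instead of
           round-tripping through mfcr/rlwinm/cmpwi. */
        if (vec_all_eq(*A, *B))
          *B = (vector float){1,1,1,1};
      }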
      
      This implements CodeGen/PowerPC/vec_br_cmp.ll.
      
      llvm-svn: 27804
      move some stuff around, clean things up · 68c16a20
      Chris Lattner authored
      llvm-svn: 27802
      Use vmladduhm to do v8i16 multiplies which is faster and simpler than doing · 96d50487
      Chris Lattner authored
      even/odd halves.  Thanks to Nate for telling me what's what.
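
      A minimal sketch of the idea (illustrative, not from the commit): vmladduhm
      computes (a*b)+c per halfword modulo 2^16, so a plain multiply is just
      vmladduhm with a zero addend.

      /* requires <altivec.h>; hypothetical helper, illustrative only */
      vector unsigned short mul_v8i16(vector unsigned short a,
                                      vector unsigned short b) {
        /* vec_mladd maps to vmladduhm: keeps the low 16 bits of a*b + 0,
           i.e. a modular v8i16 multiply in a single instruction. */
        return vec_mladd(a, b, vec_splat_u16(0));
      }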
      
      llvm-svn: 27793
      Implement v16i8 multiply with this code: · d6d82aa8
      Chris Lattner authored
              vmuloub v5, v3, v2
              vmuleub v2, v3, v2
              vperm v2, v2, v5, v4
      
      This implements CodeGen/PowerPC/vec_mul.ll.  With this, v16i8 multiplies are
      6.79x faster than before.
      
      Overall, UnitTests/Vector/multiplies.c is now 2.45x faster with LLVM than with
      GCC.
      
      Remove the 'integer multiplies' todo from the README file.
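
      A source-level driver for this lowering (illustrative; the vperm mask is
      omitted in the message above):

      typedef unsigned char v16i8 __attribute__ ((vector_size (16)));
      void test_mul(v16i8 *A, v16i8 *B) {
        /* vmuloub/vmuleub form 16-bit products of the odd/even bytes;
           the vperm then picks the low byte of each product back into
           lane order. */
        *A = *A * *B;
      }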
      
      llvm-svn: 27792
      Lower v8i16 multiply into this code: · 7e439874
      Chris Lattner authored
              li r5, lo16(LCPI1_0)
              lis r6, ha16(LCPI1_0)
              lvx v4, r6, r5
              vmulouh v5, v3, v2
              vmuleuh v2, v3, v2
              vperm v2, v2, v5, v4
      
      where v4 is:
      LCPI1_0:                                        ;  <16 x ubyte>
              .byte   2
              .byte   3
              .byte   18
              .byte   19
              .byte   6
              .byte   7
              .byte   22
              .byte   23
              .byte   10
              .byte   11
              .byte   26
              .byte   27
              .byte   14
              .byte   15
              .byte   30
              .byte   31
      
      This is 5.07x faster on the G5 (measured) than lowering to scalar code +
      loads/stores.
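
      The mask reads as follows: bytes 0-15 of the vperm source are the even-lane
      products, bytes 16-31 the odd-lane products, and each pair like {2,3} or
      {18,19} selects the low halfword of one 32-bit product, interleaving even
      and odd results back into lane order (big-endian).  An intrinsic-level
      rendering (illustrative sketch, hypothetical helper name):

      vector unsigned short mul_v8i16_evenodd(vector unsigned short a,
                                              vector unsigned short b) {
        vector unsigned int even = vec_mule(a, b);  /* vmuleuh */
        vector unsigned int odd  = vec_mulo(a, b);  /* vmulouh */
        vector unsigned char mask =                 /* the LCPI1_0 bytes */
          {2,3,18,19, 6,7,22,23, 10,11,26,27, 14,15,30,31};
        return (vector unsigned short)
          vec_perm((vector unsigned char)even, (vector unsigned char)odd, mask);
      }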
      
      llvm-svn: 27789
      Custom lower v4i32 multiplies into a cute sequence, instead of having legalize · a2cae1bb
      Chris Lattner authored
      scalarize the sequence into 4 mullw's and a bunch of load/store traffic.
      
      This speeds up v4i32 multiplies 4.1x (measured) on a G5.  This implements
      PowerPC/vec_mul.ll
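
      A minimal source-level trigger (illustrative; the emitted "cute sequence"
      itself is not shown in this message):

      typedef int v4i32 __attribute__ ((vector_size (16)));
      void test_mul(v4i32 *A, v4i32 *B) {
        /* Previously legalize scalarized this into 4 mullw's plus
           load/store traffic; now it gets a short custom AltiVec
           sequence. */
        *A = *A * *B;
      }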
      
      llvm-svn: 27788
  2. Apr 12, 2006
      Add a new way to match vector constants, which makes it easier to bang bits of · 147e50e1
      Chris Lattner authored
      different types.
      
      Codegen spltw(0x7FFFFFFF) and spltw(0x80000000) without a constant pool load,
      implementing PowerPC/vec_constants.ll:test1.  This compiles:
      
      typedef float vf __attribute__ ((vector_size (16)));
      typedef int vi __attribute__ ((vector_size (16)));
      void test(vi *P1, vi *P2, vf *P3) {
        *P1 &= (vi){0x80000000,0x80000000,0x80000000,0x80000000};
        *P2 &= (vi){0x7FFFFFFF,0x7FFFFFFF,0x7FFFFFFF,0x7FFFFFFF};
        *P3 = vec_abs((vector float)*P3);
      }
      
      to:
      
      _test:
              mfspr r2, 256
              oris r6, r2, 49152
              mtspr 256, r6
              vspltisw v0, -1
              vslw v0, v0, v0
              lvx v1, 0, r3
              vand v1, v1, v0
              stvx v1, 0, r3
              lvx v1, 0, r4
              vandc v1, v1, v0
              stvx v1, 0, r4
              lvx v1, 0, r5
              vandc v0, v1, v0
              stvx v0, 0, r5
              mtspr 256, r2
              blr
      
      instead of (with two constant pool entries):
      
      _test:
              mfspr r2, 256
              oris r6, r2, 49152
              mtspr 256, r6
              li r6, lo16(LCPI1_0)
              lis r7, ha16(LCPI1_0)
              li r8, lo16(LCPI1_1)
              lis r9, ha16(LCPI1_1)
              lvx v0, r7, r6
              lvx v1, 0, r3
              vand v0, v1, v0
              stvx v0, 0, r3
              lvx v0, r9, r8
              lvx v1, 0, r4
              vand v1, v1, v0
              stvx v1, 0, r4
              lvx v1, 0, r5
              vand v0, v1, v0
              stvx v0, 0, r5
              mtspr 256, r2
              blr
      
      GCC produces (with 2 cp entries):
      
      _test:
              mfspr r0,256
              stw r0,-4(r1)
              oris r0,r0,0xc00c
              mtspr 256,r0
              lis r2,ha16(LC0)
              lis r9,ha16(LC1)
              la r2,lo16(LC0)(r2)
              lvx v0,0,r3
              lvx v1,0,r5
              la r9,lo16(LC1)(r9)
              lwz r12,-4(r1)
              lvx v12,0,r2
              lvx v13,0,r9
              vand v0,v0,v12
              stvx v0,0,r3
              vspltisw v0,-1
              vslw v12,v0,v0
              vandc v1,v1,v12
              stvx v1,0,r5
              lvx v0,0,r4
              vand v0,v0,v13
              stvx v0,0,r4
              mtspr 256,r12
              blr
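
      The constant-free splat works because vslw shifts each word left by the low
      five bits of the corresponding word of the shift operand; a scalar sketch of
      the arithmetic (illustrative):

      unsigned int splat_sign_bit(void) {
        unsigned int x = 0xFFFFFFFFu;   /* vspltisw v0, -1: all-ones words  */
        return x << (x & 31);           /* vslw v0, v0, v0: shift by 31 ->
                                           0x80000000 in every lane         */
      }
      /* vand with this mask isolates the sign bit; vandc (and-with-
         complement) gives the 0x7FFFFFFF case for free. */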
      
      llvm-svn: 27624
      Rename get_VSPLI_elt -> get_VSPLTI_elt · 74cf9ff7
      Chris Lattner authored
      Canonicalize BUILD_VECTOR's that match VSPLTI's into a single type for each
      form, eliminating a bunch of Pat patterns in the .td file and allowing us to
      CSE stuff more aggressively.  This implements
      PowerPC/buildvec_canonicalize.ll:VSPLTI
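
      An illustration of the payoff (hypothetical input, assuming the canonical
      form keys on the splat's bit pattern rather than the written element type):

      typedef int vi __attribute__ ((vector_size (16)));
      typedef signed char vc __attribute__ ((vector_size (16)));
      void splats(vi *P, vc *Q) {
        /* Same 0x01 byte pattern at two element types: after
           canonicalization both build_vectors match one vspltisb 1
           and can CSE. */
        *P = (vi){0x01010101, 0x01010101, 0x01010101, 0x01010101};
        *Q = (vc){1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1};
      }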
      
      llvm-svn: 27614
      Ensure that zero vectors are always v4i32, which forces them to CSE with · e318a757
      Chris Lattner authored
      each other.  This implements CodeGen/PowerPC/vxor-canonicalize.ll
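
      A minimal illustration (hypothetical input): zeros of different vector types
      now share a single canonical v4i32 zero, so one vxor feeds both stores.

      typedef int vi __attribute__ ((vector_size (16)));
      typedef float vf __attribute__ ((vector_size (16)));
      void zeros(vi *P, vf *Q) {
        *P = (vi){0, 0, 0, 0};          /* same canonical zero ...      */
        *Q = (vf){0, 0, 0, 0};          /* ... one vxor vN,vN,vN total. */
      }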
      
      llvm-svn: 27609