- Apr 18, 2006
- Chris Lattner authored
  even/odd halves. Thanks to Nate telling me what's what.
  llvm-svn: 27793
- Chris Lattner authored
  vmuloub v5, v3, v2
  vmuleub v2, v3, v2
  vperm v2, v2, v5, v4
  This implements CodeGen/PowerPC/vec_mul.ll. With this, v16i8 multiplies are 6.79x faster than before. Overall, UnitTests/Vector/multiplies.c is now 2.45x faster with LLVM than with GCC. Remove the 'integer multiplies' todo from the README file.
  llvm-svn: 27792
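  A minimal sketch added here for illustration (not part of the original commit message), assuming GCC/Clang generic vector extensions; the hypothetical mul_v16u8 shows the kind of byte-vector multiply this lowering applies to:

      typedef unsigned char v16u8 __attribute__((vector_size(16)));

      /* Element-wise multiply of 16 unsigned bytes.  With this change the
       * PowerPC backend emits vmuloub/vmuleub plus a vperm for the multiply
       * instead of scalarizing it into 16 integer multiplies. */
      v16u8 mul_v16u8(v16u8 a, v16u8 b) {
          return a * b;
      }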
- Evan Cheng authored
  llvm-svn: 27790
- Chris Lattner authored
  li r5, lo16(LCPI1_0)
  lis r6, ha16(LCPI1_0)
  lvx v4, r6, r5
  vmulouh v5, v3, v2
  vmuleuh v2, v3, v2
  vperm v2, v2, v5, v4
  where v4 is:
  LCPI1_0:  ; <16 x ubyte>
  .byte 2
  .byte 3
  .byte 18
  .byte 19
  .byte 6
  .byte 7
  .byte 22
  .byte 23
  .byte 10
  .byte 11
  .byte 26
  .byte 27
  .byte 14
  .byte 15
  .byte 30
  .byte 31
  This is 5.07x faster on the G5 (measured) than lowering to scalar code + loads/stores.
  llvm-svn: 27789
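  An intrinsic-level sketch added for illustration (my paraphrase, not from the commit), assuming <altivec.h> and big-endian byte numbering; the permute-mask bytes are copied from the constant pool entry above:

      #include <altivec.h>

      /* vec_mule/vec_mulo give the full 32-bit products of the even/odd
       * halfword pairs; the permute mask then picks out the low 16 bits
       * of each product, which is the v8i16 multiply result. */
      vector unsigned short mul_v8u16(vector unsigned short a,
                                      vector unsigned short b) {
          vector unsigned int  even = vec_mule(a, b);   /* vmuleuh */
          vector unsigned int  odd  = vec_mulo(a, b);   /* vmulouh */
          vector unsigned char mask = {  2,  3, 18, 19,  6,  7, 22, 23,
                                        10, 11, 26, 27, 14, 15, 30, 31 };
          return (vector unsigned short)vec_perm(even, odd, mask);
      }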
- Chris Lattner authored
  scalarize the sequence into 4 mullw's and a bunch of load/store traffic. This speeds up v4i32 multiplies 4.1x (measured) on a G5. This implements PowerPC/vec_mul.ll.
  llvm-svn: 27788
- Evan Cheng authored
  llvm-svn: 27786
- Evan Cheng authored
  llvm-svn: 27784
- Evan Cheng authored
  llvm-svn: 27782
- Evan Cheng authored
  llvm-svn: 27779
- Apr 17, 2006
- Chris Lattner authored
  llvm-svn: 27778
- Chris Lattner authored
  allows us to codegen functions as:
  _test_rol:
  vspltisw v2, -12
  vrlw v2, v2, v2
  blr
  instead of:
  _test_rol:
  mfvrsave r2, 256
  mr r3, r2
  mtvrsave r3
  vspltisw v2, -12
  vrlw v2, v2, v2
  mtvrsave r2
  blr
  Testcase here: CodeGen/PowerPC/vec_vrsave.ll
  llvm-svn: 27777
- Evan Cheng authored
  llvm-svn: 27773
- Chris Lattner authored
  the vrsave register for the caller. This allows us to codegen a function as:
  _test_rol:
  mfspr r2, 256
  mr r3, r2
  mtspr 256, r3
  vspltisw v2, -12
  vrlw v2, v2, v2
  mtspr 256, r2
  blr
  instead of:
  _test_rol:
  mfspr r2, 256
  oris r3, r2, 40960
  mtspr 256, r3
  vspltisw v0, -12
  vrlw v2, v0, v0
  mtspr 256, r2
  blr
  llvm-svn: 27772
- Chris Lattner authored
  vspltisw v2, -12
  vrlw v2, v2, v2
  instead of:
  vspltisw v0, -12
  vrlw v2, v0, v0
  when a function is returning a value.
  llvm-svn: 27771
- Chris Lattner authored
  llvm-svn: 27770
- Chris Lattner authored
  llvm-svn: 27769
- Evan Cheng authored
  llvm-svn: 27768
- Chris Lattner authored
  llvm-svn: 27767
- Chris Lattner authored
  being a bit more clever, add support for odd splats from -31 to -17.
  llvm-svn: 27764
- Evan Cheng authored
  llvm-svn: 27763
- Evan Cheng authored
  llvm-svn: 27762
- Jeff Cohen authored
  llvm-svn: 27761
- Chris Lattner authored
  This implements vec_constants.ll:test_vsldoi and test_rol.
  llvm-svn: 27760
- Chris Lattner authored
  llvm-svn: 27758
- Evan Cheng authored
  llvm-svn: 27755
- Chris Lattner authored
  new patterns.
  llvm-svn: 27754
- Chris Lattner authored
  PowerPC/vec_constants.ll:test_29.
  llvm-svn: 27752
- Chris Lattner authored
  Efficiently codegen even splats in the range [-32,30]. This allows us to codegen <30,30,30,30> as:
  vspltisw v0, 15
  vadduwm v2, v0, v0
  instead of as a cp load.
  llvm-svn: 27750
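  A short sketch added for illustration (not part of the original commit message), assuming <altivec.h>; it spells out the arithmetic behind the example above:

      #include <altivec.h>

      /* 30 does not fit in vspltisw's signed 5-bit immediate, but 30 = 15 + 15,
       * so the splat can be built from two cheap instructions instead of a
       * constant-pool load. */
      vector signed int splat_30(void) {
          vector signed int half = vec_splat_s32(15);   /* vspltisw v0, 15 */
          return vec_add(half, half);                   /* vadduwm v2, v0, v0 */
      }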
- Chris Lattner authored
  if it can be implemented in 3 or fewer discrete altivec instructions, codegen it as such. This implements Regression/CodeGen/PowerPC/vec_perf_shuffle.ll.
  llvm-svn: 27748
- Chris Lattner authored
  llvm-svn: 27746
- Chris Lattner authored
  llvm-svn: 27744
- Chris Lattner authored
  llvm-svn: 27742
- Chris Lattner authored
  of various 4-element vectors.
  llvm-svn: 27739
- Apr 16, 2006
- Evan Cheng authored
  llvm-svn: 27734
- Evan Cheng authored
  llvm-svn: 27733
- Evan Cheng authored
  address has to be 16-byte aligned but the values aren't spilled to 128-bit locations.
  llvm-svn: 27732
- Chris Lattner authored
  one type (v4i32) so that we don't have to write patterns for each type, and so that more CSE opportunities are exposed.
  llvm-svn: 27731
- Chris Lattner authored
  Remove some done items from the todo list.
  llvm-svn: 27729
- Chris Lattner authored
  llvm-svn: 27726
- Chris Lattner authored
  go away when I start using Evan's binop type canonicalizer.
  llvm-svn: 27725