  Apr 22, 2006
    • Don't do all the lowering stuff for 2-wide build_vector's. Also, minor optimization for shuffle of undef. · e728efdf
      Evan Cheng authored
      
      llvm-svn: 27946
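      A 2-wide build_vector is simply a vector built from two scalar values. As a hypothetical source-level illustration of the case this commit special-cases (the function name and choice of intrinsic are mine, not from the commit):
      
      	#include <emmintrin.h>
      	
      	/* Building a <2 x double> from two scalars is a 2-wide
      	   build_vector in SelectionDAG terms; per the commit, it no
      	   longer goes through the full generic lowering path. */
      	__m128d make_pair(double a, double b) {
      	  return _mm_set_pd(b, a);  /* element 0 = a, element 1 = b */
      	}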
    • Fix a performance regression. Use {p}shuf* when there are only two distinct elements in a build_vector. · 16ef94f4
      Evan Cheng authored
      
      llvm-svn: 27945
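      For illustration, a build_vector whose four elements take only two distinct values could be written like this (a hypothetical example, not taken from the commit or its tests):
      
      	#include <emmintrin.h>
      	
      	/* Only two distinct values (a and b) appear among the four
      	   elements, so per the commit the backend can materialize a
      	   and b once and replicate them with a single {p}shuf*
      	   instead of inserting all four elements individually. */
      	__m128i two_distinct(int a, int b) {
      	  return _mm_set_epi32(b, a, b, a);  /* <a, b, a, b> low-to-high */
      	}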
    • Revamp build_vector lowering to take advantage of movss and movd instructions. · 14215c36
      Evan Cheng authored
      movd always clears the top 96 bits, and movss does so when it's loading the
      value from memory.
      The net result is that codegen for 4-wide shuffles is much improved. It is near
      optimal if one or more elements are zero, e.g.
      
      __m128i test(int a, int b) {
        return _mm_set_epi32(0, 0, b, a);
      }
      
      compiles to
      
      _test:
      	movd 8(%esp), %xmm1
      	movd 4(%esp), %xmm0
      	punpckldq %xmm1, %xmm0
      	ret
      
      compared to gcc:
      
      _test:
      	subl	$12, %esp
      	movd	20(%esp), %xmm0
      	movd	16(%esp), %xmm1
      	punpckldq	%xmm0, %xmm1
      	movq	%xmm1, %xmm0
      	movhps	LC0, %xmm0
      	addl	$12, %esp
      	ret
      
      or icc:
      
      _test:
              movd      4(%esp), %xmm0                                #5.10
              movd      8(%esp), %xmm3                                #5.10
              xorl      %eax, %eax                                    #5.10
              movd      %eax, %xmm1                                   #5.10
              punpckldq %xmm1, %xmm0                                  #5.10
              movd      %eax, %xmm2                                   #5.10
              punpckldq %xmm2, %xmm3                                  #5.10
              punpckldq %xmm3, %xmm0                                  #5.10
              ret                                                     #5.10
      
      There is still room for improvement, for example in the FP variant of the above:
      
      __m128 test(float a, float b) {
        return _mm_set_ps(0.0, 0.0, b, a);
      }
      
      _test:
      	movss 8(%esp), %xmm1
      	movss 4(%esp), %xmm0
      	unpcklps %xmm1, %xmm0
      	xorps %xmm1, %xmm1
      	movlhps %xmm1, %xmm0
      	ret
      
      The xorps and movlhps are unnecessary (see the sketch after this entry); handling this will require a post-legalizer optimization.
      
      llvm-svn: 27939
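      Since movss from memory already zeroes the upper 96 bits, and unpcklps of two such values leaves the top two lanes zero, the ideal sequence would simply drop the last two instructions. A sketch of what the post-legalizer optimization should produce (my extrapolation from the commit's reasoning, not output the commit claims to generate):
      
      	_test:
      		movss 8(%esp), %xmm1
      		movss 4(%esp), %xmm0
      		unpcklps %xmm1, %xmm0
      		ret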