  Apr 22, 2006
    • Don't do all the lowering stuff for 2-wide build_vectors. Also, a minor optimization for shuffle of undef. · e728efdf
      Evan Cheng authored

      llvm-svn: 27946
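      For illustration only (not from the commit above), a minimal sketch of a 2-wide build_vector of the kind that change refers to, assuming an SSE2 target; the function name is hypothetical:

      #include <emmintrin.h>

      /* The two scalar arguments become a <2 x double> build_vector
         during instruction selection. */
      __m128d make_pair(double a, double b) {
        return _mm_set_pd(b, a);  /* element 0 = a, element 1 = b */
      }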
    • Fix a performance regression. Use {p}shuf* when there are only two distinct elements in a build_vector. · 16ef94f4
      Evan Cheng authored

      llvm-svn: 27945
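      A hypothetical illustration (not part of the commit above) of a build_vector with only two distinct elements, the pattern that can now be lowered with a {p}shuf* sequence; the exact instruction sequence chosen is not spelled out in the commit:

      #include <emmintrin.h>

      /* Only two distinct values (a and b) among the four elements. */
      __m128i two_distinct(int a, int b) {
        return _mm_set_epi32(b, a, b, a);  /* (a, b, a, b) */
      }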
    • Revamp build_vector lowering to take advantage of movss and movd instructions. · 14215c36
      Evan Cheng authored
      movd always clears the top 96 bits, and movss does so when it is loading the
      value from memory.
      The net result is that codegen for 4-wide shuffles is much improved. It is near
      optimal if one or more elements are zero, e.g.
      
      __m128i test(int a, int b) {
        return _mm_set_epi32(0, 0, b, a);
      }
      
      compiles to
      
      _test:
      	movd 8(%esp), %xmm1
      	movd 4(%esp), %xmm0
      	punpckldq %xmm1, %xmm0
      	ret
      
      compare to gcc:
      
      _test:
      	subl	$12, %esp
      	movd	20(%esp), %xmm0
      	movd	16(%esp), %xmm1
      	punpckldq	%xmm0, %xmm1
      	movq	%xmm1, %xmm0
      	movhps	LC0, %xmm0
      	addl	$12, %esp
      	ret
      
      or icc:
      
      _test:
              movd      4(%esp), %xmm0                                #5.10
              movd      8(%esp), %xmm3                                #5.10
              xorl      %eax, %eax                                    #5.10
              movd      %eax, %xmm1                                   #5.10
              punpckldq %xmm1, %xmm0                                  #5.10
              movd      %eax, %xmm2                                   #5.10
              punpckldq %xmm2, %xmm3                                  #5.10
              punpckldq %xmm3, %xmm0                                  #5.10
              ret                                                     #5.10
      
      There is still room for improvement; for example, the FP variant of the above:
      
      __m128 test(float a, float b) {
        return _mm_set_ps(0.0, 0.0, b, a);
      }
      
      _test:
      	movss 8(%esp), %xmm1
      	movss 4(%esp), %xmm0
      	unpcklps %xmm1, %xmm0
      	xorps %xmm1, %xmm1
      	movlhps %xmm1, %xmm0
      	ret
      
      The xorps and movlhps are unnecessary; handling this will require a post-legalizer optimization.
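
      As a sketch of why those two instructions are redundant (this code is not from the commit; the function name is hypothetical): an intrinsics version of the same value, built from two scalars whose upper lanes are already zero, needs only the unpack:

      #include <xmmintrin.h>

      /* _mm_set_ss zeroes elements 1-3, so unpacking the two scalars
         already yields (a, b, 0, 0); no xorps/movlhps is needed. */
      __m128 test2(float a, float b) {
        return _mm_unpacklo_ps(_mm_set_ss(a), _mm_set_ss(b));
      }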
      
      llvm-svn: 27939
  Apr 07, 2006
    • Code clean up. · ac847268
      Evan Cheng authored

      llvm-svn: 27501
    • movlp{s|d} and movhp{s|d} support. · c995b45f
      Evan Cheng authored
      - movlp{s|d} and movhp{s|d} support.
      - Normalize shuffle nodes so that the result vector's lower-half elements come from the
        first vector and the rest come from the second vector (except for the
        exceptions :-).
      - Other minor fixes.

      llvm-svn: 27474
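      For the movlp{s|d}/movhp{s|d} support in the commit above, a minimal sketch (not from the commit; names are hypothetical) using the intrinsics that map to movlps and movhps; they replace the low and high half of a vector from memory, respectively:

      #include <xmmintrin.h>

      /* movlps loads the low two floats, movhps the high two. */
      __m128 load_halves(__m128 v, const __m64 *lo, const __m64 *hi) {
        v = _mm_loadl_pi(v, lo);  /* low half from memory  */
        v = _mm_loadh_pi(v, hi);  /* high half from memory */
        return v;
      }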