  Aug 28, 2010
    • fix the buildvector->insertp[sd] logic to not always create a redundant · 94656b1c
      Chris Lattner authored
      insertp[sd] $0, which is a noop.  Before:
      
      _f32:                                   ## @f32
      	pshufd	$1, %xmm1, %xmm2
      	pshufd	$1, %xmm0, %xmm3
      	addss	%xmm2, %xmm3
      	addss	%xmm1, %xmm0
                                              ## kill: XMM0<def> XMM0<kill> XMM0<def>
      	insertps	$0, %xmm0, %xmm0
      	insertps	$16, %xmm3, %xmm0
      	ret
      
      after:
      
      _f32:                                   ## @f32
      	movdqa	%xmm0, %xmm2
      	addss	%xmm1, %xmm2
      	pshufd	$1, %xmm1, %xmm1
      	pshufd	$1, %xmm0, %xmm3
      	addss	%xmm1, %xmm3
      	movdqa	%xmm2, %xmm0
      	insertps	$16, %xmm3, %xmm0
      	ret
      
      The extra movs are due to a random (poor) scheduling decision.
      
      llvm-svn: 112379
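      For reference, the insertps immediate encodes a source element, a
      destination element, and a zero mask, so insertps $0 copies element 0
      of the source into element 0 of the destination and is a no-op when
      both registers are the same. A minimal C++ intrinsics sketch of that
      fact (the function name is illustrative, not from the patch):
      
      	#include <smmintrin.h>   // SSE4.1: _mm_insert_ps
      
      	// imm8 layout: [7:6] source element, [5:4] destination element,
      	// [3:0] zero mask.  imm8 == 0 moves element 0 of b into element 0
      	// of a and zeroes nothing, so with a == b nothing changes.
      	static __m128 redundant_insert(__m128 a) {
      	    return _mm_insert_ps(a, a, 0);   // same value as plain a
      	}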
    • fix the BuildVector -> unpcklps logic to not do pointless shuffles · bcb6090a
      Chris Lattner authored
      when the top elements of a vector are undefined.  This happens all
      the time for X86-64 ABI stuff because only the low 2 elements of
      a 4 element vector are defined.  For example, on:
      
      _Complex float f32(_Complex float A, _Complex float B) {
        return A+B;
      }
      
      We used to produce (with SSE2, SSE4.1+ uses insertps):
      
      _f32:                                   ## @f32
      	movdqa	%xmm0, %xmm2
      	addss	%xmm1, %xmm2
      	pshufd	$16, %xmm2, %xmm2
      	pshufd	$1, %xmm1, %xmm1
      	pshufd	$1, %xmm0, %xmm0
      	addss	%xmm1, %xmm0
      	pshufd	$16, %xmm0, %xmm1
      	movdqa	%xmm2, %xmm0
      	unpcklps	%xmm1, %xmm0
      	ret
      
      We now produce:
      
      _f32:                                   ## @f32
      	movdqa	%xmm0, %xmm2
      	addss	%xmm1, %xmm2
      	pshufd	$1, %xmm1, %xmm1
      	pshufd	$1, %xmm0, %xmm3
      	addss	%xmm1, %xmm3
      	movaps	%xmm2, %xmm0
      	unpcklps	%xmm3, %xmm0
      	ret
      
      This implements rdar://8368414
      
      llvm-svn: 112378
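      The new sequence is the intrinsics-level pattern below: the two scalar
      sums land in element 0 of two registers, and a single unpcklps
      interleaves them into {re, im, ...}. A hedged C++ sketch (identifiers
      mine, not the compiler's output):
      
      	#include <xmmintrin.h>   // SSE: _mm_unpacklo_ps
      
      	// re holds A.re+B.re in element 0, im holds A.im+B.im in element 0;
      	// the upper elements are don't-cares under the x86-64 ABI, so no
      	// shuffles of them are needed before the interleave.
      	static __m128 build_complex(__m128 re, __m128 im) {
      	    return _mm_unpacklo_ps(re, im);   // {re0, im0, re1, im1}
      	}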
    • improve comments in the unpcklps generating logic, introduce · 96db6e66
      Chris Lattner authored
      a new EltStride variable instead of reusing the NumElems variable
      for a non-obvious purpose.  No functionality change.
      
      llvm-svn: 112377
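      For context, this lowering merges build_vector operands pairwise at a
      halving stride. A schematic C++ rendering of that shape (only the
      EltStride name comes from the code; the rest is illustrative):
      
      	#include <xmmintrin.h>
      
      	// Merge four single-element vectors {a,b,c,d} into one:
      	// stride 2 gives v0={a,c,..} and v1={b,d,..}; stride 1 then
      	// gives v0={a,b,c,d}.
      	static __m128 build_from_scalars(__m128 v[4]) {
      	    for (unsigned EltStride = 4 >> 1; EltStride != 0; EltStride >>= 1)
      	        for (unsigned i = 0; i != EltStride; ++i)
      	            v[i] = _mm_unpacklo_ps(v[i], v[i + EltStride]);
      	    return v[0];
      	}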
    • Clean up the logic of vector shuffles -> vector shifts. · a982aa24
      Bruno Cardoso Lopes authored
      Also teach this logic how to handle target-specific shuffles if
      needed; this is necessary when searching recursively for zeroed
      scalar elements in vector shuffle operands.
      
      llvm-svn: 112348
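      The idea being cleaned up: a shuffle whose incoming lanes are known
      zero is just a whole-register byte shift. A minimal C++ illustration
      (hedged; this is not the code in the patch):
      
      	#include <emmintrin.h>   // SSE2
      
      	// Shuffling {a0,a1,a2,a3} into {a1,a2,a3,0} equals shifting the
      	// whole 128-bit value right by one 32-bit element (psrldq $4).
      	static __m128i shuffle_as_shift(__m128i a) {
      	    return _mm_srli_si128(a, 4);   // shift count is in bytes
      	}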
  Aug 17, 2010
    • More fixes for win64: · 231ab847
      Anton Korobeynikov authored
        - Do not clobber al during variadic calls; this is an AMD64 ABI-only feature
        - Emit wincall64, where necessary
      Patch by Cameron Esfahani!
      
      llvm-svn: 111289
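      Background on the al note: the SysV AMD64 ABI passes in %al an upper
      bound on the number of vector registers used by a variadic call;
      Win64 has no such rule, so the backend must not clobber %al for that
      purpose there. A small C++ example of a call where SysV needs %al set
      (nothing here is from the patch):
      
      	#include <cstdio>
      
      	int main() {
      	    double x = 2.5;
      	    // SysV AMD64: %al is set to the number of XMM registers used
      	    // by the variadic arguments (1 here, for x in %xmm0) before
      	    // the call.  Win64 passes no such value.
      	    std::printf("%f\n", x);
      	    return 0;
      	}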
  Aug 11, 2010
    • Use ISD::ADD instead of ISD::SUB with a negated constant. This · 5531aa4d
      Dan Gohman authored
      avoids trouble if the return type of TD->getPointerSize() is
      changed to something which doesn't promote to a signed type,
      and is simpler anyway.
      
      Also, use getCopyFromReg instead of getRegister to read a
      physical register's value.
      
      llvm-svn: 110835
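      The signedness hazard is the usual one when negating a value of an
      unsigned type; a small standalone C++ illustration (not LLVM code):
      
      	#include <cstdint>
      	#include <cstdio>
      
      	int main() {
      	    unsigned size = 8;                  // e.g. a pointer size returned as unsigned
      	    int64_t negated = -size;            // evaluated in unsigned: 4294967288
      	    int64_t widened = -(int64_t)size;   // widen first, then negate: -8
      	    std::printf("%lld vs %lld\n", (long long)negated, (long long)widened);
      	    return 0;
      	}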
    • Add AVX matching patterns to Packed Bit Test intrinsics. · 91d61df3
      Bruno Cardoso Lopes authored
      Apply the same approach as the SSE4.1 ptest intrinsics, but
      create a new x86 node "testp", since AVX introduces
      vtest{ps}{pd} instructions which set ZF and CF depending
      on the AND and ANDN of the sign bits of packed floating-point
      sources.
      
      This is slightly different from what ptest does.
      Tests coming with the other 256-bit intrinsics tests.
      
      llvm-svn: 110744
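      The distinction at the intrinsics level, as a hedged C++ sketch:
      ptest ANDs all 128 bits, while the AVX packed-float variants consider
      only sign bits (compile with -mavx):
      
      	#include <immintrin.h>   // SSE4.1 _mm_testz_si128, AVX _mm_testz_ps
      
      	// ptest: ZF is set iff (a AND b) == 0 across all 128 bits.
      	static int all_bits_disjoint(__m128i a, __m128i b) {
      	    return _mm_testz_si128(a, b);
      	}
      
      	// vtestps: ZF is set iff the sign bits of a AND the sign bits of b
      	// are all zero; the other 31 bits of each float are ignored.
      	static int sign_bits_disjoint(__m128 a, __m128 b) {
      	    return _mm_testz_ps(a, b);
      	}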
  Jul 29, 2010
    • Revert r109652, and remove the offending assert in loadRegFromStackSlot instead. · ba0e124a
      Jakob Stoklund Olesen authored
      We do sometimes load from a too-small stack slot when dealing with x86 arguments
      (varargs and smaller-than-32-bit args). It looks like we know what we are doing
      in those cases, so I am going to remove the assert instead of artificially
      enlarging the stack slot sizes.
      
      The assert in storeRegToStackSlot stays in. We don't want to write beyond the
      bounds of a stack slot.
      
      llvm-svn: 109764
  Jul 28, 2010
    • Create a fixed stack object for varargs that is as large as any register. · f2234fbe
      Jakob Stoklund Olesen authored
      The size of this object isn't used for anything - technically it is of variable
      size.
      
      This avoids a false positive from the assert in
      X86InstrInfo::loadRegFromStackSlot, and fixes PR7735.
      
      llvm-svn: 109652
    • Implement a vectorized algorithm for <16 x i8> << <16 x i8> · 53afc8f0
      Nate Begeman authored
      This is about 4x faster and smaller than the existing scalarization.
      
      llvm-svn: 109566
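      The commit doesn't show the sequence, but the standard vectorized form
      consumes one bit of the shift amount per round, conditionally shifting
      by 4, 2, and 1 with a per-byte blend. A hedged C++ sketch of that
      technique (assumes SSE4.1 pblendvb, and per-byte amounts below 8, as
      LLVM IR guarantees for shl):
      
      	#include <smmintrin.h>   // SSE4.1: _mm_blendv_epi8
      
      	static __m128i shl_v16i8(__m128i v, __m128i amt) {
      	    __m128i t;
      	    amt = _mm_slli_epi16(amt, 5);      // bit 2 of each byte -> bit 7
      	    t = _mm_and_si128(_mm_slli_epi16(v, 4), _mm_set1_epi8((char)0xF0));
      	    v = _mm_blendv_epi8(v, t, amt);    // v <<= 4 where bit 2 was set
      	    amt = _mm_add_epi8(amt, amt);      // move the next bit into bit 7
      	    t = _mm_and_si128(_mm_slli_epi16(v, 2), _mm_set1_epi8((char)0xFC));
      	    v = _mm_blendv_epi8(v, t, amt);    // v <<= 2 where bit 1 was set
      	    amt = _mm_add_epi8(amt, amt);
      	    t = _mm_add_epi8(v, v);            // per-byte shift left by 1
      	    return _mm_blendv_epi8(v, t, amt); // v <<= 1 where bit 0 was set
      	}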
    • ~40% faster vector shl <4 x i32> on SSE4.1; larger improvements for smaller types coming in future patches. · 269a6da0
      Nate Begeman authored
      
      For:
      
      define <2 x i64> @shl(<4 x i32> %r, <4 x i32> %a) nounwind readnone ssp {
      entry:
        %shl = shl <4 x i32> %r, %a                     ; <<4 x i32>> [#uses=1]
        %tmp2 = bitcast <4 x i32> %shl to <2 x i64>     ; <<2 x i64>> [#uses=1]
        ret <2 x i64> %tmp2
      }
      
      We get:
      
      _shl:                                   ## @shl
      	pslld	$23, %xmm1
      	paddd	LCPI0_0, %xmm1
      	cvttps2dq	%xmm1, %xmm1
      	pmulld	%xmm1, %xmm0
      	ret
      
      Instead of:
      
      _shl:                                   ## @shl
      	pshufd	$3, %xmm0, %xmm2
      	movd	%xmm2, %eax
      	pshufd	$3, %xmm1, %xmm2
      	movd	%xmm2, %ecx
      	shll	%cl, %eax
      	movd	%eax, %xmm2
      	pshufd	$1, %xmm0, %xmm3
      	movd	%xmm3, %eax
      	pshufd	$1, %xmm1, %xmm3
      	movd	%xmm3, %ecx
      	shll	%cl, %eax
      	movd	%eax, %xmm3
      	punpckldq	%xmm2, %xmm3
      	movd	%xmm0, %eax
      	movd	%xmm1, %ecx
      	shll	%cl, %eax
      	movd	%eax, %xmm2
      	movhlps	%xmm0, %xmm0
      	movd	%xmm0, %eax
      	movhlps	%xmm1, %xmm1
      	movd	%xmm1, %ecx
      	shll	%cl, %eax
      	movd	%eax, %xmm0
      	punpckldq	%xmm0, %xmm2
      	movdqa	%xmm2, %xmm0
      	punpckldq	%xmm3, %xmm0
      	ret
      
      llvm-svn: 109549
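      Why the fast sequence works: pslld $23 moves each shift amount into
      the float exponent field; paddd with the constant pool value
      (presumably 1.0f's bit pattern, 0x3f800000, which the listing elides
      as LCPI0_0) turns each lane into the float 2^amt; cvttps2dq converts
      back to integers; and pmulld multiplies by those powers of two. A C++
      intrinsics rendering of the trick (the constant is inferred, not
      shown above):
      
      	#include <smmintrin.h>   // SSE4.1: _mm_mullo_epi32 (pmulld)
      
      	// v << amt computed as v * 2^amt, building 2^amt in float exponents.
      	static __m128i shl_v4i32(__m128i v, __m128i amt) {
      	    __m128i e = _mm_slli_epi32(amt, 23);                  // pslld $23
      	    e = _mm_add_epi32(e, _mm_set1_epi32(0x3f800000));     // paddd (bias of 1.0f)
      	    __m128i pow2 = _mm_cvttps_epi32(_mm_castsi128_ps(e)); // cvttps2dq: 2^amt
      	    return _mm_mullo_epi32(v, pow2);                      // pmulld
      	}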