  1. Aug 17, 2010
    • More fixes for win64: · 231ab847
      Anton Korobeynikov authored
        - Do not clobber al during variadic calls; this is an AMD64 ABI-only feature
        - Emit wincall64 where necessary
      Patch by Cameron Esfahani!
      
      llvm-svn: 111289
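      For context, a minimal standalone sketch (not from the patch) of the
      convention at stake: in the System V AMD64 ABI the caller puts the
      number of XMM registers used by a variadic call in al; Win64 has no
      such rule, so that register must be left alone there.

      #include <cstdio>

      int main() {
        // System V AMD64 codegen for this call sets al first, roughly:
        //     movl $1, %eax   ; one XMM register (the double) is in use
        //     callq printf
        // Win64 codegen must not emit that al write at all.
        std::printf("%f\n", 1.0);  // variadic call with one FP argument
        return 0;
      }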
  2. Aug 11, 2010
    • Use ISD::ADD instead of ISD::SUB with a negated constant. This · 5531aa4d
      Dan Gohman authored
      avoids trouble if the return type of TD->getPointerSize() is
      changed to something which doesn't promote to a signed type,
      and is simpler anyway.
      
      Also, use getCopyFromReg instead of getRegister to read a
      physical register's value.
      
      llvm-svn: 110835
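      The signedness hazard alluded to above fits in a few standalone lines
      (pointerSize here is an illustrative stand-in, not the TargetData API):

      #include <cstdint>
      #include <iostream>

      unsigned pointerSize() { return 8; }  // suppose an unsigned return type

      int main() {
        // Negation happens in unsigned arithmetic: 0xFFFFFFF8, not -8.
        int64_t wrong = -pointerSize();           // 4294967288; the sign is lost
        // Negating through an explicitly signed type is immune to changes
        // in the return type, which is what using ISD::ADD with a
        // pre-negated signed constant buys.
        int64_t right = -(int64_t)pointerSize();  // -8
        std::cout << wrong << " vs " << right << "\n";
      }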
    • Add AVX matching patterns to Packed Bit Test intrinsics. · 91d61df3
      Bruno Cardoso Lopes authored
      Apply the same approach as the SSE4.1 ptest intrinsics, but
      create a new x86 node, "testp", since AVX introduces
      vtestps/vtestpd instructions, which set ZF and CF depending
      on the sign-bit AND and ANDN of packed floating-point sources.

      This is slightly different from what "ptest" does.
      Tests are coming with the other 256-bit intrinsics tests.
      
      llvm-svn: 110744
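      A scalar model of the flag semantics behind the new "testp" node (an
      illustration, not LLVM code), for four packed single-precision lanes:

      #include <cstdint>

      struct TestpFlags { bool zf, cf; };

      // vtestps-style semantics: ZF is set when every sign bit of (a AND b)
      // is clear; CF is set when every sign bit of (NOT a AND b) is clear.
      TestpFlags testp(const uint32_t a[4], const uint32_t b[4]) {
        uint32_t and_signs = 0, andn_signs = 0;
        for (int i = 0; i < 4; ++i) {
          and_signs  |= (a[i] & b[i])  & 0x80000000u;  // sign bit of AND
          andn_signs |= (~a[i] & b[i]) & 0x80000000u;  // sign bit of ANDN
        }
        return { and_signs == 0, andn_signs == 0 };
      }

      Unlike ptest, which tests whole 128-bit values, only the sign bits of
      the floating-point lanes participate, which is the "slightly different"
      part noted above.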
  3. Jul 29, 2010
    • Revert r109652, and remove the offending assert in loadRegFromStackSlot instead. · ba0e124a
      Jakob Stoklund Olesen authored
      We do sometimes load from a too small stack slot when dealing with x86 arguments
      (varargs and smaller-than-32-bit args). It looks like we know what we are doing
      in those cases, so I am going to remove the assert instead of artificially
      enlarging stack slot sizes.
      
      The assert in storeRegToStackSlot stays in. We don't want to write beyond the
      bounds of a stack slot.
      
      llvm-svn: 109764
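      The invariant being kept, as a hedged standalone sketch (names are
      illustrative, not the actual LLVM assert):

      #include <cassert>

      // Loading fewer bytes than the slot holds is tolerated (see above);
      // a spill store, however, must never write past the end of its slot.
      void storeRegToSlot(unsigned SlotSizeBytes, unsigned RegSizeBytes) {
        assert(SlotSizeBytes >= RegSizeBytes &&
               "store would write beyond the stack slot");
        // ... emit the actual store ...
      }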
  4. Jul 28, 2010
    • Create a fixed stack object for varargs that is as large as any register. · f2234fbe
      Jakob Stoklund Olesen authored
      The size of this object isn't used for anything - technically it is of variable
      size.
      
      This avoids a false positive from the assert in
      X86InstrInfo::loadRegFromStackSlot, and fixes PR7735.
      
      llvm-svn: 109652
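      A hedged sketch of what such a fixed-object creation looks like against
      the MachineFrameInfo API of this era (the size and offset values are
      illustrative assumptions, not taken from the patch):

      // 16 bytes covers the largest register class (an XMM register), so a
      // load of any register size from this slot stays in bounds.
      int FI = MFI->CreateFixedObject(/*Size=*/16, /*SPOffset=*/Offset,
                                      /*Immutable=*/false);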
    • Implement a vectorized algorithm for <16 x i8> << <16 x i8> · 53afc8f0
      Nate Begeman authored
      This is about 4x faster and smaller than the existing scalarization.
      
      llvm-svn: 109566
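      One classic way to vectorize a per-lane i8 shift, shown as a scalar
      model of a single lane (a sketch of the general technique, not
      necessarily this patch's exact sequence): perform constant shifts of
      4, 2, and 1, each applied only in lanes whose shift amount has that
      bit set; in SSE the per-lane selection is done with masked shifts
      and pblendvb.

      #include <cstdint>

      uint8_t shl_lane(uint8_t x, uint8_t amt) {
        // Each step handles one bit of the 3-bit shift amount; amounts of
        // 8 or more are undefined for an i8 shift in LLVM IR anyway.
        if (amt & 4) x = (uint8_t)(x << 4);
        if (amt & 2) x = (uint8_t)(x << 2);
        if (amt & 1) x = (uint8_t)(x << 1);
        return x;
      }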
    • ~40% faster vector shl <4 x i32> on SSE 4.1. Larger improvements for smaller types are coming in future patches. · 269a6da0
      Nate Begeman authored
      
      For:
      
      define <2 x i64> @shl(<4 x i32> %r, <4 x i32> %a) nounwind readnone ssp {
      entry:
        %shl = shl <4 x i32> %r, %a                     ; <<4 x i32>> [#uses=1]
        %tmp2 = bitcast <4 x i32> %shl to <2 x i64>     ; <<2 x i64>> [#uses=1]
        ret <2 x i64> %tmp2
      }
      
      We get:
      
      _shl:                                   ## @shl
      	pslld	$23, %xmm1
      	paddd	LCPI0_0, %xmm1
      	cvttps2dq	%xmm1, %xmm1
      	pmulld	%xmm1, %xmm0
      	ret
      
      Instead of:
      
      _shl:                                   ## @shl
      	pshufd	$3, %xmm0, %xmm2
      	movd	%xmm2, %eax
      	pshufd	$3, %xmm1, %xmm2
      	movd	%xmm2, %ecx
      	shll	%cl, %eax
      	movd	%eax, %xmm2
      	pshufd	$1, %xmm0, %xmm3
      	movd	%xmm3, %eax
      	pshufd	$1, %xmm1, %xmm3
      	movd	%xmm3, %ecx
      	shll	%cl, %eax
      	movd	%eax, %xmm3
      	punpckldq	%xmm2, %xmm3
      	movd	%xmm0, %eax
      	movd	%xmm1, %ecx
      	shll	%cl, %eax
      	movd	%eax, %xmm2
      	movhlps	%xmm0, %xmm0
      	movd	%xmm0, %eax
      	movhlps	%xmm1, %xmm1
      	movd	%xmm1, %ecx
      	shll	%cl, %eax
      	movd	%eax, %xmm0
      	punpckldq	%xmm0, %xmm2
      	movdqa	%xmm2, %xmm0
      	punpckldq	%xmm3, %xmm0
      	ret
      
      llvm-svn: 109549
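      Why the four-instruction form works, as a standalone one-lane model
      (an explanatory sketch, assuming LCPI0_0 holds 0x3F800000, the bit
      pattern of 1.0f, in every lane): pslld $23 moves the shift amount s
      into the float exponent field, paddd adds the exponent bias so the
      lane's bit pattern becomes the float 2^s, cvttps2dq converts that to
      the integer 2^s, and pmulld finishes with x << s == x * 2^s.

      #include <cstdint>
      #include <cstring>

      uint32_t pow2_via_float(uint32_t s) {  // one lane, s in [0, 31)
        uint32_t bits = (s << 23) + 0x3F800000u;  // exponent = s + bias
        float f;
        std::memcpy(&f, &bits, sizeof f);
        return (uint32_t)f;                       // == 1u << s
      }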
  5. Jul 24, 2010
    • Add an ILP scheduler. This is a register pressure aware scheduler that's · 37b740c4
      Evan Cheng authored
      appropriate for targets without detailed instruction itineraries.
      The scheduler schedules for increased instruction level parallelism in
      low register pressure situation; it schedules to reduce register pressure
      when the register pressure becomes high.
      
      On x86_64, this is a win for all tests in CFP2000. It also sped up 256.bzip2
      by 16%.
      
      llvm-svn: 109300
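      The two-mode heuristic reduces to a comparison like the following
      (schematic C++, not the actual scheduler's priority function):

      struct Candidate { int ilpGain; int pressureDelta; };

      // Pick for parallelism while pressure is low; once pressure is high,
      // pick whatever frees registers soonest.
      bool betterThan(const Candidate &a, const Candidate &b,
                      bool highPressure) {
        if (highPressure)
          return a.pressureDelta < b.pressureDelta;  // reduce pressure
        return a.ilpGain > b.ilpGain;                // expose parallelism
      }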
  6. Jul 23, 2010
    • The only supported calling convention for X86-64 uses · f2d75670
      Dale Johannesen authored
      SSE, so we can't return floating point values if this
      is disabled.  Detect this error for clang.
      
      With SSE1 only, f64 is a problem; it can be done, but
      neither llvm-gcc nor clang has ever generated correct
      code for it.  Since nobody noticed this I think it's
      OK to treat it as an error for now.
      
      This also handles SSE-sized vectors of floating point.
      8207686, 8204109.
      
      llvm-svn: 109201
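      A case this turns into a front-end-visible error (a hedged example;
      the exact diagnostic wording may differ):

      // On x86-64 the calling convention returns this value in %xmm0, so
      // compiling with SSE disabled (e.g. clang -mno-sse on x86_64) cannot
      // be done correctly and is now rejected rather than miscompiled.
      double half(double x) {
        return x * 0.5;
      }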