Commits · 2c64ba63a1de70e868be24c24076bcd8f85b58fb · Roger Ferrer / llvm-epi-0.8

Aug 23, 2010
- Start using target specific nodes for shuffles: pshufhw and pshuflw · 264d90ff
  Bruno Cardoso Lopes authored Aug 23, 2010
```
llvm-svn: 111837
```
- Revert invalid r111792. Jump tables are not broken on x86-64 / coff, · cbbe4501
  Anton Korobeynikov authored Aug 23, 2010
```
it's the COFF emitter that does not support differences of two symbols
(and needs to be fixed). GAS handles the generated code fine.

llvm-svn: 111801
```
- Workaround broken jump tables on x86-64 COFF. · e8723123
  Michael J. Spencer authored Aug 23, 2010
```
llvm-svn: 111792
```
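The two shuffles named in 264d90ff can be modeled in scalar C++ for reference. This is a minimal sketch of the instructions' documented behavior, not LLVM's target-specific node definitions. The type alias and function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of the SSE2 PSHUFLW/PSHUFHW instructions
// on an <8 x i16> vector (lane 0 is the lowest word).
using V8i16 = std::array<uint16_t, 8>;

// PSHUFLW: each 2-bit field of Imm selects one of words 0-3 for the
// corresponding low word; words 4-7 are copied through unchanged.
V8i16 pshuflw(const V8i16 &Src, uint8_t Imm) {
  V8i16 Dst = Src;
  for (int I = 0; I < 4; ++I)
    Dst[I] = Src[(Imm >> (2 * I)) & 3];
  return Dst;
}

// PSHUFHW: the same selection applied to words 4-7; words 0-3 are
// copied through unchanged.
V8i16 pshufhw(const V8i16 &Src, uint8_t Imm) {
  V8i16 Dst = Src;
  for (int I = 0; I < 4; ++I)
    Dst[4 + I] = Src[4 + ((Imm >> (2 * I)) & 3)];
  return Dst;
}
```

With Src = {0, 1, ..., 7}, an immediate of 0x1B reverses the selected half: pshuflw gives {3, 2, 1, 0, 4, 5, 6, 7}. An immediate of 0xE4 leaves the vector unchanged.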
Aug 21, 2010

Prepare LowerVECTOR_SHUFFLEv8i16 to use x86 target specific nodes directly · 9f20e7a1
Bruno Cardoso Lopes authored Aug 21, 2010
```
llvm-svn: 111704
```

This is the first step towards refactoring the x86 vector shuffle code. The · 6f3b38a8

Bruno Cardoso Lopes authored Aug 20, 2010

general idea here is to have a group of x86 target specific nodes which are
going to be selected during lowering and then directly matched in isel.

The commit includes the addition of those specific nodes and a *bunch* of
patterns; we will incrementally switch from what we have right now to them.
Both the patterns and the target specific nodes may change as
we move forward with this work.

llvm-svn: 111691


Aug 17, 2010

More fixes for win64: · 231ab847

Anton Korobeynikov authored Aug 17, 2010

  - Do not clobber %al during variadic calls; this is an AMD64 ABI-only feature
  - Emit wincall64 where necessary
Patch by Cameron Esfahani!

llvm-svn: 111289


Aug 14, 2010
- Rework how the non-sse2 memory barrier is lowered so that the · 54194bd1
  Eric Christopher authored Aug 14, 2010
```
encoding is correct for the built-in assembler.

Based on a patch from Chris.

llvm-svn: 111083
```
- improve indentation · 2f6c3434
  Chris Lattner authored Aug 14, 2010
```
llvm-svn: 111073
```
Aug 13, 2010
- Fix comment to reflect code, and remove an unused argument · 081861b6
  Bruno Cardoso Lopes authored Aug 13, 2010
```
llvm-svn: 111022
```
Aug 12, 2010

Begin to support some vector operations for AVX 256-bit instructions. The long · 7306c868

Bruno Cardoso Lopes authored Aug 12, 2010

term goal here is to be able to match enough of vector_shuffle and build_vector
so that all AVX intrinsics that map to shufflevector calls, rather than to their
own built-ins, can be codegen'd. This is the first (baby) step: support for
building zeroed vectors.

llvm-svn: 110897


Aug 11, 2010

Use ISD::ADD instead of ISD::SUB with a negated constant. This · 5531aa4d

Dan Gohman authored Aug 11, 2010

avoids trouble if the return type of TD->getPointerSize() is
changed to something which doesn't promote to a signed type,
and is simpler anyway.

Also, use getCopyFromReg instead of getRegister to read a
physical register's value.

llvm-svn: 110835

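The hazard that 5531aa4d avoids comes from ordinary C++ integer conversion, not from anything specific to SelectionDAG. The sketch below uses a hypothetical size query that returns a 32-bit `unsigned int`. If the size type instead promoted to `int`, the negation would yield -8 and both functions would agree.

```cpp
#include <cassert>
#include <cstdint>

// The sketch assumes a 32-bit unsigned int, as on common x86 hosts.
static_assert(sizeof(unsigned) == 4, "sketch assumes 32-bit unsigned");

// Hypothetical pointer-size query whose return type does not promote to int.
unsigned pointerSizeInBytes() { return 8; }

// Old shape: subtract a negated constant. Negating an unsigned value wraps
// modulo 2^32, so NegSize is 4294967288 rather than -8.
int64_t offsetViaSubOfNegated(int64_t Base) {
  int64_t NegSize = -pointerSizeInBytes();
  return Base - NegSize;
}

// New shape: add the constant directly; no negation is involved.
int64_t offsetViaAdd(int64_t Base) { return Base + pointerSizeInBytes(); }
```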

Add AVX matching patterns to Packed Bit Test intrinsics. · 91d61df3

Bruno Cardoso Lopes authored Aug 10, 2010

Apply the same approach as for the SSE4.1 ptest intrinsics, but
create a new x86 node "testp", since AVX introduces the
vtestps/vtestpd instructions, which set ZF and CF depending
on the sign bits of the AND and ANDN of the packed floating-point sources.

This is slightly different from what "ptest" does.
Tests are coming with the other 256-bit intrinsics tests.

llvm-svn: 110744

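The flag computation described in 91d61df3 can be modeled in scalar C++, following the VTESTPS description in the Intel manual. The model covers the 128-bit form, with four 32-bit lanes passed as raw bit patterns. The struct and function names are hypothetical; this is a reference model for reasoning about the flags, not the "testp" lowering itself.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct TestFlags {
  bool ZF;
  bool CF;
};

// Scalar model of 128-bit VTESTPS. Dest is the first operand, Src the second;
// lanes are raw 32-bit patterns. Only bit 31 (the sign bit) of each lane
// matters, unlike PTEST, which tests all 128 bits.
TestFlags vtestps(const std::array<uint32_t, 4> &Dest,
                  const std::array<uint32_t, 4> &Src) {
  bool AnySignInAnd = false;   // sign bit set in Src AND Dest
  bool AnySignInAndN = false;  // sign bit set in Src AND NOT Dest
  for (int I = 0; I < 4; ++I) {
    AnySignInAnd = AnySignInAnd || ((Src[I] & Dest[I]) >> 31) != 0;
    AnySignInAndN = AnySignInAndN || ((Src[I] & ~Dest[I]) >> 31) != 0;
  }
  return {!AnySignInAnd, !AnySignInAndN};
}
```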

Aug 10, 2010
- Support AVX 256-bit load and store intrinsics · 85da72a8
  Bruno Cardoso Lopes authored Aug 10, 2010
```
llvm-svn: 110645
```
Aug 06, 2010

Support very basic (doesn't include ABI support in the front-end, varargs, ...)... · 77954bdf

Bruno Cardoso Lopes authored Aug 05, 2010

Support very basic (doesn't include ABI support in the front-end, varargs, ...) 256-bit argument passing and return for AVX

llvm-svn: 110394


Aug 05, 2010
- Make x86-64 membarriers work without sse and clean up some of the · 2db84642
  Eric Christopher authored Aug 04, 2010
```
uses.

llvm-svn: 110274
```
Jul 30, 2010

Support all 128-bit AVX vector intrinsics. Most of them I had already · 349165b4

Bruno Cardoso Lopes authored Jul 30, 2010

declared during the addition of the assembler support; the additional
changes are:
- Add missing intrinsics
- Move all SSE conversion instructions in X86InstInfo64.td to the SSE.td file.
- Duplicate some patterns to AVX mode.
- Update the PCMPEST/PCMPIST custom inserter to add AVX versions.

llvm-svn: 109878


Jul 29, 2010

Revert r109652, and remove the offending assert in loadRegFromStackSlot instead. · ba0e124a

Jakob Stoklund Olesen authored Jul 29, 2010

We do sometimes load from a too small stack slot when dealing with x86 arguments
(varargs and smaller-than-32-bit args). It looks like we know what we are doing
in those cases, so I am going to remove the assert instead of artificially
enlarging stack slot sizes.

The assert in storeRegToStackSlot stays in. We don't want to write beyond the
bounds of a stack slot.

llvm-svn: 109764


Jul 28, 2010

Create a fixed stack object for varargs that is as large as any register. · f2234fbe

Jakob Stoklund Olesen authored Jul 28, 2010

The size of this object isn't used for anything - technically it is of variable
size.

This avoids a false positive from the assert in
X86InstrInfo::loadRegFromStackSlot, and fixes PR7735.

llvm-svn: 109652


Implement a vectorized algorithm for <16 x i8> << <16 x i8> · 53afc8f0
Nate Begeman authored Jul 28, 2010
```
This is about 4x faster and smaller than the existing scalarization.

llvm-svn: 109566
```
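The commit message does not spell out the algorithm. One common way to vectorize a per-lane variable byte shift is to split each shift amount into its 4, 2 and 1 bits. Each power-of-two shift is then applied to all lanes, and the shifted value is kept only in lanes where that bit is set. This is an assumed technique, not necessarily the exact sequence in 53afc8f0. On x86 the "keep where set" step would be a blend; the scalar model below only checks the arithmetic, and its names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <initializer_list>

using V16i8 = std::array<uint8_t, 16>;

// Variable per-lane left shift of 16 bytes, built from three uniform shifts.
// Each lane's amount (0-7) is split into its 4, 2 and 1 bits; every step
// shifts all lanes and keeps the shifted value only where that bit is set.
V16i8 shlVarByte(V16i8 R, const V16i8 &Amt) {
  for (int Step : {4, 2, 1}) {
    for (int I = 0; I < 16; ++I) {
      assert(Amt[I] < 8 && "i8 shift amounts of 8 or more are not defined");
      uint8_t Shifted = static_cast<uint8_t>(R[I] << Step);  // uniform shift
      R[I] = (Amt[I] & Step) ? Shifted : R[I];               // per-lane select
    }
  }
  return R;
}
```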

~40% faster vector shl <4 x i32> on SSE 4.1 Larger improvements for smaller... · 269a6da0

Nate Begeman authored Jul 27, 2010

~40% faster vector shl <4 x i32> on SSE 4.1. Larger improvements for smaller types are coming in future patches.

For:

```
define <2 x i64> @shl(<4 x i32> %r, <4 x i32> %a) nounwind readnone ssp {
entry:
  %shl = shl <4 x i32> %r, %a                     ; <<4 x i32>> [#uses=1]
  %tmp2 = bitcast <4 x i32> %shl to <2 x i64>     ; <<2 x i64>> [#uses=1]
  ret <2 x i64> %tmp2
}
```

We get:

```
_shl:                                   ## @shl
	pslld	$23, %xmm1
	paddd	LCPI0_0, %xmm1
	cvttps2dq	%xmm1, %xmm1
	pmulld	%xmm1, %xmm0
	ret
```

Instead of:

```
_shl:                                   ## @shl
	pshufd	$3, %xmm0, %xmm2
	movd	%xmm2, %eax
	pshufd	$3, %xmm1, %xmm2
	movd	%xmm2, %ecx
	shll	%cl, %eax
	movd	%eax, %xmm2
	pshufd	$1, %xmm0, %xmm3
	movd	%xmm3, %eax
	pshufd	$1, %xmm1, %xmm3
	movd	%xmm3, %ecx
	shll	%cl, %eax
	movd	%eax, %xmm3
	punpckldq	%xmm2, %xmm3
	movd	%xmm0, %eax
	movd	%xmm1, %ecx
	shll	%cl, %eax
	movd	%eax, %xmm2
	movhlps	%xmm0, %xmm0
	movd	%xmm0, %eax
	movhlps	%xmm1, %xmm1
	movd	%xmm1, %ecx
	shll	%cl, %eax
	movd	%eax, %xmm0
	punpckldq	%xmm0, %xmm2
	movdqa	%xmm2, %xmm0
	punpckldq	%xmm3, %xmm0
	ret
```

llvm-svn: 109549

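The new sequence works in four steps:

- `pslld $23` moves each shift amount into the float exponent field.
- Adding the bit pattern of 1.0f (exponent bias 127) turns it into the float 2^a.
- `cvttps2dq` converts that back to the integer 2^a.
- `pmulld` multiplies by it, which equals a left shift modulo 2^32.

The one-lane C++ model below assumes `LCPI0_0` is a splat of 0x3F800000; the constant pool contents are not shown in the commit. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// One-lane model of: pslld $23 ; paddd LCPI0_0 ; cvttps2dq ; pmulld.
// Assumption: LCPI0_0 holds 0x3F800000 (the bits of 1.0f) in every lane.
uint32_t shlViaFloat(uint32_t R, uint32_t A) {
  assert(A < 32 && "shl by 32 or more is not defined");
  uint32_t Bits = (A << 23) + 0x3F800000u;  // exponent field = 127 + A
  float Pow2;
  std::memcpy(&Pow2, &Bits, sizeof Pow2);   // reinterpret: Pow2 == 2^A
  // cvttps2dq truncates to int32; for A == 31 the value is out of range and the
  // instruction returns 0x80000000, which is exactly 1u << 31. Going through
  // int64_t here reproduces that result without undefined behavior.
  uint32_t Mul = static_cast<uint32_t>(static_cast<int64_t>(Pow2));
  return R * Mul;                           // pmulld keeps the low 32 bits
}
```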

Jul 26, 2010
- On x86, f32 / f64 nodes share the same registers as 128-bit vector values. · d4218b87
  Evan Cheng authored Jul 26, 2010
```
llvm-svn: 109450
```
Jul 24, 2010

Add an ILP scheduler. This is a register pressure aware scheduler that's · 37b740c4

Evan Cheng authored Jul 24, 2010

appropriate for targets without detailed instruction itineraries.
The scheduler schedules for increased instruction level parallelism in
low register pressure situations; it schedules to reduce register pressure
when the register pressure becomes high.

On x86_64, this is a win for all tests in CFP2000. It also sped up 256.bzip2
by 16%.

llvm-svn: 109300

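The policy described in 37b740c4 can be illustrated with a toy ready-list picker. This is not LLVM's ILP scheduler. The node fields, the single register limit and the tie-breaking rule are hypothetical simplifications, chosen only to show the switch between the two goals.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical ready-list entry; fields are illustrative only.
struct ReadyNode {
  int Height;    // length of the critical path from this node to the exit
  int RegDelta;  // net change in live registers if this node is scheduled
};

// Return the index of the next node to schedule. While pressure is below the
// limit, prefer the tallest node (more ILP); once it reaches the limit, prefer
// the node that grows the live register count the least.
std::size_t pickNext(const std::vector<ReadyNode> &Ready, int LiveRegs,
                     int RegLimit) {
  assert(!Ready.empty());
  const bool LowPressure = LiveRegs < RegLimit;
  std::size_t Best = 0;
  for (std::size_t I = 1; I < Ready.size(); ++I) {
    const ReadyNode &Cand = Ready[I], &Cur = Ready[Best];
    bool Better = LowPressure ? Cand.Height > Cur.Height
                              : Cand.RegDelta < Cur.RegDelta;
    if (Better)
      Best = I;
  }
  return Best;
}
```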

Jul 23, 2010

The only supported calling convention for X86-64 uses · f2d75670

Dale Johannesen authored Jul 23, 2010

SSE, so we can't return floating point values if this
is disabled.  Detect this error for clang.

With SSE1 only, f64 is a problem; it can be done, but
neither llvm-gcc nor clang has ever generated correct
code for it.  Since nobody noticed this I think it's
OK to treat it as an error for now.

This also handles SSE-sized vectors of floating point.
8207686, 8204109.

llvm-svn: 109201


Jul 22, 2010
- Custom lower the memory barrier instructions and add support · 9a773826
  Eric Christopher authored Jul 22, 2010
```
for lowering without sse2.  Add a couple of new testcases.

Fixes a few libgomp tests and latent bugs.  Remove a few todos.

llvm-svn: 109078
```
- 80-columns. · a4c435f1
  Eric Christopher authored Jul 22, 2010
```
llvm-svn: 109070
```
Jul 21, 2010

Fix a couple issues with Win64 ABI · 784e062b

Nate Begeman authored Jul 21, 2010

1) all registers were spilled as xmm, regardless of their actual size
2) the win64 abi doesn't pass the varargs vector-register count in %al

Still to look into:

xmm6-15 are marked as clobbered by call instructions on win64, even though they are callee-saved there.

llvm-svn: 109035

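Item 2 refers to a System V AMD64 convention that Win64 lacks. Before calling a variadic function, the caller sets %al to an upper bound on the number of vector registers used to pass arguments. The simplified model below uses a hypothetical two-way argument classification that ignores aggregates, x87 long double and memory-passed arguments. It requires C++17 for `std::optional`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical, simplified classification: integer-class arguments go in
// general-purpose registers, vector-class ones in xmm registers.
enum class ArgKind { Integer, Vector };

// System V AMD64: before a variadic call the caller sets %al to an upper bound
// on the number of vector registers used for arguments (at most 8, xmm0-xmm7).
// Win64 has no such convention, so no %al value is produced for it.
std::optional<unsigned> alForVariadicCall(const std::vector<ArgKind> &Args,
                                          bool IsWin64) {
  if (IsWin64)
    return std::nullopt;
  std::ptrdiff_t NumVector =
      std::count(Args.begin(), Args.end(), ArgKind::Vector);
  return static_cast<unsigned>(std::min<std::ptrdiff_t>(NumVector, 8));
}
```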

Pulling out previous patch, must've run the tests in · d27913e5
Eric Christopher authored Jul 21, 2010
```
the wrong directory.

llvm-svn: 109005
```
Lower MEMBARRIER on x86 and support processors without SSE2. · b2d10670
Eric Christopher authored Jul 21, 2010
```
Fixes a pile of libgomp failures in the llvm-gcc testsuite due
to the libcall not existing.

llvm-svn: 109004
```

Jul 16, 2010

Split -enable-finite-only-fp-math to two options: · 55f0c6b9

Evan Cheng authored Jul 15, 2010

-enable-no-nans-fp-math and -enable-no-infs-fp-math. All of the current codegen FP math optimizations only depend on whether the arguments and results of FP arithmetic are guaranteed never to be NaN.

llvm-svn: 108465

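A small example of why the split is useful. `X == X` is false only for NaN, since infinity compares equal to itself, so folding it to true needs the no-NaNs assumption and nothing about infinities. The option struct and function below are hypothetical stand-ins, not LLVM's TargetOptions.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical stand-ins for the two options after the split.
struct FPMathOptions {
  bool NoNaNs;  // -enable-no-nans-fp-math
  bool NoInfs;  // -enable-no-infs-fp-math (not needed by this fold)
};

// X == X is false only for NaN; infinity compares equal to itself. So folding
// the comparison to true depends on NoNaNs alone.
bool selfEqual(double X, const FPMathOptions &Opts) {
  if (Opts.NoNaNs)
    return true;  // folded: assumes X is never NaN
  return X == X;  // unfolded comparison
}
```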

Jul 15, 2010
- Use TargetOpcode::COPY instead of X86-native register copy instructions when · 9b449d5a
  Jakob Stoklund Olesen authored Jul 14, 2010
```
lowering atomics. This will allow those copies to still be coalesced after
TII::isMoveInstr is removed.

llvm-svn: 108385
```
Jul 14, 2010

Fix for PR7193 was overly conservative. The only case where sibcall callee · a8e88745

Evan Cheng authored Jul 14, 2010

address cannot be allocated a register is in 32-bit mode where the first
three arguments are marked inreg. In that case EAX, EDX, and ECX will be
used for argument passing.

This fixes PR7610.

llvm-svn: 108327


Jul 10, 2010

Reapply bottom-up fast-isel, with several fixes for x86-32: · d7b5ce33

Dan Gohman authored Jul 10, 2010

 - Check getBytesToPopOnReturn().
 - Eschew ST0 and ST1 for return values.
 - Fix the PIC base register initialization so that it doesn't ever
   fail to end up at the top of the entry block.

llvm-svn: 108039


An x86 function returns a floating point value in st(0), and we must make sure · be8d9b0b

Jakob Stoklund Olesen authored Jul 10, 2010

it is popped, even if it is unused. A CopyFromReg node is too weak to represent
the required side effect, so insert an FpGET_ST0 instruction directly instead.

This will matter when CopyFromReg gets lowered to a generic COPY instruction.

llvm-svn: 108037


Jul 09, 2010

--- Reverse-merging r107947 into '.': · 6586e9b2

Bob Wilson authored Jul 09, 2010

```
U    utils/TableGen/FastISelEmitter.cpp
--- Reverse-merging r107943 into '.':
U    test/CodeGen/X86/fast-isel.ll
U    test/CodeGen/X86/fast-isel-loads.ll
U    include/llvm/Target/TargetLowering.h
U    include/llvm/Support/PassNameParser.h
U    include/llvm/CodeGen/FunctionLoweringInfo.h
U    include/llvm/CodeGen/CallingConvLower.h
U    include/llvm/CodeGen/FastISel.h
U    include/llvm/CodeGen/SelectionDAGISel.h
U    lib/CodeGen/LLVMTargetMachine.cpp
U    lib/CodeGen/CallingConvLower.cpp
U    lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
U    lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
U    lib/CodeGen/SelectionDAG/FastISel.cpp
U    lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
U    lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
U    lib/CodeGen/SelectionDAG/InstrEmitter.cpp
U    lib/CodeGen/SelectionDAG/TargetLowering.cpp
U    lib/Target/XCore/XCoreISelLowering.cpp
U    lib/Target/XCore/XCoreISelLowering.h
U    lib/Target/X86/X86ISelLowering.cpp
U    lib/Target/X86/X86FastISel.cpp
U    lib/Target/X86/X86ISelLowering.h
```

llvm-svn: 107987


Fix the memoperand offsets in code generated for va_start. · 0a7d155d
Dan Gohman authored Jul 09, 2010
```
llvm-svn: 107948
```
Re-apply bottom-up fast-isel, with fixes. Be very careful to avoid emitting · 0b5aa1cd
Dan Gohman authored Jul 09, 2010
```
a DBG_VALUE after a terminator, or emitting any instructions before an EH_LABEL.

llvm-svn: 107943
```

Change LEA to have 5 operands for its memory operand, just · f469307c

Chris Lattner authored Jul 08, 2010

like all other instructions, even though a segment is not
allowed.  This resolves a bunch of gross hacks in the 
encoder and makes LEA more consistent with the rest of the
instruction set.

No functionality change.

llvm-svn: 107934


add some long-overdue enums to refer to the parts of the 5-operand · ec536276
Chris Lattner authored Jul 08, 2010
```
X86 memory operand.

llvm-svn: 107925
```
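The five parts of an X86 memory operand referred to in f469307c and ec536276 are, in order:

- base register
- scale
- index register
- displacement
- segment register

The enum below mirrors that layout under hypothetical names; check the X86 backend headers for the real ones. `effectiveAddress` is an illustrative helper showing how the first four parts combine. The segment slot is ignored, as it is for LEA.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for the five operand slots, in the order listed
// above; the real enum lives in the X86 backend headers.
enum MemOperandSlot {
  MemBase = 0,     // base register
  MemScale = 1,    // scale amount: 1, 2, 4 or 8
  MemIndex = 2,    // index register
  MemDisp = 3,     // displacement
  MemSegment = 4,  // segment register (present even for LEA, which ignores it)
  MemNumOperands = 5
};

// Effective address base + scale * index + disp, with register operands
// replaced by their values for illustration.
int64_t effectiveAddress(const int64_t (&Ops)[MemNumOperands]) {
  assert(Ops[MemScale] == 1 || Ops[MemScale] == 2 || Ops[MemScale] == 4 ||
         Ops[MemScale] == 8);
  return Ops[MemBase] + Ops[MemScale] * Ops[MemIndex] + Ops[MemDisp];
}
```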

Jul 08, 2010
- Revert 107840 107839 107813 107804 107800 107797 107791. · e7570436
  Dan Gohman authored Jul 08, 2010
```
Debug info intrinsics win for now.

llvm-svn: 107850
```
- Move getExtLoad() and (some) getLoad() DebugLoc argument after EVT argument for consistency sake. · 1c349f18
  Evan Cheng authored Jul 07, 2010
```
llvm-svn: 107820
```