  1. Mar 12, 2008
    • Clean up my own mess. · 99ee78ef
      Evan Cheng authored
      X86 lowering normalizes vector 0 to v4i32. However, DAGCombine can fold (sub x, x) -> 0 after legalization, which can create a zero vector of a type that is not expected (e.g. v8i16). We don't want to disable the optimization, since leaving a (sub x, x) around is really bad. Add isel patterns for the other types of vector 0 to ensure correctness. This is highly unlikely to happen outside of bugpoint-reduced test cases.
      
      llvm-svn: 48279
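
      For illustration, here is a minimal C sketch (hypothetical; it is not one of the commit's test cases, and it assumes SSE2 intrinsics) of source where a non-v4i32 zero vector can appear once DAGCombine folds the self-subtraction:

      #include <emmintrin.h>

      /* Hypothetical example, not from the commit: a v8i16 self-subtraction
       * that DAGCombine can fold to an all-zeros vector of type v8i16 --
       * exactly the non-v4i32 zero that the new isel patterns cover. */
      __m128i zero_v8i16(__m128i x) {
        return _mm_sub_epi16(x, x);
      }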
  2. Feb 19, 2008
    • - When DAG combiner is folding a bit convert into a BUILD_VECTOR, it should... · 6200c225
      Evan Cheng authored
      - When the DAG combiner is folding a bit convert into a BUILD_VECTOR, it should check whether it is essentially a SCALAR_TO_VECTOR. Avoid turning (v8i16) <10, u, u, u> into <10, 0, u, u, u, u, u, u>; instead, simply convert it to a SCALAR_TO_VECTOR of the proper type.
      - X86 now normalizes SCALAR_TO_VECTOR to (BIT_CONVERT (v4i32 SCALAR_TO_VECTOR)). Get rid of X86ISD::S2VEC.
      
      llvm-svn: 47290
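
      As a hedged illustration (not taken from the commit's test cases), C source using the GCC/Clang vector_size extension can produce a BUILD_VECTOR with one defined lane and the remaining lanes undef, which the combiner should now treat as a SCALAR_TO_VECTOR of the proper type rather than zero-filling the undef lanes:

      /* Hypothetical sketch, not from the commit: only lane 0 is written, so
       * the remaining lanes are deliberately left undefined -- the
       * <10, u, u, ...> shape discussed above. */
      typedef short v8i16 __attribute__((vector_size(16)));

      v8i16 low_lane_only(void) {
        v8i16 v;
        v[0] = 10;
        return v;
      }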
  3. Jan 24, 2008
    • Significantly simplify and improve handling of FP function results on x86-32. · a91f77ea
      Chris Lattner authored
      This case returns the value in ST(0) and then has to convert it to an SSE
      register.  This causes significant codegen ugliness in some cases.  For
      example, in the trivial fp-stack-direct-ret.ll testcase we used to generate:
      
      _bar:
      	subl	$28, %esp
      	call	L_foo$stub
      	fstpl	16(%esp)
      	movsd	16(%esp), %xmm0
      	movsd	%xmm0, 8(%esp)
      	fldl	8(%esp)
      	addl	$28, %esp
      	ret
      
      because we move the result of foo() into an XMM register, then have to
      move it back for the return of bar.
      
      Instead of hacking ever-more special cases into the call result lowering code,
      we take a much simpler approach: on x86-32, an FP return is modeled as always
      returning into an f80 register, which is then truncated to f32 or f64 as needed.
      Similarly, a returned result is modeled as an extension to f80 followed by the return.
      
      This exposes the truncates and extensions to the DAG combiner, allowing target-
      independent code to hack on them and eliminate them in this case.  This gives
      us this code for the example above:
      
      _bar:
      	subl	$12, %esp
      	call	L_foo$stub
      	addl	$12, %esp
      	ret
      
      The nasty aspect of this is that these conversions are not legal, but we want
      the second pass of the DAG combiner (post-legalize) to be able to hack on them.
      To handle this, we lie to the legalizer and say they are legal, then custom-expand
      them on entry to the isel pass (PreprocessForFPConvert).  This is gross, but
      less gross than the code it is replacing :)
      
      This also allows us to generate better code in several other cases.  For
      example, on fp-stack-ret-conv.ll we now generate:
      
      _test:
      	subl	$12, %esp
      	call	L_foo$stub
      	fstps	8(%esp)
      	movl	16(%esp), %eax
      	cvtss2sd	8(%esp), %xmm0
      	movsd	%xmm0, (%eax)
      	addl	$12, %esp
      	ret
      
      where before we produced (incidentally, the old bad code is identical to what
      gcc produces):
      
      _test:
      	subl	$12, %esp
      	call	L_foo$stub
      	fstpl	(%esp)
      	cvtsd2ss	(%esp), %xmm0
      	cvtss2sd	%xmm0, %xmm0
      	movl	16(%esp), %eax
      	movsd	%xmm0, (%eax)
      	addl	$12, %esp
      	ret
      
      Note that we generate slightly worse code on pr1505b.ll due to a scheduling 
      deficiency that is unrelated to this patch.
      
      llvm-svn: 46307
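
      For reference, a hedged C-level sketch of the shape of the trivial test case described above (the actual test is written in LLVM IR and may differ): bar() simply forwards the FP result of an external foo(), so any ST(0)-to-XMM round trip is pure overhead.

      /* Hypothetical C equivalent of the kind of code in
       * fp-stack-direct-ret.ll; the real test is LLVM IR and may differ.
       * On x86-32, foo() returns its double in ST(0), and bar() returns it
       * unchanged, so no SSE round trip should be emitted. */
      double foo(void);

      double bar(void) {
        return foo();
      }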
  4. Jan 10, 2008
    • Start inferring side effect information more aggressively, and fix many bugs in the · 317332fc
      Chris Lattner authored
      x86 backend where instructions were not marked mayStore/mayLoad, and performance issues
      where instructions were not marked neverHasSideEffects.  It would be really nice if we
      could write patterns for copy instructions.
      
      I have audited all the x86 instructions down to MOVDQAmr.  The flags on the others and on
      other targets are probably not right in all cases, but no clients that are enabled by
      default currently use this info.
      
      llvm-svn: 45829