Commits · 1644409b477a12f45448970a96493366652a5e0d · Roger Ferrer / llvm-epi-0.8

Jul 27, 2011

Explicitly cast narrowing conversions inside {}s that will become errors in · 6381c010
Jeffrey Yasskin authored Jul 27, 2011
```
C++0x.

llvm-svn: 136211
```
6381c010
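
As a rough illustration of the kind of change this commit describes (not code taken from the commit itself), C++0x list-initialization rejects a brace initializer that narrows a non-constant value, so the conversion has to be spelled out. The `Operand` type and `makeOperand` helper below are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record type, used only to illustrate the rule.
struct Operand {
  std::uint8_t Kind;
  std::uint16_t Reg;
};

// 'kind' and 'reg' are not constant expressions, so initializing the narrower
// members from them inside {} would be a narrowing conversion, which C++0x
// list-initialization rejects. The explicit casts state the intent.
inline Operand makeOperand(int kind, int reg) {
  Operand Op = {static_cast<std::uint8_t>(kind),
                static_cast<std::uint16_t>(reg)};
  return Op;
}
```

Without the `static_cast`s, a C++0x compiler would be expected to diagnose the initializer; with them, out-of-range values wrap modulo the destination width as before.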
Move some code around to open opportunity for more shuffle matching · f9324f4f
Bruno Cardoso Lopes authored Jul 27, 2011
```
llvm-svn: 136201
```
f9324f4f

The vpermilps and vpermilpd have different behaviour regarding the · 27a30a77

Bruno Cardoso Lopes authored Jul 27, 2011

usage of the shuffle bitmask. Both work in 128-bit lanes without
crossing, but in the former the mask of the high part is the same
as that used by the low part, while in the latter both lanes have
independent masks. Handle this properly and add support for vpermilpd.

llvm-svn: 136200

27a30a77
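
The difference can be sketched with a small scalar model of the 256-bit immediate forms. This is an illustrative model only, assuming the usual encoding (2-bit selectors shared by both lanes for vpermilps, one selector bit per destination element for vpermilpd); the function names `permilps256` and `permilpd256` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// vpermilps, 256-bit immediate form: eight floats in two 128-bit lanes of
// four. The same four 2-bit selectors are applied to both lanes.
inline std::array<float, 8> permilps256(const std::array<float, 8> &src,
                                        std::uint8_t imm) {
  std::array<float, 8> dst{};
  for (int lane = 0; lane < 2; ++lane)
    for (int i = 0; i < 4; ++i) {
      int sel = (imm >> (2 * i)) & 3;  // selector for position i of each lane
      dst[lane * 4 + i] = src[lane * 4 + sel];
    }
  return dst;
}

// vpermilpd, 256-bit immediate form: four doubles in two lanes of two.
// Each destination element has its own selector bit (bits 0-1 for the low
// lane, bits 2-3 for the high lane), so the lanes are permuted independently.
inline std::array<double, 4> permilpd256(const std::array<double, 4> &src,
                                         std::uint8_t imm) {
  std::array<double, 4> dst{};
  for (int i = 0; i < 4; ++i) {
    int lane = i / 2;
    int sel = (imm >> i) & 1;  // selector bit for destination element i
    dst[i] = src[lane * 2 + sel];
  }
  return dst;
}
```

For example, an immediate of `0x1B` reverses the elements within both lanes for `permilps256`, while `0x6` leaves the low lane of `permilpd256` untouched and swaps only the high lane.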

Add a neat little two's complement hack for x86. · 124ac2b9

Benjamin Kramer authored Jul 26, 2011

On x86 we can't encode an immediate LHS of a sub directly. If the RHS comes from an XOR with a constant we can
fold the negation into the xor and add one to the immediate of the sub. Then we can turn the sub into an add,
which can be commuted and encoded efficiently.

This code is generated for __builtin_clz and friends.

llvm-svn: 136167

124ac2b9
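
The arithmetic behind the rewrite can be checked with a short sketch. It relies only on the two's complement identities `-y == ~y + 1` and `~(x ^ k) == x ^ ~k`; the function names are hypothetical and the code models the transformation on plain 32-bit integers rather than on DAG nodes.

```cpp
#include <cassert>
#include <cstdint>

// Original form: an immediate minus an XOR with a constant.
constexpr std::uint32_t subOfXor(std::uint32_t c, std::uint32_t x,
                                 std::uint32_t k) {
  return c - (x ^ k);
}

// Rewritten form: fold the negation into the XOR constant and add one to the
// immediate, so c - (x ^ k) == (x ^ ~k) + (c + 1), all modulo 2^32.
constexpr std::uint32_t addOfXor(std::uint32_t c, std::uint32_t x,
                                 std::uint32_t k) {
  return (x ^ ~k) + (c + 1u);
}

static_assert(subOfXor(32u, 5u, 31u) == addOfXor(32u, 5u, 31u),
              "both forms agree");
```

The `x ^ 31` shape in the example is chosen because a count-leading-zeros computed from a bit-scan result typically involves such an XOR, which is presumably why the pattern shows up for `__builtin_clz`.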

Recognize unpckh* masks and match 256-bit versions. The new versions are · f8fe47bd
Bruno Cardoso Lopes authored Jul 26, 2011
```
different from the previous 128-bit because they work in lanes.
Update a few comments and add testcases

llvm-svn: 136157
```
f8fe47bd
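
To make "work in lanes" concrete, here is a minimal sketch that builds the shuffle mask an unpack-high operation corresponds to, using the usual convention that indices `0..N-1` name the first operand and `N..2N-1` the second. The helper `unpackHighMask` is a hypothetical name, not the matcher from the commit.

```cpp
#include <cassert>
#include <vector>

// Build the unpack-high shuffle mask for vectors of numElts elements that are
// processed in independent lanes of eltsPerLane elements each. Within every
// lane, the upper half of the first operand is interleaved with the upper
// half of the second operand.
inline std::vector<int> unpackHighMask(int numElts, int eltsPerLane) {
  std::vector<int> mask;
  for (int laneStart = 0; laneStart < numElts; laneStart += eltsPerLane)
    for (int i = eltsPerLane / 2; i < eltsPerLane; ++i) {
      mask.push_back(laneStart + i);            // from the first operand
      mask.push_back(laneStart + i + numElts);  // from the second operand
    }
  return mask;
}
```

For four 32-bit elements in a single 128-bit lane this gives `{2, 6, 3, 7}`. For eight elements in two lanes it gives `{2, 10, 3, 11, 6, 14, 7, 15}`, which differs from simply taking the upper half of the whole 256-bit vector.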

Jul 26, 2011
- Prevent x86-specific DAGCombine from creating nodes with illegal type (which... · 93dc04d5
  Eli Friedman authored Jul 26, 2011
```
Prevent x86-specific DAGCombine from creating nodes with illegal type (which could not be selected).  Fixes a minor isel issue that was breaking the testcase from r136130.

llvm-svn: 136148
```
  93dc04d5
- More movsldup/movshdup cleanup. Rewrite the mask matching function and add · d77b3831
  Bruno Cardoso Lopes authored Jul 26, 2011
```
support for 256-bit versions (but no instruction selection yet, coming next).

llvm-svn: 136050
```
  d77b3831
- More cleanup, subtarget info isn't used here. · 5b268a4b
  Bruno Cardoso Lopes authored Jul 26, 2011
```
llvm-svn: 136049
```
  5b268a4b
- Codegen allonesvector better while using AVX: vpcmpeqd + vinsertf128 · 9212bf27
  Bruno Cardoso Lopes authored Jul 25, 2011
```
This also fixes PR10452

llvm-svn: 136004
```
  9212bf27
- - Handle special scalar_to_vector case: splats. Using a native 128-bit · 123dff0f
  Bruno Cardoso Lopes authored Jul 25, 2011
```
shuffle before inserting on a 256-bit vector.
- Add AVX versions of movd/movq instructions
- Introduce a few COPY patterns to match insert_subvector instructions.
This turns a trivial insert_subvector instruction into a register copy,
coalescing the xmm into a ymm and avoiding the emission of one more instruction.

llvm-svn: 136002
```
  123dff0f
- Reintroduce r135730, this is indeed the right approach, there is no · 276eb8de
  Bruno Cardoso Lopes authored Jul 25, 2011
```
native 256-bit vector instruction to do scalar_to_vector.

llvm-svn: 136001
```
  276eb8de
Jul 25, 2011
- Get rid of an incorrect optimization for shuffles with PALIGNR and simplify isPALIGNRMask. · ea8c66fe
  Eli Friedman authored Jul 25, 2011
```
Addresses PR10466, although the crash from that PR only triggers in cases where DAGCombine misses optimizing a shuffle.

llvm-svn: 135980
```
  ea8c66fe
Jul 22, 2011

Turn shuffles into unpacks for VT == MVT::v2i64 and MVT::v2f64 · 77242dd5
Rafael Espindola authored Jul 22, 2011
```
too. Patch by Jeff Muizelaar.

llvm-svn: 135789
```
77242dd5

Fix x86's XALUO lowering to return its replacement values instead · c535278c

Dan Gohman authored Jul 22, 2011

of doing the RAUW calls for the overflow value itself. This makes
it more consistent with how the rest of LegalizeDAG works.

llvm-svn: 135788

c535278c

GCC complains about the angle of this line. · 959b7e9d
Benjamin Kramer authored Jul 22, 2011
```
Remove the escaped newline.

llvm-svn: 135739
```
959b7e9d

Remove the 128-bit special handling from SCALAR_TO_VECTOR. This isn't · 18721738

Bruno Cardoso Lopes authored Jul 22, 2011

the way to go. Doing this here will prevent several node matches later,
and would have to force looking all the way through several
VINSERTF128/VEXTRACTF128 chains to optimize simple things.

llvm-svn: 135730

18721738

- Inspected an AVX code block added by someone in early Feb. This was never used · 612e5617

Bruno Cardoso Lopes authored Jul 22, 2011

and was actually very wrong, fix it and make it simpler. Also remove the
ConcatVectors function, which is unused now.

- Fix an introduction of useless nodes in r126664 and r126264. The
VUNPCKL* should never be introduced because we don't want duplicate
nodes for 128-bit AVX and non-AVX modes; the actual instruction
difference only exists during isel, not for target-specific DAG
nodes. We only introduce V* target nodes when there is no 128-bit
version already there.

- Fix a fragile test and make it more useful.

llvm-svn: 135729

612e5617

Add a DAGCombine for transforming 128->256 casts into a simple · 91eff514
Bruno Cardoso Lopes authored Jul 22, 2011
```
vxorps + vinsertf128 pair of instructions

llvm-svn: 135727
```
91eff514
Introduce a new function to lower 256-bit vectors which are not · dbebd012
Bruno Cardoso Lopes authored Jul 22, 2011
```
directly supported and should be promoted and handled by smaller
shuffles

llvm-svn: 135726
```
dbebd012
Rename function to be more specific and be more strict about its usage · 95d03772
Bruno Cardoso Lopes authored Jul 22, 2011
```
llvm-svn: 135725
```
95d03772

Jul 21, 2011

- Register v16i16 as valid VR256 register class · 178fb406

Bruno Cardoso Lopes authored Jul 21, 2011

- Add more bitcasts for v16i16
- Since 135661 and 135662 already added the splat logic,
just add one more splat test for v16i16

llvm-svn: 135663

178fb406

Add support for 256-bit versions of VPERMIL instruction. This is a new · b878caa5

Bruno Cardoso Lopes authored Jul 21, 2011

instruction introduced in AVX, which can operate on 128 and 256-bit vectors.
It considers a 256-bit vector as two independent 128-bit lanes. It can permute
any 32 or 64-bit elements inside a lane, and restricts the second lane to
have the same permutation as the first one. With the improved splat support
introduced earlier today, adding codegen for this instruction enables more
efficient 256-bit code:

Instead of:
  vextractf128  $0, %ymm0, %xmm0
  punpcklbw %xmm0, %xmm0
  punpckhbw %xmm0, %xmm0
  vinsertf128 $0, %xmm0, %ymm0, %ymm1
  vinsertf128 $1, %xmm0, %ymm1, %ymm0
  vextractf128  $1, %ymm0, %xmm1
  shufps  $1, %xmm1, %xmm1
  movss %xmm1, 28(%rsp)
  movss %xmm1, 24(%rsp)
  movss %xmm1, 20(%rsp)
  movss %xmm1, 16(%rsp)
  vextractf128  $0, %ymm0, %xmm0
  shufps  $1, %xmm0, %xmm0
  movss %xmm0, 12(%rsp)
  movss %xmm0, 8(%rsp)
  movss %xmm0, 4(%rsp)
  movss %xmm0, (%rsp)
  vmovaps (%rsp), %ymm0
We get:
  vextractf128  $0, %ymm0, %xmm0
  punpcklbw %xmm0, %xmm0
  punpckhbw %xmm0, %xmm0
  vinsertf128 $0, %xmm0, %ymm0, %ymm1
  vinsertf128 $1, %xmm0, %ymm1, %ymm0
  vpermilps $85, %ymm0, %ymm0

llvm-svn: 135662

b878caa5
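
The `$85` immediate in the listing above can be decoded with a short sketch. It assumes the 2-bits-per-element immediate encoding of vpermilps, with the same selectors reused for both 128-bit lanes; `permilpsImmToMask` is a hypothetical helper name.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Expand a vpermilps immediate into a full 8-element shuffle mask for a
// 256-bit vector of 32-bit elements. Position i of each lane reads the
// element chosen by bits [2*i+1 : 2*i] of the immediate, within its own lane.
inline std::array<int, 8> permilpsImmToMask(std::uint8_t imm) {
  std::array<int, 8> mask{};
  for (int i = 0; i < 8; ++i) {
    int laneBase = (i / 4) * 4;            // 0 for the low lane, 4 for the high lane
    int sel = (imm >> (2 * (i % 4))) & 3;  // per-position selector
    mask[i] = laneBase + sel;
  }
  return mask;
}
```

Since 85 is `0b01010101`, every selector is 1 and the resulting mask is `{1, 1, 1, 1, 5, 5, 5, 5}`: element 1 of each lane is splatted across that lane, which is what the long shufps/movss sequence was computing through the stack.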

Improve splat promotion to handle AVX types: v32i8 and v16i16. Also · fb4920eb

Bruno Cardoso Lopes authored Jul 21, 2011

refactor the code and add a bunch of comments. The final shuffle
emitted by handling 256-bit types is suitable for the VPERM shuffle
instruction which is going to be introduced in a later commit (with
a testcase which covers this commit)

llvm-svn: 135661

fb4920eb

Tidy up code · 0bdeacf0
Bruno Cardoso Lopes authored Jul 21, 2011
```
llvm-svn: 135656
```
0bdeacf0

Jul 20, 2011

Goodbye TargetAsmInfo. This eliminates the last bit of CodeGen and Target in llvm-mc. · bbf3b0de

Evan Cheng authored Jul 20, 2011

There is still a bit more refactoring left to do in Targets. But we are now very
close to fixing all the layering issues in MC.

llvm-svn: 135611

bbf3b0de

Jul 18, 2011
- Sink getDwarfRegNum, getLLVMRegNum, getSEHRegNum from TargetRegisterInfo down · d60fa58b
  Evan Cheng authored Jul 18, 2011
```
to MCRegisterInfo. Also initialize the mapping at construction time.

This patch eliminates TargetRegisterInfo from TargetAsmInfo. It's another step
towards fixing the layering violation.

llvm-svn: 135424
```
  d60fa58b
- land David Blaikie's patch to de-constify Type, with a few tweaks. · 229907cd
  Chris Lattner authored Jul 18, 2011
```
llvm-svn: 135375
```
  229907cd
Jul 16, 2011

Fix a couple of things: · 8df9cfc2

Bruno Cardoso Lopes authored Jul 15, 2011

1) Make non-legal 256-bit loads be promoted to v4i64. This lets us
canonicalize the loads and handle things the same way we handle them
for 128-bit registers. Despite what one of the removed comments
explained, the load promotion would not mess with VPERM; it's only a
matter of doing the appropriate bitcasts when this instruction comes
to be introduced. Also make LOAD v8i32 legal.

2) Doing 1) exposed two bugs:
- v4i64 was being promoted to itself for several opcodes (introduced
in r124447 by David Greene) causing endless recursion and the stack to
explode.
- there was no support for allOnes BUILD_VECTORs and ANDNP would fail to
match because it was generating early target constant pools during
lowering.

3) The testcases are already checked in; doing 1) exposed the
bugs in the current testcases.

4) Tidy up code to be more clear and explicit about AVX.

llvm-svn: 135313

8df9cfc2

Jul 14, 2011

Check register class matching instead of width of type matching · 92464be2

Eric Christopher authored Jul 14, 2011

when determining validity of matching constraint. Allow i1
types access to the GR8 reg class for x86.

Fixes PR10352 and rdar://9777108

llvm-svn: 135180

92464be2

· 771f2967

Nadav Rotem authored Jul 14, 2011

[VECTOR-SELECT]
During type legalization we often use the SIGN_EXTEND_INREG SDNode.
When this SDNode is legalized during the LegalizeVector phase, it is
scalarized because non-simple types are automatically marked to be expanded.
In this patch we add support for lowering SIGN_EXTEND_INREG manually.
This fixes CodeGen/X86/vec_sext.ll when running with the '-promote-elements'
flag.

llvm-svn: 135144

771f2967
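
For readers unfamiliar with the operation, sign-extending "in register" means replicating the sign bit of the low `bits` bits across the rest of the element without changing the element's width. One common way to express it is a left shift followed by an arithmetic right shift, applied per element. The sketch below is a scalar model under that assumption, not the lowering from the patch; the function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Sign-extend the low 'bits' bits of a 32-bit value (1 <= bits <= 32).
// The left shift is done on an unsigned value to avoid overflow issues; the
// right shift on the signed value is assumed to be arithmetic, as it is on
// mainstream compilers (and guaranteed from C++20 on).
inline std::int32_t signExtendInReg(std::int32_t value, unsigned bits) {
  const unsigned shift = 32u - bits;
  const std::uint32_t up = static_cast<std::uint32_t>(value) << shift;
  return static_cast<std::int32_t>(up) >> shift;
}

// Element-wise version over a 4 x i32 vector, mirroring a vector operation
// that is kept as a whole instead of being scalarized.
inline std::array<std::int32_t, 4>
signExtendInRegV4(const std::array<std::int32_t, 4> &v, unsigned bits) {
  std::array<std::int32_t, 4> out{};
  for (int i = 0; i < 4; ++i)
    out[i] = signExtendInReg(v[i], bits);
  return out;
}
```

With `bits == 8`, the values `0xFF`, `0x7F` and `0x180` become `-1`, `127` and `-128` respectively, since only the low byte of each element is treated as significant.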

Jul 13, 2011
- Make X86ISD::ANDNP more general and Codegen 256-bit VANDNP. A more · 9613b649
  Bruno Cardoso Lopes authored Jul 13, 2011
```
general version of X86ISD::ANDNP also opened the room for a little bit
of refactoring.

llvm-svn: 135088
```
  9613b649
- The target specific node PANDN name is misleading. That happens because · 7ba479d2
  Bruno Cardoso Lopes authored Jul 13, 2011
```
it's later selected to an ANDNPD/ANDNPS instruction instead of the PANDN
instruction. Rename it.

llvm-svn: 135087
```
  7ba479d2
Jul 08, 2011
- Add _allrem, _aullrem and _allmul to the runtime for MSVC. · 112fcc16
  Julien Lerouge authored Jul 08, 2011
```
http://llvm.org/bugs/show_bug.cgi?id=10305

llvm-svn: 134744
```
  112fcc16
- Add an intrinsic and codegen support for fused multiply-accumulate. The intent · f03fa189
  Cameron Zwarich authored Jul 08, 2011
```
is to use this for architectures that have a native FMA instruction.

llvm-svn: 134742
```
  f03fa189
- Let the inline asm 'q' constraint match float, and on 64-bit double too. · 9badf602
  Nick Lewycky authored Jul 08, 2011
```
Fixes PR9602!

llvm-svn: 134665
```
  9badf602
- Go ahead and emit the barrier on x86-64 even without sse2. The · 7a2a0f80
  Eric Christopher authored Jul 08, 2011
```
processor supports it just fine.

Fixes PR9675 and rdar://9740801

llvm-svn: 134664
```
  7a2a0f80
- Add support for the X86 'l' constraint. · 9721396d
  Eric Christopher authored Jul 07, 2011
```
Fixes PR10149 and rdar://9738585

llvm-svn: 134648
```
  9721396d
Jun 29, 2011
- Use getRegForInlineAsmConstraint instead of custom defining regclasses · 7e5f2350
  Eric Christopher authored Jun 29, 2011
```
via vectors.

Part of rdar://9643582

llvm-svn: 134079
```
  7e5f2350
Jun 28, 2011

Clean up the handling of the x87 fp stack to make it more robust. · 7297e7e2

Jakob Stoklund Olesen authored Jun 28, 2011

Drop the FpMov instructions, use plain COPY instead.

Drop the FpSET/GET instruction for accessing fixed stack positions.
Instead use normal COPY to/from ST registers around inline assembly, and
provide a single new FpPOP_RETVAL instruction that can access the return
value(s) from a call. This is still necessary since you cannot tell from
the CALL instruction alone if it returns anything on the FP stack. Teach
fast isel to use this.

This provides a much more robust way of handling fixed stack registers -
we can tolerate arbitrary FP stack instructions inserted around calls
and inline assembly. Live range splitting could sometimes break x87 code
by inserting spill code in unfortunate places.

As a bonus we handle floating point inline assembly correctly now.

llvm-svn: 134018

7297e7e2

Jun 25, 2011
- Replace dyn_cast<> with cast<> since the cast is already guarded by the necessary check. · 15db390f
  Chad Rosier authored Jun 25, 2011
```
llvm-svn: 133874
```
  15db390f
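
The reasoning in this commit can be illustrated with a self-contained sketch. The `isa`, `cast` and `dyn_cast` templates below are simplified stand-ins written for this example, not LLVM's real implementations, and `Node`, `LoadNode` and `loadAlignmentOrZero` are hypothetical names.

```cpp
#include <cassert>

// Simplified stand-in class hierarchy with a kind tag.
struct Node {
  enum Kind { LoadKind, StoreKind };
  explicit Node(Kind k) : kind(k) {}
  Kind kind;
};

struct LoadNode : Node {
  LoadNode() : Node(LoadKind) {}
  int alignment = 16;
  static bool classof(const Node *n) { return n->kind == LoadKind; }
};

// Type test only.
template <typename To> bool isa(const Node *n) { return To::classof(n); }

// Unconditional cast: the caller promises the type is right, and the promise
// is verified by an assertion only.
template <typename To> const To *cast(const Node *n) {
  assert(isa<To>(n) && "cast<> used on a node of the wrong kind");
  return static_cast<const To *>(n);
}

// Checking cast: returns null on a mismatch, so the result must be tested.
template <typename To> const To *dyn_cast(const Node *n) {
  return isa<To>(n) ? static_cast<const To *>(n) : nullptr;
}

inline int loadAlignmentOrZero(const Node *n) {
  if (!isa<LoadNode>(n))
    return 0;
  // The kind was already established above, so cast<> is sufficient here;
  // dyn_cast<> would repeat the check and yield a pointer nobody tests.
  return cast<LoadNode>(n)->alignment;
}
```

Using `cast<>` after an explicit check also documents the invariant: if the guard is ever removed, the assertion fires instead of a null pointer being dereferenced silently.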