Commits · 81eb193f2ec80fa46c9caa7bc9f4164e9f580256 · Roger Ferrer / llvm-epi-0.8

Jul 29, 2011

Match VPERMIL masks more strictly and update the target specific mask · 81eb193f
Bruno Cardoso Lopes authored Jul 29, 2011
```
generation to always catch the weird cases.

llvm-svn: 136453
```
81eb193f
Add v8i32 and v4i64 vpermil patterns · d23709b1
Bruno Cardoso Lopes authored Jul 29, 2011
```
llvm-svn: 136451
```
d23709b1

Transfer implicit operands in NEONMoveFixPass. · b28ee411

Jakob Stoklund Olesen authored Jul 29, 2011

Later passes /are/ using this information when running the register
scavenger.

This fixes the second problem in PR10520.

llvm-svn: 136440

b28ee411

Add -verify-arm-pseudo-expand. · 9c3badce

Jakob Stoklund Olesen authored Jul 29, 2011

This hidden llc option runs the machine code verifier after expanding
ARM pseudo-instructions, but before if-conversion.

The machine code verifier is much better at pointing out liveness errors
that can trip up the register scavenger.

llvm-svn: 136439

9c3badce

Jul 28, 2011

Handle REG_SEQUENCE with implicitly defined operands. · b16081ce

Jakob Stoklund Olesen authored Jul 28, 2011

Code like that would only be produced by bugpoint, but we should still
handle it correctly.

When a register is defined by a REG_SEQUENCE of undefs, the register
itself is undef. Previously, we would create a register with uses but no
defs.

Fixes part of PR10520.

llvm-svn: 136401

b16081ce

Add patterns to generate copies for extract_subvector instead of · 76bc28ba
Bruno Cardoso Lopes authored Jul 28, 2011
```
using vextractf128. This will reduce the number of issued instructions
for several AVX codes.

llvm-svn: 136323
```
76bc28ba
Add a few patterns to match allzeros without having to use the fp unit. · eca99c4b
Bruno Cardoso Lopes authored Jul 28, 2011
```
Take advantage of the fact that the 128-bit vpxor zeros the upper part and use it.
This also fixes PR10491.

llvm-svn: 136321
```
eca99c4b
Add SINT_TO_FP and FP_TO_SINT support for v8i32 types. Also move · 9e2a3012
Bruno Cardoso Lopes authored Jul 28, 2011
```
a convert pattern close to the instruction definition.

llvm-svn: 136320
```
9e2a3012

Jul 27, 2011

The vpermilps and vpermilpd have different behaviour regarding the · 27a30a77

Bruno Cardoso Lopes authored Jul 27, 2011

usage of the shuffle bitmask. Both work in 128-bit lanes without
crossing, but in the former the mask of the high part is the same as
the one used by the low part, while in the latter both lanes have
independent masks. Handle this properly and add support for vpermilpd.

llvm-svn: 136200

27a30a77
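The difference described above can be illustrated with a small scalar model of the immediate forms of the two instructions on 256-bit vectors. This is only a sketch written for illustration, not code from the commit; the function names `permilps256` and `permilpd256` are hypothetical, and elements are modelled as plain `int` values rather than floats.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar model of VPERMILPS with an 8-bit immediate on 8 x 32-bit elements.
// The immediate holds four 2-bit selectors, and the same four selectors are
// applied to both 128-bit lanes. (Hypothetical helper, for illustration.)
std::array<int, 8> permilps256(const std::array<int, 8> &src, uint8_t imm) {
  std::array<int, 8> dst{};
  for (int i = 0; i < 8; ++i) {
    int lane = i / 4;                       // which 128-bit lane
    int sel = (imm >> (2 * (i % 4))) & 0x3; // selector shared by both lanes
    dst[i] = src[lane * 4 + sel];           // never crosses the lane
  }
  return dst;
}

// Scalar model of VPERMILPD with an 8-bit immediate on 4 x 64-bit elements.
// Each destination element has its own selector bit, so the two lanes can
// be permuted independently. (Hypothetical helper, for illustration.)
std::array<int, 4> permilpd256(const std::array<int, 4> &src, uint8_t imm) {
  std::array<int, 4> dst{};
  for (int i = 0; i < 4; ++i) {
    int lane = i / 2;           // which 128-bit lane
    int sel = (imm >> i) & 0x1; // one bit per destination element
    dst[i] = src[lane * 2 + sel];
  }
  return dst;
}
```

With source elements numbered 0 to 7, `permilps256(src, 0x1B)` reverses each lane and gives `{3, 2, 1, 0, 7, 6, 5, 4}`. With source elements 0 to 3, `permilpd256(src, 0x1)` gives `{1, 0, 2, 2}`: only the low lane is swapped, which the single-pattern VPERMILPS immediate could not express.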

It is quite possible that inlined function body is split into multiple chunks... · f098ce27

Devang Patel authored Jul 27, 2011

It is quite possible that an inlined function body is split into multiple chunks of consecutive instructions. There is no way to describe this in the .debug_inline accelerator table used by gdb. However, describe non-contiguous ranges of the inlined function body appropriately using AT_range of the DW_TAG_inlined_subroutine debug info entry.

llvm-svn: 136196

f098ce27

Eliminate copies of undefined values during coalescing. · c3bcb021

Jakob Stoklund Olesen authored Jul 26, 2011

These copies would coalesce easily, but the resulting value would be
defined by a deleted instruction. Now we also remove the undefined value
number from the destination register.

This fixes PR10503.

llvm-svn: 136174

c3bcb021

Update test. · a79c1e05
Benjamin Kramer authored Jul 26, 2011
```
llvm-svn: 136170
```
a79c1e05

Add a neat little two's complement hack for x86. · 124ac2b9

Benjamin Kramer authored Jul 26, 2011

On x86 we can't encode an immediate LHS of a sub directly. If the RHS comes from an XOR with a constant we can
fold the negation into the xor and add one to the immediate of the sub. Then we can turn the sub into an add,
which can be commuted and encoded efficiently.

This code is generated for __builtin_clz and friends.

llvm-svn: 136167

124ac2b9
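The rewrite rests on two identities of two's complement arithmetic: `-y == ~y + 1` and `~(x ^ k) == x ^ ~k`. Together they give `c - (x ^ k) == (x ^ ~k) + (c + 1)`. The sketch below checks that equivalence on 32-bit unsigned values; the function names are hypothetical and the code only models the arithmetic, not the DAG combine itself.

```cpp
#include <cassert>
#include <cstdint>

// Original shape: an immediate c on the left-hand side of a subtraction
// whose right-hand side is an XOR with a constant k.
uint32_t subOfXor(uint32_t c, uint32_t x, uint32_t k) {
  return c - (x ^ k);
}

// Rewritten shape: the negation is folded into the XOR constant (k -> ~k)
// and one is added to the immediate, which turns the sub into an add.
uint32_t addOfXor(uint32_t c, uint32_t x, uint32_t k) {
  return (x ^ ~k) + (c + 1);
}
```

For example, with `c = 32`, `x = 5`, `k = 31`, both functions return 6. Unsigned wraparound makes the two forms agree for every input, including `c = 0xFFFFFFFF`, where `c + 1` wraps to zero.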

Recognize unpckh* masks and match 256-bit versions. The new versions are · f8fe47bd
Bruno Cardoso Lopes authored Jul 26, 2011
```
different from the previous 128-bit because they work in lanes.
Update a few comments and add testcases

llvm-svn: 136157
```
f8fe47bd
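To illustrate what "work in lanes" means for the shuffle mask, here is a minimal sketch that builds the mask of a lane-wise unpack-high. It is not the matcher from the commit; `unpackHighMask` is a hypothetical helper, and it follows the usual shuffle convention where indices greater than or equal to the element count refer to the second operand.

```cpp
#include <cassert>
#include <vector>

// Build the shuffle mask of a lane-wise "unpack high" for a vector of
// numElts elements split into 128-bit lanes of eltsPerLane elements each.
// Indices >= numElts select from the second source operand.
// (Hypothetical helper, for illustration only.)
std::vector<int> unpackHighMask(int numElts, int eltsPerLane) {
  std::vector<int> mask;
  int half = eltsPerLane / 2;
  for (int laneStart = 0; laneStart < numElts; laneStart += eltsPerLane) {
    for (int i = 0; i < half; ++i) {
      mask.push_back(laneStart + half + i);           // from operand 1
      mask.push_back(laneStart + half + i + numElts); // from operand 2
    }
  }
  return mask;
}
```

For a 128-bit vector of four floats, `unpackHighMask(4, 4)` yields `{2, 6, 3, 7}`. For a 256-bit vector of eight floats, `unpackHighMask(8, 4)` yields `{2, 10, 3, 11, 6, 14, 7, 15}`: the high half of each lane is interleaved, rather than the high half of the whole vector, which would be `{4, 12, 5, 13, 6, 14, 7, 15}`.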

Jul 26, 2011
- Prevent x86-specific DAGCombine from creating nodes with illegal type (which... · 93dc04d5
  Eli Friedman authored Jul 26, 2011
```
Prevent x86-specific DAGCombine from creating nodes with illegal type (which could not be selected).  Fixes a minor isel issue that was breaking the testcase from r136130.

llvm-svn: 136148
```
  93dc04d5
- FileCheck'ize test. · 73a8393a
  Jim Grosbach authored Jul 26, 2011
```
llvm-svn: 136135
```
  73a8393a
- XFAIL this test while I investigate it; it's failing for an unexpected reason. · 74743041
  Eli Friedman authored Jul 26, 2011
```
llvm-svn: 136131
```
  74743041
- Add obvious missing case to switch. PR10497. · 06b8b571
  Eli Friedman authored Jul 26, 2011
```
llvm-svn: 136130
```
  06b8b571
- Add 256-bit isel for movsldup/movshdup · d600a0f8
  Bruno Cardoso Lopes authored Jul 26, 2011
```
llvm-svn: 136051
```
  d600a0f8
- Codegen allonesvector better while using AVX: vpcmpeqd + vinsertf128 · 9212bf27
  Bruno Cardoso Lopes authored Jul 25, 2011
```
This also fixes PR10452

llvm-svn: 136004
```
  9212bf27
- - Handle special scalar_to_vector case: splats. Using a native 128-bit · 123dff0f
  Bruno Cardoso Lopes authored Jul 25, 2011
```
shuffle before inserting on a 256-bit vector.
- Add AVX versions of movd/movq instructions
- Introduce a few COPY patterns to match insert_subvector instructions.
This turns a trivial insert_subvector instruction into a register copy,
coalescing the xmm into a ymm and avoiding emitting one more instruction.

llvm-svn: 136002
```
  123dff0f
- Attempt to fix test failure reported on llvm-commits. · 442d1b19
  Eli Friedman authored Jul 25, 2011
```
llvm-svn: 135995
```
  442d1b19
- Make sure this DAGCombine actually returns an UNDEF of the correct type; PR10476. · cbd3ba91
  Eli Friedman authored Jul 25, 2011
```
llvm-svn: 135993
```
  cbd3ba91
Jul 25, 2011
- Get rid of an incorrect optimization for shuffles with PALIGNR and simplify isPALIGNRMask. · ea8c66fe
  Eli Friedman authored Jul 25, 2011
```
Addresses PR10466, although the crash from that PR only triggers in cases where DAGCombine misses optimizing a shuffle.

llvm-svn: 135980
```
  ea8c66fe
Jul 24, 2011

Correctly handle <undef> tied uses when rewriting after a split. · 56a56eb8

Jakob Stoklund Olesen authored Jul 24, 2011

This fixes PR10463. A two-address instruction with an <undef> use
operand was incorrectly rewritten so the def and use no longer used the
same register, violating the tie constraint.

Fix this by always rewriting <undef> operands with the register a def
operand would use.

llvm-svn: 135885

56a56eb8

Jul 22, 2011

Fix test check! · 7a207551
Bruno Cardoso Lopes authored Jul 22, 2011
```
llvm-svn: 135802
```
7a207551
Fix PR10422 by adding the necessary AVX UCOMISD memory versions to · a8903999
Bruno Cardoso Lopes authored Jul 22, 2011
```
load folding logic

llvm-svn: 135801
```
a8903999
Turn shuffles into unpacks for VT == MVT::v2i64 and MVT::v2f64 · 77242dd5
Rafael Espindola authored Jul 22, 2011
```
too. Patch by Jeff Muizelaar.

llvm-svn: 135789
```
77242dd5

- Inspected an AVX code block added by someone in early Feb. This was never used · 612e5617

Bruno Cardoso Lopes authored Jul 22, 2011

and was actually very wrong, fix it and make it simpler. Also remove the
ConcatVectors function, which is unused now.

- Fix an introduction of useless nodes in r126664 and r126264. The
VUNPCKL* should never be introduced because we don't want duplicate
nodes for 128 AVX and non-AVX modes; the actual instruction
difference only exists during isel, but not for target specific DAG
nodes. We only introduce V* target nodes when there is no 128-bit
version already there.

- Fix a fragile test and make it more useful.

llvm-svn: 135729

612e5617

Although we already support this, add testcases for consistency · 14a95bda
Bruno Cardoso Lopes authored Jul 22, 2011
```
llvm-svn: 135728
```
14a95bda
Add a DAGCombine for transforming 128->256 casts into a simple · 91eff514
Bruno Cardoso Lopes authored Jul 22, 2011
```
vxorps + vinsertf128 pair of instructions

llvm-svn: 135727
```
91eff514

Jul 21, 2011

- Register v16i16 as valid VR256 register class · 178fb406

Bruno Cardoso Lopes authored Jul 21, 2011

- Add more bitcasts for v16i16
- Since 135661 and 135662 already added the splat logic,
just add one more splat test for v16i16

llvm-svn: 135663

178fb406

Add support for 256-bit versions of VPERMIL instruction. This is a new · b878caa5

Bruno Cardoso Lopes authored Jul 21, 2011

instruction introduced in AVX, which can operate on 128 and 256-bit vectors.
It considers a 256-bit vector as two independent 128-bit lanes. It can permute
any 32- or 64-bit elements inside a lane, and restricts the second lane to
have the same permutation as the first one. With the improved splat support
introduced earlier today, adding codegen for this instruction enables more
efficient 256-bit code:

Instead of:
  vextractf128  $0, %ymm0, %xmm0
  punpcklbw %xmm0, %xmm0
  punpckhbw %xmm0, %xmm0
  vinsertf128 $0, %xmm0, %ymm0, %ymm1
  vinsertf128 $1, %xmm0, %ymm1, %ymm0
  vextractf128  $1, %ymm0, %xmm1
  shufps  $1, %xmm1, %xmm1
  movss %xmm1, 28(%rsp)
  movss %xmm1, 24(%rsp)
  movss %xmm1, 20(%rsp)
  movss %xmm1, 16(%rsp)
  vextractf128  $0, %ymm0, %xmm0
  shufps  $1, %xmm0, %xmm0
  movss %xmm0, 12(%rsp)
  movss %xmm0, 8(%rsp)
  movss %xmm0, 4(%rsp)
  movss %xmm0, (%rsp)
  vmovaps (%rsp), %ymm0
We get:
  vextractf128  $0, %ymm0, %xmm0
  punpcklbw %xmm0, %xmm0
  punpckhbw %xmm0, %xmm0
  vinsertf128 $0, %xmm0, %ymm0, %ymm1
  vinsertf128 $1, %xmm0, %ymm1, %ymm0
  vpermilps $85, %ymm0, %ymm0

llvm-svn: 135662

b878caa5
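The immediate `$85` in the listing above is `0b01010101`: four 2-bit selectors that all pick element 1 of the lane, which is a splat within each lane. A minimal sketch of how such an immediate can be derived from a shuffle mask follows. Both helpers are hypothetical names introduced for illustration and are not taken from the commit; the lane check ignores undef mask entries for simplicity.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Returns true when an 8-element shuffle mask stays inside each 128-bit
// lane and the high lane repeats the low lane's pattern, as described
// above. Undef entries are not modelled. (Hypothetical helper.)
bool fitsPermilps256(const std::array<int, 8> &mask) {
  for (int i = 0; i < 4; ++i) {
    if (mask[i] < 0 || mask[i] >= 4)
      return false; // low lane would read outside itself
    if (mask[i + 4] != mask[i] + 4)
      return false; // high lane does not mirror the low lane
  }
  return true;
}

// Pack the four per-lane element selectors into the 8-bit immediate,
// two bits per destination element. (Hypothetical helper.)
uint8_t packPermilpsImm(const std::array<int, 4> &sel) {
  uint8_t imm = 0;
  for (int i = 0; i < 4; ++i)
    imm |= static_cast<uint8_t>((sel[i] & 0x3) << (2 * i));
  return imm;
}
```

The mask `{1, 1, 1, 1, 5, 5, 5, 5}` passes `fitsPermilps256`, and packing its low-lane selectors `{1, 1, 1, 1}` gives 85, matching the `vpermilps $85, %ymm0, %ymm0` instruction in the improved output.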

Jul 20, 2011
- While emitting constant value, look through derived type and use underlying... · bcd50a10
  Devang Patel authored Jul 20, 2011
```
While emitting constant value, look through derived type and use underlying basic type to determine size and signedness of the constant value.

llvm-svn: 135627
```
  bcd50a10
- PR10421: Fix a straightforward bug in the widening logic for CONCAT_VECTORS. · 6ed78322
  Eli Friedman authored Jul 20, 2011
```
llvm-svn: 135595
```
  6ed78322
- Add MCObjectFileInfo and sink the MCSections initialization code from · 76792992
  Evan Cheng authored Jul 20, 2011
```
TargetLoweringObjectFileImpl down to MCObjectFileInfo.

TargetAsmInfo is down to one last method. It's *almost* gone!

llvm-svn: 135569
```
  76792992
- New pointer rotate test. · 60648578
  Eric Christopher authored Jul 20, 2011
```
llvm-svn: 135562
```
  60648578
- Lower memory barriers to sync instructions. · a4c09bce
  Akira Hatanaka authored Jul 19, 2011
```
llvm-svn: 135537
```
  a4c09bce
- Fix an obvious typo that's preventing x86 (32-bit) from using .literal16. · ccf243d5
  Evan Cheng authored Jul 19, 2011
```
llvm-svn: 135535
```
  ccf243d5
Jul 19, 2011
- Use the correct opcodes: SLLV/SRLV or AND must be used instead of SLL/SRL or · f3b29992
  Akira Hatanaka authored Jul 19, 2011
```
ANDi, when the instruction does not have any immediate operands.

llvm-svn: 135520
```
  f3b29992