Commits · 81eb193f2ec80fa46c9caa7bc9f4164e9f580256 · Roger Ferrer / llvm-epi-0.8

Jul 29, 2011
- Match VPERMIL masks more strictly and update the target specific mask · 81eb193f
  Bruno Cardoso Lopes authored Jul 29, 2011
```
generation to always catch the weird cases.

llvm-svn: 136453
```
  81eb193f
- Add v8i32 and v4i64 vpermil patterns · d23709b1
  Bruno Cardoso Lopes authored Jul 29, 2011
```
llvm-svn: 136451
```
  d23709b1
Jul 28, 2011
- Add patterns to generate copies for extract_subvector instead of · 76bc28ba
  Bruno Cardoso Lopes authored Jul 28, 2011
```
using vextractf128. This will reduce the number of issued instructions
for several AVX codes.

llvm-svn: 136323
```
  76bc28ba
- Add a few patterns to match allzeros without having to use the fp unit. · eca99c4b
  Bruno Cardoso Lopes authored Jul 28, 2011
```
Take advantage of the fact that the 128-bit vpxor zeros the upper part and use it.
This also fixes PR10491.

llvm-svn: 136321
```
  eca99c4b
- Add SINT_TO_FP and FP_TO_SINT support for v8i32 types. Also move · 9e2a3012
  Bruno Cardoso Lopes authored Jul 28, 2011
```
a convert pattern close to the instruction definition.

llvm-svn: 136320
```
  9e2a3012
Jul 27, 2011

The vpermilps and vpermilpd have different behaviour regarding the · 27a30a77

Bruno Cardoso Lopes authored Jul 27, 2011

usage of the shuffle bitmask. Both work in 128-bit lanes without
crossing, but in the former the mask of the high part is the same
one used by the low part, while in the latter both lanes have independent
masks. Handle this properly and add support for vpermilpd.

llvm-svn: 136200

27a30a77
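
The difference between the two immediates described in this commit can be illustrated with a small software model. This is only a sketch of the immediate (imm8) forms on 256-bit vectors, not code from the commit; the function names `vpermilps_256` and `vpermilpd_256` are hypothetical, and vectors are modelled as plain Python lists.

```python
def vpermilps_256(src, imm8):
    """Model of vpermilps with an immediate on 8 x 32-bit elements.

    The four 2-bit selectors in imm8 are reused for both 128-bit lanes.
    """
    assert len(src) == 8 and 0 <= imm8 < 256
    out = []
    for lane in range(2):
        base = lane * 4  # first element of this 128-bit lane
        for i in range(4):
            sel = (imm8 >> (2 * i)) & 0b11  # same selector in both lanes
            out.append(src[base + sel])
    return out


def vpermilpd_256(src, imm8):
    """Model of vpermilpd with an immediate on 4 x 64-bit elements.

    Each destination element has its own 1-bit selector, so the two
    lanes are controlled independently (bits 0-1 low lane, bits 2-3 high lane).
    """
    assert len(src) == 4 and 0 <= imm8 < 256
    out = []
    for i in range(4):
        base = (i // 2) * 2      # first element of the lane holding element i
        sel = (imm8 >> i) & 1    # independent selector bit per element
        out.append(src[base + sel])
    return out


# Reverse the elements inside each lane; both lanes get the same permutation.
print(vpermilps_256(list(range(8)), 0x1B))   # [3, 2, 1, 0, 7, 6, 5, 4]
# Keep the low lane as is, swap the two elements of the high lane.
print(vpermilpd_256([0, 1, 2, 3], 0b0110))   # [0, 1, 3, 2]
```

In this model the second call has no vpermilps counterpart with a single immediate, since the two lanes receive different permutations.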

It is quite possible that inlined function body is split into multiple chunks... · f098ce27

Devang Patel authored Jul 27, 2011

It is quite possible that an inlined function body is split into multiple chunks of consecutive instructions. There is no way to describe this in the .debug_inline accelerator table used by gdb. However, describe non-contiguous ranges of an inlined function body appropriately using AT_range of the DW_TAG_inlined_subroutine debug info entry.

llvm-svn: 136196

f098ce27

Eliminate copies of undefined values during coalescing. · c3bcb021

Jakob Stoklund Olesen authored Jul 26, 2011

These copies would coalesce easily, but the resulting value would be
defined by a deleted instruction. Now we also remove the undefined value
number from the destination register.

This fixes PR10503.

llvm-svn: 136174

c3bcb021

Update test. · a79c1e05
Benjamin Kramer authored Jul 26, 2011
```
llvm-svn: 136170
```
a79c1e05

Add a neat little two's complement hack for x86. · 124ac2b9

Benjamin Kramer authored Jul 26, 2011

On x86 we can't encode an immediate LHS of a sub directly. If the RHS comes from an XOR with a constant we can
fold the negation into the xor and add one to the immediate of the sub. Then we can turn the sub into an add,
which can be commuted and encoded efficiently.

This code is generated for __builtin_clz and friends.

llvm-svn: 136167

124ac2b9
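
The rewrite described in this commit rests on the two's complement identity `-y == ~y + 1` together with `~(x ^ k) == x ^ ~k`, which gives `c - (x ^ k) == (x ^ ~k) + (c + 1)`. The sketch below checks that identity on 32-bit values; the helper names `sub_form` and `add_form` are hypothetical and the constants are arbitrary examples, not values taken from the commit.

```python
import random

MASK32 = 0xFFFFFFFF  # model 32-bit wrap-around arithmetic


def sub_form(c, x, k):
    """Original shape: immediate on the left-hand side of a sub."""
    return (c - (x ^ k)) & MASK32


def add_form(c, x, k):
    """Rewritten shape: negation folded into the xor, immediate bumped by one.

    -(x ^ k) == ~(x ^ k) + 1 == (x ^ ~k) + 1, so the sub becomes an add.
    """
    return ((x ^ (~k & MASK32)) + ((c + 1) & MASK32)) & MASK32


# Spot-check the identity on random inputs and a few edge cases.
rng = random.Random(0)
samples = [(rng.getrandbits(32), rng.getrandbits(32), rng.getrandbits(32))
           for _ in range(1000)]
samples += [(0, 0, 0), (MASK32, MASK32, MASK32), (31, 5, 31), (0, MASK32, 0)]
for c, x, k in samples:
    assert sub_form(c, x, k) == add_form(c, x, k)
```

Because addition is commutative, the constant in the rewritten form can sit on either side, which is what makes the add encodable with an immediate operand.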

Recognize unpckh* masks and match 256-bit versions. The new versions are · f8fe47bd
Bruno Cardoso Lopes authored Jul 26, 2011
```
different from the previous 128-bit ones because they work in lanes.
Update a few comments and add testcases.

llvm-svn: 136157
```
f8fe47bd
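
To see why the 256-bit unpckh masks differ from a naive widening of the 128-bit ones, the following sketch builds the expected shuffle mask per 128-bit lane. It is an illustration only, assuming the usual two-operand shuffle-mask convention in which indices of `num_elts` and above refer to the second operand; `unpckh_mask` is a hypothetical helper, not the LLVM predicate.

```python
def unpckh_mask(num_elts, num_lanes):
    """Shuffle mask that interleaves the high half of each 128-bit lane.

    Indices below num_elts pick from the first operand, indices from
    num_elts upward pick from the second operand.
    """
    per_lane = num_elts // num_lanes
    half = per_lane // 2
    mask = []
    for lane in range(num_lanes):
        start = lane * per_lane + half  # first high element of this lane
        for i in range(half):
            mask.append(start + i)             # from the first operand
            mask.append(start + i + num_elts)  # from the second operand
    return mask


print(unpckh_mask(4, 1))  # 128-bit, 4 x 32-bit: [2, 6, 3, 7]
print(unpckh_mask(8, 2))  # 256-bit, 8 x 32-bit: [2, 10, 3, 11, 6, 14, 7, 15]
print(unpckh_mask(8, 1))  # not lane-aware:      [4, 12, 5, 13, 6, 14, 7, 15]
```

The last line shows the mask one would get by treating the 256-bit vector as a single lane, which is not what the lane-based 256-bit form computes.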

Jul 26, 2011
- Prevent x86-specific DAGCombine from creating nodes with illegal type (which... · 93dc04d5
  Eli Friedman authored Jul 26, 2011
```
Prevent x86-specific DAGCombine from creating nodes with illegal type (which could not be selected).  Fixes a minor isel issue that was breaking the testcase from r136130.

llvm-svn: 136148
```
  93dc04d5
- XFAIL this test while I investigate it; it's failing for an unexpected reason. · 74743041
  Eli Friedman authored Jul 26, 2011
```
llvm-svn: 136131
```
  74743041
- Add obvious missing case to switch. PR10497. · 06b8b571
  Eli Friedman authored Jul 26, 2011
```
llvm-svn: 136130
```
  06b8b571
- Add 256-bit isel for movsldup/movshdup · d600a0f8
  Bruno Cardoso Lopes authored Jul 26, 2011
```
llvm-svn: 136051
```
  d600a0f8
- Codegen allonesvector better while using AVX: vpcmpeqd + vinsertf128 · 9212bf27
  Bruno Cardoso Lopes authored Jul 25, 2011
```
This also fixes PR10452

llvm-svn: 136004
```
  9212bf27
- - Handle special scalar_to_vector case: splats. Using a native 128-bit · 123dff0f
  Bruno Cardoso Lopes authored Jul 25, 2011
```
shuffle before inserting on a 256-bit vector.
- Add AVX versions of movd/movq instructions
- Introduce a few COPY patterns to match insert_subvector instructions.
This turns a trivial insert_subvector instruction into a register copy,
coalescing the xmm into a ymm and avoiding emitting one more instruction.

llvm-svn: 136002
```
  123dff0f
- Attempt to fix test failure reported on llvm-commits. · 442d1b19
  Eli Friedman authored Jul 25, 2011
```
llvm-svn: 135995
```
  442d1b19
- Make sure this DAGCombine actually returns an UNDEF of the correct type; PR10476. · cbd3ba91
  Eli Friedman authored Jul 25, 2011
```
llvm-svn: 135993
```
  cbd3ba91
Jul 25, 2011
- Get rid of an incorrect optimization for shuffles with PALIGNR and simplify isPALIGNRMask. · ea8c66fe
  Eli Friedman authored Jul 25, 2011
```
Addresses PR10466, although the crash from that PR only triggers in cases where DAGCombine misses optimizing a shuffle.

llvm-svn: 135980
```
  ea8c66fe
Jul 24, 2011

Correctly handle <undef> tied uses when rewriting after a split. · 56a56eb8

Jakob Stoklund Olesen authored Jul 24, 2011

This fixes PR10463. A two-address instruction with an <undef> use
operand was incorrectly rewritten so the def and use no longer used the
same register, violating the tie constraint.

Fix this by always rewriting <undef> operands with the register a def
operand would use.

llvm-svn: 135885

56a56eb8

Jul 22, 2011

Fix test check! · 7a207551
Bruno Cardoso Lopes authored Jul 22, 2011
```
llvm-svn: 135802
```
7a207551
Fix PR10422 by adding the necessary AVX UCOMISD memory versions to · a8903999
Bruno Cardoso Lopes authored Jul 22, 2011
```
load folding logic

llvm-svn: 135801
```
a8903999
Turn shuffles into unpacks for VT == MVT::v2i64 and MVT::v2f64 · 77242dd5
Rafael Espindola authored Jul 22, 2011
```
too. Patch by Jeff Muizelaar.

llvm-svn: 135789
```
77242dd5

- Inspected an AVX code block added by someone in early Feb. This was never used · 612e5617

Bruno Cardoso Lopes authored Jul 22, 2011

and was actually very wrong, fix it and make it simpler. Also remove the
ConcatVectors function, which is unused now.

- Fix an introduction of useless nodes in r126664 and r126264. The
VUNPCKL* should never be introduced because we don't want duplicate
nodes for 128-bit AVX and non-AVX modes; the actual instruction
difference only exists during isel, but not for target specific DAG
nodes. We only introduce V* target nodes when there is no 128-bit
version already there.

- Fix a fragile test and make it more useful.

llvm-svn: 135729

612e5617

Although we already support this, add testcases for consistency · 14a95bda
Bruno Cardoso Lopes authored Jul 22, 2011
```
llvm-svn: 135728
```
14a95bda
Add a DAGCombine for transforming 128->256 casts into a simple · 91eff514
Bruno Cardoso Lopes authored Jul 22, 2011
```
vxorps + vinsertf128 pair of instructions

llvm-svn: 135727
```
91eff514

Jul 21, 2011

- Register v16i16 as valid VR256 register class · 178fb406

Bruno Cardoso Lopes authored Jul 21, 2011

- Add more bitcasts for v16i16
- Since 135661 and 135662 already added the splat logic,
just add one more splat test for v16i16

llvm-svn: 135663

178fb406

Add support for 256-bit versions of VPERMIL instruction. This is a new · b878caa5

Bruno Cardoso Lopes authored Jul 21, 2011

instruction introduced in AVX, which can operate on 128 and 256-bit vectors.
It considers a 256-bit vector as two independent 128-bit lanes. It can permute
any 32-bit or 64-bit elements inside a lane, and restricts the second lane to
have the same permutation as the first one. With the improved splat support
introduced earlier today, adding codegen for this instruction enables more
efficient 256-bit code:

Instead of:
  vextractf128  $0, %ymm0, %xmm0
  punpcklbw %xmm0, %xmm0
  punpckhbw %xmm0, %xmm0
  vinsertf128 $0, %xmm0, %ymm0, %ymm1
  vinsertf128 $1, %xmm0, %ymm1, %ymm0
  vextractf128  $1, %ymm0, %xmm1
  shufps  $1, %xmm1, %xmm1
  movss %xmm1, 28(%rsp)
  movss %xmm1, 24(%rsp)
  movss %xmm1, 20(%rsp)
  movss %xmm1, 16(%rsp)
  vextractf128  $0, %ymm0, %xmm0
  shufps  $1, %xmm0, %xmm0
  movss %xmm0, 12(%rsp)
  movss %xmm0, 8(%rsp)
  movss %xmm0, 4(%rsp)
  movss %xmm0, (%rsp)
  vmovaps (%rsp), %ymm0
We get:
  vextractf128  $0, %ymm0, %xmm0
  punpcklbw %xmm0, %xmm0
  punpckhbw %xmm0, %xmm0
  vinsertf128 $0, %xmm0, %ymm0, %ymm1
  vinsertf128 $1, %xmm0, %ymm1, %ymm0
  vpermilps $85, %ymm0, %ymm0

llvm-svn: 135662

b878caa5
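
The `$85` immediate in the listing above can be derived from the shuffle mask. The sketch below is a simplified model, assuming a fully defined 8-element mask with no undef entries; `lane_mask_from_256` and `vpermilps_imm8` are hypothetical helper names, not the functions added by the commit.

```python
def lane_mask_from_256(mask):
    """Return the 4-element per-lane mask if both lanes use the same
    in-lane permutation, otherwise None."""
    assert len(mask) == 8
    low, high = mask[:4], mask[4:]
    if any(not 0 <= s < 4 for s in low):
        return None  # low lane would read across the lane boundary
    if [s - 4 for s in high] != low:
        return None  # high lane does not repeat the low-lane permutation
    return low


def vpermilps_imm8(lane_mask):
    """Pack four 2-bit element selectors into an 8-bit immediate."""
    assert len(lane_mask) == 4
    imm = 0
    for i, sel in enumerate(lane_mask):
        assert 0 <= sel < 4
        imm |= sel << (2 * i)
    return imm


# Splat element 1 of each lane: mask <1,1,1,1,5,5,5,5>.
lane = lane_mask_from_256([1, 1, 1, 1, 5, 5, 5, 5])
print(lane, vpermilps_imm8(lane))  # [1, 1, 1, 1] 85
```

A mask such as `[0, 1, 2, 3, 7, 6, 5, 4]` is rejected by this model because the two lanes would need different permutations.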

Jul 20, 2011
- While emitting constant value, look through derived type and use underlying... · bcd50a10
  Devang Patel authored Jul 20, 2011
```
While emitting a constant value, look through the derived type and use the underlying basic type to determine the size and signedness of the constant value.

llvm-svn: 135627
```
  bcd50a10
- PR10421: Fix a straightforward bug in the widening logic for CONCAT_VECTORS. · 6ed78322
  Eli Friedman authored Jul 20, 2011
```
llvm-svn: 135595
```
  6ed78322
- New pointer rotate test. · 60648578
  Eric Christopher authored Jul 20, 2011
```
llvm-svn: 135562
```
  60648578
- Fix an obvious typo that's preventing x86 (32-bit) from using .literal16. · ccf243d5
  Evan Cheng authored Jul 19, 2011
```
llvm-svn: 135535
```
  ccf243d5
Jul 19, 2011
- · 9ab3cac6
  Devang Patel authored Jul 19, 2011
```
Revert r135423.

llvm-svn: 135454
```
  9ab3cac6
Jul 18, 2011

· 4dc76f24

Devang Patel authored Jul 18, 2011

During bottom-up fast-isel, instructions emitted to materialize registers are placed at the top of the basic block and do not have a debug location. This may misguide the debugger while entering the basic block, and sometimes the debugger provides a semi-useful view of the current location to the developer by picking up the previous known location as the current location. Assign a sensible location to the first instruction in a basic block, if it does not have a location derived from the source file, so that the debugger can provide a meaningful user experience to developers in edge cases.
[take 2]

llvm-svn: 135423

4dc76f24

Add AVX 128-bit sqrt versions · 4208cace
Bruno Cardoso Lopes authored Jul 18, 2011
```
llvm-svn: 135404
```
4208cace
Delete empty unused file. · d8921f93
Nick Lewycky authored Jul 18, 2011
```
llvm-svn: 135379
```
d8921f93

Jul 16, 2011

Add AVX 128-bit patterns for sint_to_fp · 44800401
Bruno Cardoso Lopes authored Jul 16, 2011
```
llvm-svn: 135332
```
44800401

Fix a couple of things: · 8df9cfc2

Bruno Cardoso Lopes authored Jul 15, 2011

1) Make non-legal 256-bit loads be promoted to v4i64. This lets us
canonicalize the loads and handle things the same way we handle them
for 128-bit registers. Despite what one of the removed comments
explained, the load promotion would not mess with VPERM; it's only a
matter of doing the appropriate bitcasts when this instruction comes
to be introduced. Also make LOAD v8i32 legal.

2) Doing 1) exposed two bugs:
- v4i64 was being promoted to itself for several opcodes (introduced
in r124447 by David Greene) causing endless recursion and the stack to
explode.
- there was no support for allOnes BUILD_VECTORs and ANDNP would fail to
match because it was generating early target constant pools during
lowering.

3) The testcases are already checked in; doing 1) exposed the
bugs in the current testcases.

4) Tidy up code to be more clear and explicit about AVX.

llvm-svn: 135313

8df9cfc2

Jul 14, 2011

Check register class matching instead of width of type matching · 92464be2

Eric Christopher authored Jul 14, 2011

when determining validity of matching constraint. Allow i1
types access to the GR8 reg class for x86.

Fixes PR10352 and rdar://9777108

llvm-svn: 135180

92464be2