Commits · 60dbc63a891d59b1553caca5acdba99c876e97bf · Roger Ferrer / llvm-epi-0.8

Aug 22, 2011
- Add support for breaking 256-bit int VSETCC into two 128-bit ones, · 74f090d4
  Bruno Cardoso Lopes authored Aug 22, 2011
```
avoiding scalarization of the compare. Reduces code from 59 to 6
instructions. Fix PR10712.

llvm-svn: 138271
```
  74f090d4
- Add 128-bit AVX codegen for PCMP* family of integer instructions · 6e62ca94
  Bruno Cardoso Lopes authored Aug 22, 2011
```
llvm-svn: 138270
```
  6e62ca94
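
The split described in 74f090d4 above can be pictured with a small model. This is only a sketch in Python, not the LLVM lowering code: `pcmpeq_128` and `setcc_eq_256` are hypothetical names, vectors are plain lists of integers, and equality is the only predicate modelled.

```python
def pcmpeq_128(a, b, bits=32):
    """Model a 128-bit lane-wise integer equality compare (PCMPEQ-style).

    Each result lane is all ones when the inputs match, zero otherwise.
    """
    assert len(a) == len(b) == 128 // bits
    all_ones = (1 << bits) - 1
    return [all_ones if x == y else 0 for x, y in zip(a, b)]


def setcc_eq_256(a, b, bits=32):
    """Compare two 256-bit vectors as two independent 128-bit compares."""
    lanes = 256 // bits
    assert len(a) == len(b) == lanes
    half = lanes // 2
    lo = pcmpeq_128(a[:half], b[:half], bits)   # low 128 bits
    hi = pcmpeq_128(a[half:], b[half:], bits)   # high 128 bits
    return lo + hi                              # concatenate the halves
```

Because each lane is compared independently, splitting at the 128-bit boundary gives the same result as a single wide compare would.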
Aug 20, 2011
- Re-write part of VEX encoding logic, to be easier to read! Also fix · d126347f
  Bruno Cardoso Lopes authored Aug 19, 2011
```
a bug and add a testcase!

llvm-svn: 138123
```
  d126347f
Aug 19, 2011

Add TB encoding to VEX versions of SSE fp logical operations to fix disassembler · ba6c2a52
Craig Topper authored Aug 19, 2011
```
llvm-svn: 138034
```
ba6c2a52
Fix PR10677. Initial patch and idea by Peter Cooper but I've changed the · 22241acc
Bruno Cardoso Lopes authored Aug 19, 2011
```
implementation!

llvm-svn: 138029
```
22241acc

Re-encoded 128-bit AVX versions of SQRT, RSQRT, RCP have 3 operands · 5647d84a

Bruno Cardoso Lopes authored Aug 18, 2011

instead of 2. They were already defined this way in their regular
version, but not for the intrinsics versions (*_Int), and that would work
for assembly emission but not for object code, since a MachineOperand
would be missing. This commit fixes PR10697.

Also removed the {VSQRT,VRSQRT,VRCP}r_Int forms and match the intrinsic
via INSERT_SUBREG+EXTRACT_SUBREG patterns. The same couldn't be done for
memory versions because sse_load_f32/sse_load_f64 operands need special
handling and don't work like regular "addr" operands.

There are right now 114 "*_Int" and 98 "Int_*" forms! I'm slowly
removing them as I step through, but hope we can get rid of these
someday, they are really annoying :)

llvm-svn: 138012

5647d84a

Aug 18, 2011
- Cleanup vector logical ops in AVX and use int versions for simple · 3c7d6eb6
  Bruno Cardoso Lopes authored Aug 18, 2011
```
v2i64

llvm-svn: 137919
```
  3c7d6eb6
- Fix PR10688. Add support for splitting 256-bit vector shifts when the · 1a87fcb9
  Bruno Cardoso Lopes authored Aug 17, 2011
```
shift amount is variable

llvm-svn: 137885
```
  1a87fcb9
Aug 17, 2011

Allow the MCDisassembler to return a "soft fail" status code, indicating an... · a4043c4b

Owen Anderson authored Aug 17, 2011

Allow the MCDisassembler to return a "soft fail" status code, indicating an instruction that is disassemblable, but invalid. Only used for ARM UNPREDICTABLE instructions at the moment.
Patch by James Molloy.

llvm-svn: 137830

a4043c4b

Introduce matching patterns for vbroadcast AVX instruction. The idea is to · be5e9873

Bruno Cardoso Lopes authored Aug 17, 2011

match splats in the form (splat (scalar_to_vector (load ...))) whenever
the load can be folded. All the logic and instruction emission is
working but because of PR8156, there is no way to match the loads, because
they can never be folded for splats. Thus, the tests are XFAILed, but
I've tested and exercised all the logic using a relaxed version for
checking the foldable loads, as if the bug was already fixed. This
should work out of the box once PR8156 gets fixed since MayFoldLoad will
work as expected.

llvm-svn: 137810

be5e9873

Update comments about vector splat handling in x86 · 6d33c7f3
Bruno Cardoso Lopes authored Aug 17, 2011
```
llvm-svn: 137808
```
6d33c7f3

Now that we have a canonical way to handle 256-bit splats: · ed786a34

Bruno Cardoso Lopes authored Aug 17, 2011

vinsertf128 $1 + vpermilps $0, remove the old code that used to first
do the splat in a 128-bit vector and then insert it into a larger one.
This is better because the handling code gets simpler and also makes
room for the upcoming vbroadcast!

llvm-svn: 137807

ed786a34
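
The canonical sequence named in ed786a34 above (vinsertf128 $1 followed by vpermilps $0) can be modelled on 8 x f32 vectors. The sketch below is an illustration in Python, not LLVM code: the function names mirror the instruction mnemonics, `splat_element0` is a hypothetical helper, and only a splat of element 0 is shown.

```python
def vinsertf128(dst, src128, imm):
    """Replace the 128-bit half of an 8 x f32 vector selected by imm bit 0."""
    assert len(dst) == 8 and len(src128) == 4
    out = list(dst)
    start = 4 * (imm & 1)
    out[start:start + 4] = src128
    return out


def vpermilps(src, imm):
    """Within each 128-bit lane, position i takes element (imm >> 2*i) & 3."""
    assert len(src) == 8
    out = []
    for lane in (src[:4], src[4:]):
        out.extend(lane[(imm >> (2 * i)) & 3] for i in range(4))
    return out


def splat_element0(v):
    """Splat element 0 of v: copy the low half up, then permute in-lane."""
    widened = vinsertf128(v, v[:4], 1)   # both halves now hold the low lane
    return vpermilps(widened, 0)         # every position picks lane element 0
```

The insert is needed because vpermilps never moves data across the 128-bit lane boundary; once both lanes hold the same four elements, an in-lane permute is enough.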

Aug 16, 2011

Instead of always leaving the work to the generic legalizer when · 2e99f1b3

Bruno Cardoso Lopes authored Aug 16, 2011

there is no support for native 256-bit shuffles, be smarter in some
cases, for example, when you can extract specific 128-bit parts and use
regular 128-bit shuffles for them. Example:

For this shuffle:
  shufflevector <4 x i64> %a, <4 x i64> %b, <4 x i32>
                <i32 1, i32 0, i32 7, i32 6>

This was expanded to:
  vextractf128  $1, %ymm1, %xmm2
  vpextrq $0, %xmm2, %rax
  vmovd %rax, %xmm1
  vpextrq $1, %xmm2, %rax
  vmovd %rax, %xmm2
  vpunpcklqdq %xmm1, %xmm2, %xmm1
  vpextrq $0, %xmm0, %rax
  vmovd %rax, %xmm2
  vpextrq $1, %xmm0, %rax
  vmovd %rax, %xmm0
  vpunpcklqdq %xmm2, %xmm0, %xmm0
  vinsertf128 $1, %xmm1, %ymm0, %ymm0
  ret

Now we get:
  vshufpd $1, %xmm0, %xmm0, %xmm0
  vextractf128  $1, %ymm1, %xmm1
  vshufpd $1, %xmm1, %xmm1, %xmm1
  vinsertf128 $1, %xmm1, %ymm0, %ymm0

llvm-svn: 137733

2e99f1b3
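
The condition exploited in 2e99f1b3 above can be sketched as a check on the shuffle mask: each 128-bit half of the result must draw all of its elements from a single 128-bit half of one source. The Python below is a simplified illustration, not the code in the x86 backend: `split_256_shuffle` is a hypothetical name, and undef mask entries are not handled.

```python
def split_256_shuffle(mask, num_elts=4):
    """Plan a 256-bit shuffle as two 128-bit shuffles.

    Mask indices 0..num_elts-1 refer to the first source ("a") and
    num_elts..2*num_elts-1 to the second ("b"). Returns one tuple per
    result half: (source, which 128-bit half of it, in-half permutation),
    or None when a result half mixes different source halves.
    """
    assert len(mask) == num_elts
    half = num_elts // 2
    plans = []
    for start in (0, half):
        part = mask[start:start + half]
        chunks = {idx // half for idx in part}   # 128-bit chunks referenced
        if len(chunks) != 1:
            return None
        chunk = chunks.pop()
        source = "a" if chunk < 2 else "b"
        plans.append((source, chunk % 2, [idx % half for idx in part]))
    return plans
```

For the mask `<1, 0, 7, 6>` from the commit message, the plan is: swap the low half of `a`, and swap the high half of `b`, which corresponds to the two `vshufpd $1` instructions around the `vextractf128`.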

While I'm here, remove the "_alt" hacks to a series of INSERT_SUBREG and · c1676e41
Bruno Cardoso Lopes authored Aug 15, 2011
```
also add the AVX versions of the 128-bit patterns

llvm-svn: 137685
```
c1676e41
Reorder declarations of vmovmskp* and also put the necessary AVX · 67005029
Bruno Cardoso Lopes authored Aug 15, 2011
```
predicate and TB encoding fields. This fixes the encoding for the
attached testcase. This fixes PR10625.

llvm-svn: 137684
```
67005029

MCTargetAsmParser target match predicate support. · 120a96a7

Jim Grosbach authored Aug 15, 2011

Allow a target assembly parser to do context sensitive constraint checking
on a potential instruction match. This will be used, for example, to handle
Thumb2 IT block parsing.

llvm-svn: 137675

120a96a7

Aug 15, 2011
- Fix PR10656. It's only profitable to use 128-bit inserts and extracts · cbe7feea
  Bruno Cardoso Lopes authored Aug 15, 2011
```
when AVX mode is on. Otherwise it is just more work for the type
legalizer.

llvm-svn: 137661
```
  cbe7feea
Aug 12, 2011
- Fix comment! · c53dd2ac
  Bruno Cardoso Lopes authored Aug 12, 2011
```
llvm-svn: 137521
```
  c53dd2ac
- The VPERM2F128 is an AVX instruction which permutes between two 256-bit · f15dfe58
  Bruno Cardoso Lopes authored Aug 12, 2011
```
vectors. It operates on 128-bit elements instead of regular scalar
types. Recognize shuffles that are suitable for VPERM2F128 and teach
the x86 legalizer how to handle them.

llvm-svn: 137519
```
  f15dfe58
- Move code around and add comments · 960c8f71
  Bruno Cardoso Lopes authored Aug 12, 2011
```
llvm-svn: 137518
```
  960c8f71
- Silence a bunch (but not all) "variable written but not read" warnings · a41634e3
  Duncan Sands authored Aug 12, 2011
```
when building with assertions disabled.

llvm-svn: 137460
```
  a41634e3
- findDeadCallerSavedReg fix: Missing NULL terminator in register arrays. · 210bf835
  Andrew Trick authored Aug 12, 2011
```
Fix by Ivan Baev. Sorry I don't have a unit test, but the fix is obvious so I don't want to delay it.

llvm-svn: 137404
```
  210bf835
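
For the VPERM2F128 commit above (f15dfe58), the following Python sketch models the instruction on lists and shows one way a shuffle mask made of whole 128-bit halves could be turned into an immediate. It follows the immediate layout in Intel's instruction reference (bits 1:0 and 5:4 select a half, bits 3 and 7 zero it); `mask_to_imm` is a hypothetical helper and is not the matcher added to the x86 legalizer.

```python
def vperm2f128(src1, src2, imm):
    """Model VPERM2F128: build each result half from a whole source half."""
    assert len(src1) == len(src2) and len(src1) % 2 == 0
    half = len(src1) // 2
    halves = [src1[:half], src1[half:], src2[:half], src2[half:]]

    def pick(field):
        if field & 0x8:                  # zeroing bit (imm bit 3 or bit 7)
            return [0] * half
        return list(halves[field & 0x3])

    return pick(imm & 0xF) + pick((imm >> 4) & 0xF)


def mask_to_imm(mask):
    """Return an imm8 for a mask built from whole 128-bit halves, else None."""
    half = len(mask) // 2
    imm = 0
    for pos, start in enumerate((0, half)):
        part = mask[start:start + half]
        first = part[0]
        # The half must start on a 128-bit boundary and be contiguous.
        if first % half != 0 or part != list(range(first, first + half)):
            return None
        imm |= (first // half) << (4 * pos)
    return imm
```

A mask such as `<2, 3, 6, 7>` on 4-element vectors takes the high half of each source and maps to immediate `0x31`, while `<1, 0, 7, 6>` reorders elements inside a half and is therefore not a candidate.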
Aug 11, 2011

Add a dag combine to xform 256-bit shuffles into simple vector · 8fbf023c

Bruno Cardoso Lopes authored Aug 11, 2011

inserts and extracts. This simple combine makes us generate only 1
instruction instead of 11 in the v8 case.

llvm-svn: 137362

8fbf023c

Fix PR10492 by teaching MOVHLPS and MOVLPS mask matching to be more strict. · 043c8208
Bruno Cardoso Lopes authored Aug 11, 2011
```
llvm-svn: 137324
```
043c8208
Add a comment, per Bruno's CR. · efdd183f
Nadav Rotem authored Aug 11, 2011
```
llvm-svn: 137313
```
efdd183f

[AVX] If the data which is going to be saved is already in two XMM registers · 1542d5a0

Nadav Rotem authored Aug 11, 2011

(for example, after integer operation), do not pack the registers into a YMM
before saving. It's better to save as two XMM registers.

Before:
                vinsertf128         $1, %xmm3, %ymm0, %ymm3
                vinsertf128         $0, %xmm1, %ymm3, %ymm1
                vmovaps              %ymm1, 416(%rsp)

After:
                vmovaps              %xmm3, 416+16(%rsp)
                vmovaps              %xmm1, 416(%rsp)

llvm-svn: 137308

1542d5a0

Cleanup: Remove Int_ CVTSS2SI* forms · dbd1352c
Bruno Cardoso Lopes authored Aug 11, 2011
```
llvm-svn: 137297
```
dbd1352c
Splats for v8i32/v8f32 can be handled by VPERMILPSY. This was causing · a2d8bb97
Bruno Cardoso Lopes authored Aug 11, 2011
```
infinite recursive calls in legalize. Fix PR10562

llvm-svn: 137296
```
a2d8bb97
Use the splat index to generate the desired shuffle. Otherwise we · 572c9aaf
Bruno Cardoso Lopes authored Aug 11, 2011
```
could only get undefs and the vector shuffle becomes an undef,
generating wrong code.

llvm-svn: 137295
```
572c9aaf

Fix X86TargetLowering::LowerExternalSymbol so that it actually works in... · 3ae39f8a

Eli Friedman authored Aug 11, 2011

Fix X86TargetLowering::LowerExternalSymbol so that it actually works in non-trivial cases. This hasn't been an issue before because the function isn't normally called (but apparently is used to generate a tail-call to sin() on ELF x86-32 with PIC and SSE2).

Fixes PR9693.

llvm-svn: 137292

3ae39f8a

Aug 10, 2011
- When performing a truncating store, it is sometimes possible to rearrange the · 410a11fe
  Nadav Rotem authored Aug 10, 2011
```
data in-register prior to saving to memory.  When we reorder the data in memory
we prevent the need to save multiple scalars to memory, making a single regular
store.

llvm-svn: 137238
```
  410a11fe
- The following X86 pattern is incorrect: · 3ff111c1
  Bruno Cardoso Lopes authored Aug 10, 2011
```
def : Pat<(X86Movss VR128:$src1,
                   (bc_v4i32 (v2i64 (load addr:$src2)))),
          (MOVLPSrm VR128:$src1, addr:$src2)>;
This matches a MOVSS dag with a MOVLPS instruction. However, MOVSS will replace only the low 32 bits of the register, while the MOVLPS instruction will replace the low 64 bits. A testcase that illustrates the bug is added, and the one that was already present is modified. Patch by Tanya Lattner.

llvm-svn: 137227
```
  3ff111c1
- Fix a bug in vpermilps mask checking. Fix PR10560 · 278ffd7d
  Bruno Cardoso Lopes authored Aug 10, 2011
```
llvm-svn: 137194
```
  278ffd7d
- Add 256-bit support for v8i32, v4i64 and v4f64 ISD::SELECT. Fix PR10556 · 72323966
  Bruno Cardoso Lopes authored Aug 09, 2011
```
llvm-svn: 137179
```
  72323966
- Add v16i16 and v32i8 store patterns · fc481959
  Bruno Cardoso Lopes authored Aug 09, 2011
```
llvm-svn: 137166
```
  fc481959
- Use fp unpack instructions to unpack int types. Until we have AVX2, this · 6963062a
  Bruno Cardoso Lopes authored Aug 09, 2011
```
is the best we can do for these patterns. This fixes PR10554.

llvm-svn: 137161
```
  6963062a
- Fix a couple ridiculous copy-paste errors. rdar://9914773 . · 4ef2426b
  Eli Friedman authored Aug 09, 2011
```
llvm-svn: 137160
```
  4ef2426b
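
The mismatch described in 3ff111c1 above (a MOVSS dag matched to a MOVLPS instruction) can be seen in a small model of a 128-bit register as four 32-bit elements. This is an illustrative Python sketch: `movss` and `movlps_load` are hypothetical helpers that model only which elements of the destination get replaced.

```python
def movss(dst, src):
    """MOVSS-style merge: replace only element 0 (the low 32 bits)."""
    assert len(dst) == len(src) == 4
    return [src[0]] + list(dst[1:])


def movlps_load(dst, mem):
    """MOVLPS from memory: replace elements 0 and 1 (the low 64 bits)."""
    assert len(dst) == 4 and len(mem) >= 2
    return list(mem[:2]) + list(dst[2:])
```

With `dst = [10, 11, 12, 13]` and loaded data `[20, 21, 22, 23]`, the two operations differ in element 1, which is the extra 32 bits that the incorrect pattern overwrote.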
Aug 09, 2011
- Reapply a more appropriate solution than in r137114. AVX supports · bed48dc8
  Bruno Cardoso Lopes authored Aug 09, 2011
```
v4f64 = sitofp v4i32. This fixes PR10559.
Also add support for v4i32 = fptosi v4f64.

llvm-svn: 137128
```
  bed48dc8
- Revert r137114 · 24dd1d4a
  Bruno Cardoso Lopes authored Aug 09, 2011
```
llvm-svn: 137127
```
  24dd1d4a
- Handle sitofp between v4f64 <- v4i32. Fix PR10559 · ad3453cf
  Bruno Cardoso Lopes authored Aug 09, 2011
```
llvm-svn: 137114
```
  ad3453cf