- Aug 13, 2012
- Craig Topper authored
Tidy up VSETCC lowering code a bit more by adding an llvm_unreachable and putting a couple of if conditions in a better order. llvm-svn: 161746
- Craig Topper authored
llvm-svn: 161745
- Craig Topper authored
llvm-svn: 161743
- Craig Topper authored
Remove the LowerMMXCONCAT_VECTORS function. It could never execute because there are no legal 64-bit vector types that could be used as inputs to a 128-bit concat_vectors. Remove a target specific SDNode and its patterns that become unused as a result. llvm-svn: 161742
- Aug 12, 2012
- Craig Topper authored
llvm-svn: 161738
- Craig Topper authored
Remove an unnecessary call to setOperationAction for SETCC of v2i64 under SSE4.2. It was already called for the same type under SSE2. llvm-svn: 161737
- Craig Topper authored
llvm-svn: 161734
- Craig Topper authored
Use MVT.isXBitVector instead of EVT.isXBitVector when setting up operation actions. Compiles to smaller code. llvm-svn: 161733
- Michael Liao authored
  - FCMOV only supports a subset of X86 conditions; skip boolean simplification if the X86 condition is not valid for FCMOV.
  - Add a minimal test case for PR13577.
  llvm-svn: 161732
- Craig Topper authored
Move setOperationAction for CONCAT_VECTORS of 256-bit vectors into the loop, since all 256-bit types are supported. llvm-svn: 161730
- Aug 10, 2012
- Michael Liao authored
  - If a boolean test (X86ISD::CMP or X86ISD::SUB) checks a boolean value generated from X86ISD::SETCC, try to simplify the boolean value generation and checking by reusing the original EFLAGS with the proper condition code (see the sketch below).
  - Add hooks to the X86-specific SETCC/BRCOND/CMOV, the three major places consuming EFLAGS.
  Part of the patches fixing PR12312. llvm-svn: 161687
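A minimal sketch in C, assuming a typical shape of the pattern (the function and variable names are illustrative, not from the patch):

```c
/* Hypothetical example: the comparison materializes a boolean through
   X86ISD::SETCC, and the branch immediately re-tests that boolean with
   another CMP/SUB. After the simplification, the branch can reuse the
   EFLAGS of the original comparison with the proper condition code. */
int clamp_nonnegative(int x) {
    int is_negative = (x < 0);   /* boolean produced by a SETCC */
    if (is_negative)             /* boolean test of that SETCC result */
        return 0;
    return x;
}
```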
- Michael Liao authored
llvm-svn: 161664
- Joerg Sonnenberger authored
llvm-svn: 161657
- Aug 08, 2012
- Manman Ren authored
  We perform the following:
  1. Use SUB instead of CMP for i8, i16, i32 and i64 in ISel lowering.
  2. Modify MachineCSE to correctly handle implicit defs.
  3. Convert SUB back to CMP if possible at the peephole stage.
  Removed pattern matching of (a > b) ? (a - b) : 0 and the like, since they are handled by the peephole now (see the sketch below). rdar://11873276 llvm-svn: 161462
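A minimal sketch of that pattern, with an illustrative function name: the SUB that computes a - b already sets the flags the a > b test needs, so the peephole can drop the separate CMP.

```c
/* Hypothetical example of the (a > b) ? (a - b) : 0 pattern: the subtraction
   and the comparison share operands, so once the compare is lowered as a SUB
   its flags can be reused instead of emitting a second CMP. */
unsigned saturating_sub(unsigned a, unsigned b) {
    return (a > b) ? (a - b) : 0;
}
```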
- Evan Cheng authored
do so when the high bits are known zero. This caused a subtle miscompilation. rdar://12027825 llvm-svn: 161451
- Aug 06, 2012
- Craig Topper authored
Implement proper handling for pcmpistri/pcmpestri intrinsics. Requires custom handling in DAGISelToDAG due to limitations in TableGen's implicit def handling. Fixes PR11305. llvm-svn: 161318
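For context, a hedged illustration of the SSE4.2 string-compare intrinsics involved (not code from the patch): _mm_cmpistri yields its index in a general-purpose register while also defining EFLAGS, which is what makes the implicit-def handling awkward for TableGen.

```c
#include <nmmintrin.h>   /* SSE4.2 intrinsics; compile with -msse4.2 */

/* Hypothetical example: find the first occurrence of `needle` in a 16-byte
   buffer. This compiles to PCMPISTRI, which returns the index in ECX and
   sets EFLAGS at the same time. */
int find_byte(const char *buf16 /* at least 16 readable bytes */, char needle) {
    __m128i haystack = _mm_loadu_si128((const __m128i *)buf16);
    __m128i set = _mm_set1_epi8(needle);
    /* index of the first matching byte, or 16 if none is found */
    return _mm_cmpistri(set, haystack,
                        _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY |
                        _SIDD_LEAST_SIGNIFICANT);
}
```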
- Aug 05, 2012
- Craig Topper authored
llvm-svn: 161306
- Craig Topper authored
Use a COPY node instead of an explicit MOVA opcode in the custom inserter for pcmpestrm/pcmpistrm. This allows the register allocator to handle it better and prevents wasted identity moves. llvm-svn: 161305
- Aug 03, 2012
- Bob Wilson authored
Fast isel doesn't currently have support for translating builtin function calls to target instructions. For embedded environments where the library functions are not available, this is a matter of correctness and not just optimization. Most of this patch is just arranging to make the TargetLibraryInfo available in fast isel. <rdar://problem/12008746> llvm-svn: 161232
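A hedged illustration (names and scenario assumed, not from the patch) of how a builtin library call can appear even when the source contains no explicit call:

```c
/* Hypothetical example: copying a struct may be emitted as a call to the
   memcpy builtin. In a freestanding/embedded build with no libc, the backend
   must either expand such builtins to target instructions or know, via
   TargetLibraryInfo, which library functions are actually available. */
struct packet { unsigned char payload[64]; };

void forward(struct packet *dst, const struct packet *src) {
    *dst = *src;   /* may be lowered through the memcpy builtin */
}
```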
- Aug 01, 2012
- Chad Rosier authored
llvm-svn: 161122
- Elena Demikhovsky authored
llvm-svn: 161110
- Jul 25, 2012
- Rafael Espindola authored
to pop. llvm-svn: 160725
- Jul 23, 2012
- Sylvestre Ledru authored
llvm-svn: 160621
- Jul 17, 2012
- Evan Cheng authored
llvm-svn: 160387
- Evan Cheng authored
llvm-svn: 160354
- Evan Cheng authored
large immediates. Add DAG combine logic to recover in case the large immediate doesn't fit in the cmp immediate operand field. For

    int foo(unsigned long l) { return (l >> 47) == 1; }

we produce

    %shr.mask = and i64 %l, -140737488355328
    %cmp = icmp eq i64 %shr.mask, 140737488355328
    %conv = zext i1 %cmp to i32
    ret i32 %conv

which codegens to

    movq    $0xffff800000000000, %rax
    andq    %rdi, %rax
    movq    $0x0000800000000000, %rcx
    cmpq    %rcx, %rax
    sete    %al
    movzbl  %al, %eax
    ret

TargetLowering::SimplifySetCC would transform (X & -256) == 256 -> (X >> 8) == 1 if the immediate fails the isLegalICmpImmediate() test. For x86, that's immediates which are not signed 32-bit values. Based on a patch by Eli Friedman. PR10328 rdar://9758774 llvm-svn: 160346
- Jul 16, 2012
- Evan Cheng authored
For code like

    uint32_t hi(uint64_t res) {
      uint32_t hi = res >> 32;
      return !hi;
    }

the LLVM IR looks like this:

    define i32 @hi(i64 %res) nounwind uwtable ssp {
    entry:
      %lnot = icmp ult i64 %res, 4294967296
      %lnot.ext = zext i1 %lnot to i32
      ret i32 %lnot.ext
    }

The optimizer has optimized away the right shift and truncate, but the resulting constant is too large to fit in the 32-bit immediate field, so the x86 code is worse as a result:

    movabsq $4294967296, %rax  ## imm = 0x100000000
    cmpq    %rax, %rdi
    sbbl    %eax, %eax
    andl    $1, %eax

This patch teaches the x86 lowering code to handle ult against a large immediate with trailing zeros. It will issue a right shift and a truncate followed by a comparison against a shifted immediate:

    shrq    $32, %rdi
    testl   %edi, %edi
    sete    %al
    movzbl  %al, %eax

It also handles a ugt comparison against a large immediate with trailing bits set, i.e. X > 0x0ffffffff -> (X >> 32) >= 1. rdar://11866926 llvm-svn: 160312
- Jul 15, 2012
- Nadav Rotem authored
llvm-svn: 160234
- Nadav Rotem authored
AVX: Fix a bug in getTargetVShiftNode. The shift amount has to be a 128-bit vector with the same element type as the input vector. This is needed because of the patterns we have for the VP[SLL/SRA/SRL][W/D/Q] instructions. llvm-svn: 160222
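For background, a hedged user-level view of the same constraint (not code from the patch): the variable-shift instructions take their count in the low bits of an XMM register, so the shift-amount operand is itself a 128-bit vector.

```c
#include <emmintrin.h>   /* SSE2 intrinsics */

/* Hypothetical example: _mm_sll_epi32 maps to PSLLD with the count supplied
   in the low 64 bits of an XMM register; all four 32-bit lanes are shifted
   by that same amount. The AVX VPSLLD form follows the same pattern. */
__m128i shift_lanes_left(__m128i v, __m128i count) {
    return _mm_sll_epi32(v, count);
}
```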
- Jul 12, 2012
- Benjamin Kramer authored
Give the rdrand instructions a SideEffect flag and a chain so MachineCSE and MachineLICM don't touch them. I already had the necessary things in place for the IR-level passes but missed the machine passes. llvm-svn: 160137
- Benjamin Kramer authored
The rdrand/cmov sequence is the same as the one emitted by both GCC and ICC. Fixes PR13284. llvm-svn: 160117
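A hedged sketch of how the underlying intrinsic is typically used (the retry loop is illustrative, not part of the commit): RDRAND reports success in the carry flag, and the cmov cleans up the failure case.

```c
#include <immintrin.h>   /* requires a compiler/CPU with RDRAND, e.g. -mrdrnd */

/* Hypothetical example: _rdrand32_step stores a hardware random number and
   returns nonzero on success; the compiler lowers it to RDRAND plus the
   cmov-based fixup of the failure case mentioned above. */
unsigned int next_random(void) {
    unsigned int value = 0;
    while (!_rdrand32_step(&value)) {
        /* the hardware was temporarily unable to deliver entropy; retry */
    }
    return value;
}
```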
- Jul 11, 2012
- Nadav Rotem authored
When ext-loading and trunc-storing vectors to memory on x86 32-bit systems, allow loads/stores of 64-bit values from xmm registers. llvm-svn: 160044
- Jul 10, 2012
- Nadav Rotem authored
Improve the loading of load-anyext vectors by allowing the codegen to load multiple scalars and insert them into a vector. Next, we shuffle the elements into the correct places, as before. Also fix a small dagcombine bug in SimplifyBinOpWithSameOpcodeHands, where the migration of bitcasts happened too late in the SelectionDAG process. llvm-svn: 159991
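A hedged illustration of the kind of widening load affected (assumed shape, not from the patch): each narrow element is loaded as a scalar, inserted into a vector, and then shuffled into place.

```c
/* Hypothetical example: four i8 elements are extended to i32. The codegen
   can now build the vector by loading the scalars and inserting them,
   instead of relying on a single wide (and possibly illegal) vector load. */
void widen4(const unsigned char *src, unsigned int *dst) {
    for (int i = 0; i < 4; ++i)
        dst[i] = src[i];   /* each i8 element is extended to i32 */
}
```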
- Jul 05, 2012
- Jakob Stoklund Olesen authored
Function argument and return value registers aren't part of the encoding, so they should be implicit operands. llvm-svn: 159728
- Jul 04, 2012
- Jakob Stoklund Olesen authored
The CopyToReg nodes that set up the argument registers before a call must be glued to the call instruction. Otherwise, the scheduler may emit the physreg copies long before the call, causing long live ranges for the fixed registers. Besides disabling good register allocation, that can also expose problems when EmitInstrWithCustomInserter() splits a basic block during the live range of a physreg. llvm-svn: 159721
- Jul 01, 2012
- Elena Demikhovsky authored
llvm-svn: 159504
- Jun 29, 2012
- Rafael Espindola authored
Before this patch, in PIC 32-bit code we would add the global base register but not load from that address. This is a really old bug, but before the introduction of the TLS attributes we would never select initial-exec for PIC code. llvm-svn: 159409
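A hedged example of the kind of access involved (the variable and attribute are illustrative): with -fPIC and the initial-exec model, 32-bit code fetches the variable's offset from the GOT through the global base register and must then perform the actual thread-local load.

```c
/* Hypothetical example: force the initial-exec TLS model, as the TLS
   attributes mentioned above now allow even in PIC code. */
__attribute__((tls_model("initial-exec"))) __thread int counter;

int read_counter(void) {
    return counter;   /* GOT-indirect offset fetch, then the load itself */
}
```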
- Jun 26, 2012
- Elena Demikhovsky authored
llvm-svn: 159197
- Bill Wendling authored
llvm-svn: 159196
- Elena Demikhovsky authored
The current patch optimizes frequently used shuffle patterns and gives the following reduction in the instruction sequence.

Before:

    vshufps      $-35, %xmm1, %xmm0, %xmm2  ## xmm2 = xmm0[1,3],xmm1[1,3]
    vpermilps    $-40, %xmm2, %xmm2         ## xmm2 = xmm2[0,2,1,3]
    vextractf128 $1, %ymm1, %xmm1
    vextractf128 $1, %ymm0, %xmm0
    vshufps      $-35, %xmm1, %xmm0, %xmm0  ## xmm0 = xmm0[1,3],xmm1[1,3]
    vpermilps    $-40, %xmm0, %xmm0         ## xmm0 = xmm0[0,2,1,3]
    vinsertf128  $1, %xmm0, %ymm2, %ymm0

After:

    vshufps   $13, %ymm0, %ymm1, %ymm1  ## ymm1 = ymm1[1,3],ymm0[0,0],ymm1[5,7],ymm0[4,4]
    vshufps   $13, %ymm0, %ymm0, %ymm0  ## ymm0 = ymm0[1,3,0,0,5,7,4,4]
    vunpcklps %ymm1, %ymm0, %ymm0       ## ymm0 = ymm0[0],ymm1[0],ymm0[1],ymm1[1],ymm0[4],ymm1[4],ymm0[5],ymm1[5]

llvm-svn: 159188