Commits · ea6397f67b5d18c6db2db9847dc5dad0d9fe553c · Roger Ferrer / llvm-epi-0.8

Jul 19, 2012
- Remove tabs. · ea6397f6
  Bill Wendling authored Jul 19, 2012
```
llvm-svn: 160477
```
  ea6397f6
Jul 18, 2012

X86: remove redundant cmp against zero. · d0a4ee84

Manman Ren authored Jul 18, 2012

Updated OptimizeCompare in peephole to remove redundant cmp against zero.
We only remove Compare if CF and OF are not used.
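
The restriction to CF and OF can be illustrated with a small flag model. This is a sketch only: `Flags`, `flagsOfSub`, `flagsOfCmpZero` and `sameZfSf` are hypothetical names for illustration, not code from the patch. It assumes 32-bit operands and models only ZF, SF, CF and OF.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the four EFLAGS bits relevant here.
struct Flags { bool zf, sf, cf, of; };

// Flags produced by a 32-bit SUB a, b (CMP a, b sets the same flags).
inline Flags flagsOfSub(std::uint32_t a, std::uint32_t b) {
  const std::uint32_t r = a - b;
  const bool signA = (a >> 31) != 0;
  const bool signB = (b >> 31) != 0;
  const bool signR = (r >> 31) != 0;
  // CF is the unsigned borrow; OF is signed overflow of the subtraction.
  return {r == 0, signR, a < b, (signA != signB) && (signR != signA)};
}

// Flags produced by a following CMP r, 0 on the result r.
inline Flags flagsOfCmpZero(std::uint32_t r) {
  const Flags f = flagsOfSub(r, 0);
  assert(!f.cf && !f.of);  // comparing against zero never sets CF or OF
  return f;
}

// ZF and SF depend only on the result value, so they always agree.
inline bool sameZfSf(const Flags& x, const Flags& y) {
  return x.zf == y.zf && x.sf == y.sf;
}
```

In this model `flagsOfSub(1, 2)` sets CF while the compare against zero clears it, so a user that reads CF or OF would observe a different value once the cmp is gone.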

rdar://11855129

llvm-svn: 160454

d0a4ee84

This patch fixes 8 out of 20 unexpected failures in "make check" · f0a48ec8

Preston Gurd authored Jul 18, 2012

when run on an Intel Atom processor. The failures have arisen due
to changes elsewhere in the trunk over the past 8 weeks or so.

These failures were not detected by the Atom buildbot because the
CPU on the Atom buildbot was not being detected as an Atom CPU.
The fix for this problem is in Host.cpp and X86Subtarget.cpp, but
shall remain commented out until the current set of Atom test failures
are fixed.

Patch by Andy Zhang and Tyler Nowicki!

llvm-svn: 160451

f0a48ec8

The vbroadcast family of instructions has 'fallback patterns' in case where the · 4c12245b

Nadav Rotem authored Jul 18, 2012

load source operand is used by multiple nodes. The v2i64 broadcast was emulated
by shuffling the two lower i32 elements to the upper two.
We had a bug in the immediate used for the broadcast.
Replace the immediate 0 with 0x44.
0x44 means [01|00|01|00] which corresponds to the correct lane.
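
As a rough illustration of why 0x44 is the right immediate, the sketch below decodes a PSHUFD-style 8-bit immediate into its four 2-bit lane selectors. The helper names (`decodeShuffleImm`, `shuffle`, `broadcastLowQword`) are made up for this example and are not taken from the patch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Decode an 8-bit shuffle immediate into four 2-bit selectors.
// Element i is the source i32 lane copied into destination lane i.
inline std::array<unsigned, 4> decodeShuffleImm(std::uint8_t imm) {
  return {imm & 3u, (imm >> 2) & 3u, (imm >> 4) & 3u, (imm >> 6) & 3u};
}

// Apply the shuffle to a vector of four 32-bit elements.
inline std::array<std::uint32_t, 4> shuffle(
    const std::array<std::uint32_t, 4>& src, std::uint8_t imm) {
  const auto sel = decodeShuffleImm(imm);
  return {src[sel[0]], src[sel[1]], src[sel[2]], src[sel[3]]};
}

// Emulate a v2i64 broadcast: 0x44 selects i32 lanes {0, 1, 0, 1}.
inline std::array<std::uint32_t, 4> broadcastLowQword(
    const std::array<std::uint32_t, 4>& src) {
  const auto out = shuffle(src, 0x44);
  assert(out[0] == out[2] && out[1] == out[3]);  // both halves match
  return out;
}
```

With an immediate of 0 every selector is lane 0, so only the low i32 would be replicated, which is not a 64-bit broadcast.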

Patch by Michael Kuperstein.

llvm-svn: 160430

4c12245b

Remove tab characters. · 6bf3ed45
Craig Topper authored Jul 18, 2012
```
llvm-svn: 160425
```
6bf3ed45
Fix typo in error message and remove some tab characters. · 85324232
Craig Topper authored Jul 18, 2012
```
llvm-svn: 160423
```
85324232

Make the x86 asm parser check for xmm vs ymm for index register in gather... · 01deb5f2

Craig Topper authored Jul 18, 2012

Make the x86 asm parser check for xmm vs ymm for the index register in gather instructions. Also fix Intel syntax for gather instructions to use 'DWORD PTR' or 'QWORD PTR' to match gas.

llvm-svn: 160420

01deb5f2

Jul 17, 2012

Back out r160101 and instead implement a dag combine to recover from instcombine transformation. · e6a3b03e
Evan Cheng authored Jul 17, 2012
```
llvm-svn: 160387
```
e6a3b03e
Implement r160312 as target independent dag combine. · 780f9b5f
Evan Cheng authored Jul 17, 2012
```
llvm-svn: 160354
```
780f9b5f

This is another case where instcombine demanded bits optimization created · f579beca

Evan Cheng authored Jul 17, 2012

large immediates. Add dag combine logic to recover in case the large
immediates don't fit in the cmp immediate operand field.

int foo(unsigned long l) {
  return (l>> 47) == 1;
}

we produce

  %shr.mask = and i64 %l, -140737488355328
  %cmp = icmp eq i64 %shr.mask, 140737488355328
  %conv = zext i1 %cmp to i32
  ret i32 %conv

which codegens to

movq    $0xffff800000000000,%rax
andq    %rdi,%rax
movq    $0x0000800000000000,%rcx
cmpq    %rcx,%rax
sete    %al
movzbl    %al,%eax
ret

TargetLowering::SimplifySetCC would transform
(X & -256) == 256 -> (X >> 8) == 1
if the immediate fails the isLegalICmpImmediate() test. For x86,
those are immediates that do not fit in a signed 32-bit field.
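
The rewrite can be checked with a small standalone sketch. It assumes 64-bit unsigned values. `fitsInSigned32` is a hypothetical stand-in for the x86 legality test described above, not the actual isLegalICmpImmediate() implementation, and `maskedCompare` and `shiftedCompare` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the x86 legality check: the immediate
// must be representable as a sign-extended 32-bit value.
inline bool fitsInSigned32(std::int64_t imm) {
  return imm >= INT32_MIN && imm <= INT32_MAX;
}

// Original form: (x & mask) == c, where mask clears the low `shift` bits.
inline bool maskedCompare(std::uint64_t x, unsigned shift, std::uint64_t c) {
  assert(shift < 64);
  const std::uint64_t mask = ~std::uint64_t{0} << shift;
  return (x & mask) == c;
}

// Rewritten form: (x >> shift) == (c >> shift).
// Only valid when c has no bits set below `shift`.
inline bool shiftedCompare(std::uint64_t x, unsigned shift, std::uint64_t c) {
  assert(shift < 64);
  assert((c & ~(~std::uint64_t{0} << shift)) == 0);
  return (x >> shift) == (c >> shift);
}
```

For the `foo` example the constant is 1 << 47, which fails the 32-bit check, while the shifted constant 1 fits easily.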

Based on a patch by Eli Friedman.

PR10328
rdar://9758774

llvm-svn: 160346

f579beca

Jul 16, 2012

For something like · 75315b87

Evan Cheng authored Jul 16, 2012

uint32_t hi(uint64_t res)
{
        uint32_t hi = res >> 32;
        return !hi;
}

llvm IR looks like this:
define i32 @hi(i64 %res) nounwind uwtable ssp {
entry:
  %lnot = icmp ult i64 %res, 4294967296
  %lnot.ext = zext i1 %lnot to i32
  ret i32 %lnot.ext
}

The optimizer has optimized away the right shift and truncate but the resulting
constant is too large to fit in the 32-bit immediate field. The resulting x86
code is worse as a result:
        movabsq $4294967296, %rax       ## imm = 0x100000000
        cmpq    %rax, %rdi
        sbbl    %eax, %eax
        andl    $1, %eax

This patch teaches the x86 lowering code to handle ult against a large immediate
with trailing zeros. It will issue a right shift and a truncate followed by
a comparison against a shifted immediate.
        shrq    $32, %rdi
        testl   %edi, %edi
        sete    %al
        movzbl  %al, %eax

It also handles a ugt comparison against a large immediate with trailing bits
set. i.e. X >  0x0ffffffff -> (X >> 32) >= 1
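
Both rewrites rest on a simple arithmetic identity, sketched below for 64-bit unsigned values. The function names are hypothetical and the code only models the comparison, not the actual lowering.

```cpp
#include <cassert>
#include <cstdint>

// Mask with the low k bits set (k < 64).
inline std::uint64_t lowMask(unsigned k) {
  assert(k < 64);
  return (std::uint64_t{1} << k) - 1;
}

// x <u c, where the low k bits of c are all zero.
inline bool ultViaShift(std::uint64_t x, std::uint64_t c, unsigned k) {
  assert((c & lowMask(k)) == 0);
  return (x >> k) < (c >> k);
}

// x >u c, where the low k bits of c are all one.
inline bool ugtViaShift(std::uint64_t x, std::uint64_t c, unsigned k) {
  assert((c & lowMask(k)) == lowMask(k));
  return (x >> k) > (c >> k);
}

// The hi() example from the message: !(res >> 32) is res <u 2^32.
inline std::uint32_t hiIsZero(std::uint64_t res) {
  return ultViaShift(res, std::uint64_t{1} << 32, 32) ? 1u : 0u;
}
```

For c = 0x0ffffffff and k = 32 the shifted constant is 0, so `(x >> 32) > 0` is the same test as `(x >> 32) >= 1`.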

rdar://11866926

llvm-svn: 160312

75315b87

With r160248 in place this code is no longer needed. · 10e8207c
Chad Rosier authored Jul 16, 2012
```
llvm-svn: 160293
```
10e8207c

Fix a bug in the 3-address conversion of LEA when one of the operands is an · 4968e45b

Nadav Rotem authored Jul 16, 2012

undef virtual register. The problem is that ProcessImplicitDefs removes the
definition of the register and marks all uses as undef. If we lose the undef
marker then we get a register which has no def and is not marked as undef. The
live interval analysis does not collect information for these virtual
registers and we crash in later passes.

Together with Michael Kuperstein <michael.m.kuperstein@intel.com>

llvm-svn: 160260

4968e45b

This CL changes the function prologue and epilogue emitted on X86 when stack needs realignment. · dcc1291d

Alexey Samsonov authored Jul 16, 2012

It is intended to fix PR11468.

Old prologue and epilogue looked like this:
push %rbp
mov %rsp, %rbp
and $alignment, %rsp
push %r14
push %r15
...
pop %r15
pop %r14
mov %rbp, %rsp
pop %rbp

The problem was referencing the locations of callee-saved registers in exception handling:
the locations of callee-saved registers had to be recalculated to account for the stack alignment operation. It would
take some effort to implement this in LLVM, as currently MachineLocation can only have the form
"Register + Offset". Function prologue and epilogue are now changed to:

push %rbp
mov %rsp, %rbp
push %r14
push %r15
and $alignment, %rsp
...
lea -$size_of_saved_registers(%rbp), %rsp
pop %r15
pop %r14
pop %rbp

Reviewed by Chad Rosier.

llvm-svn: 160248

dcc1291d

Jul 15, 2012

Teach getTargetVShiftNode about TargetConstant nodes. · eec74c72
Nadav Rotem authored Jul 15, 2012
```
llvm-svn: 160234
```
eec74c72

Rename VBROADCASTSDrm into VBROADCASTSDYrm to match the naming convention. · ee3552f8

Nadav Rotem authored Jul 15, 2012

Allow the folding of vbroadcastRR to vbroadcastRM, where the memory operand is a spill slot.

PR12782.

Together with Michael Kuperstein <michael.m.kuperstein@intel.com>

llvm-svn: 160230

ee3552f8

AVX: Fix a bug in getTargetVShiftNode. The shift amount has to be a 128bit... · 9466e81d

Nadav Rotem authored Jul 14, 2012

AVX: Fix a bug in getTargetVShiftNode. The shift amount has to be a 128bit vector with the same element type as the input vector.
This is needed because of the patterns we have for the VP[SLL/SRA/SRL][W/D/Q] instructions.

llvm-svn: 160222

9466e81d

Jul 13, 2012
- Make helper functions static. · abbfe693
  Benjamin Kramer authored Jul 13, 2012
```
llvm-svn: 160173
```
  abbfe693
- Mark VINSERTI128rm as MayLoad=1. Fixes PR13348. · b3bac490
  Craig Topper authored Jul 13, 2012
```
llvm-svn: 160162
```
  b3bac490
Jul 12, 2012
- Give the rdrand instructions a SideEffect flag and a chain so MachineCSE and... · 4d091678
  Benjamin Kramer authored Jul 12, 2012
```
Give the rdrand instructions a SideEffect flag and a chain so MachineCSE and MachineLICM don't touch it.

I already had the necessary things in place for IR-level passes but missed the machine passes.

llvm-svn: 160137
```
  4d091678
- Add intrinsics for Ivy Bridge's rdrand instruction. · 0ab2794e
  Benjamin Kramer authored Jul 12, 2012
```
The rdrand/cmov sequence is the same that is emitted by both
GCC and ICC.

Fixes PR13284.

llvm-svn: 160117
```
  0ab2794e
- Update GATHER instructions to support 2 read-write operands. Patch from myself and Manman Ren. · f7755df7
  Craig Topper authored Jul 12, 2012
```
llvm-svn: 160110
```
  f7755df7
Jul 11, 2012
- [x86 fast-isel] Per discussion with Eric, add all cases to switch with verbose · 8446ede0
  Chad Rosier authored Jul 11, 2012
```
comments.

llvm-svn: 160069
```
  8446ede0
- X86: Update to peephole optimization to move Movr0 before (Sub, Cmp) pair. · 1553ce0e
  Manman Ren authored Jul 11, 2012
```
When Movr0 is between sub and cmp, we move Movr0 before sub if it enables
removal of Cmp.

llvm-svn: 160066
```
  1553ce0e
- [x86 fast-isel] Rather than call llvm_unreachable() have fast-isel fall back · 43218c59
  Chad Rosier authored Jul 11, 2012
```
to Selection DAG isel.  Patch by Andrew Kaylor <andrew.kaylor@intel.com>.

llvm-svn: 160055
```
  43218c59
- · d2bdcebb
  Nadav Rotem authored Jul 11, 2012
```
When ext-loading and trunc-storing vectors to memory, on x86 32bit systems, allow loads/stores of 64bit values from xmm registers.

llvm-svn: 160044
```
  d2bdcebb
Jul 10, 2012

Move [get|set]BasePtrStackAdjustment() from MachineFrameInfo to · 97c22142

Chad Rosier authored Jul 10, 2012

X86MachineFunctionInfo as this is currently only used by X86. If this ever
becomes an issue on another arch (e.g., ARM) then we can hoist it back out.

llvm-svn: 160009

97c22142

Add support for dynamic stack realignment in the presence of dynamic allocas on · bdb08ac5

Chad Rosier authored Jul 10, 2012

X86.  Basically, this is a reapplication of r158087 with a few fixes.

Specifically, (1) the stack pointer is restored from the base pointer before
popping callee-saved registers and (2) in obscure cases (see comments in patch)
we must cache the value of the original stack adjustment in the prologue and
apply it in the epilogue.

rdar://11496434

llvm-svn: 160002

bdb08ac5

· d908ddc1

Nadav Rotem authored Jul 10, 2012

Improve the loading of load-anyext vectors by allowing the codegen to load
multiple scalars and insert them into a vector. Next, we shuffle the elements
into the correct places, as before.
Also fix a small dagcombine bug in SimplifyBinOpWithSameOpcodeHands, when the
migration of bitcasts happened too late in the SelectionDAG process.

llvm-svn: 159991

d908ddc1

Reverse assembler/disassembler operand order for gather instructions. · be41e2da
Craig Topper authored Jul 10, 2012
```
llvm-svn: 159983
```
be41e2da

Jul 09, 2012

X86: implement functions to analyze & synthesize CMOV|SET|Jcc · 5f6fa428

Manman Ren authored Jul 09, 2012

getCondFromSETOpc, getCondFromCMovOpc, getSETFromCond, getCMovFromCond

No functional change intended.
If we want to update the condition code of CMOV|SET|Jcc, we first analyze the
opcode to get the condition code, then update the condition code, finally
synthesize the new opcode from the new condition code.
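
The analyze, update, synthesize flow might look roughly like the toy model below. The enums and helper names are hypothetical and cover only four SETcc opcodes. The real helpers named above operate on the full X86 opcode and condition-code tables, and their signatures are not shown in this message.

```cpp
#include <cassert>

// Toy condition codes and opcodes; the real tables are much larger.
enum class CondCode { E, NE, L, GE, Invalid };
enum class Opcode { SETE, SETNE, SETL, SETGE, JMP };

// Analyze: recover the condition code from a SETcc opcode.
inline CondCode condFromSet(Opcode opc) {
  switch (opc) {
    case Opcode::SETE:  return CondCode::E;
    case Opcode::SETNE: return CondCode::NE;
    case Opcode::SETL:  return CondCode::L;
    case Opcode::SETGE: return CondCode::GE;
    default:            return CondCode::Invalid;  // not a SETcc
  }
}

// Update: here the update is logical negation of the condition.
inline CondCode opposite(CondCode cc) {
  switch (cc) {
    case CondCode::E:  return CondCode::NE;
    case CondCode::NE: return CondCode::E;
    case CondCode::L:  return CondCode::GE;
    case CondCode::GE: return CondCode::L;
    default:           return CondCode::Invalid;
  }
}

// Synthesize: build the SETcc opcode for a condition code.
inline Opcode setFromCond(CondCode cc) {
  switch (cc) {
    case CondCode::E:  return Opcode::SETE;
    case CondCode::NE: return Opcode::SETNE;
    case CondCode::L:  return Opcode::SETL;
    case CondCode::GE: return Opcode::SETGE;
    default:
      assert(false && "no SETcc for this condition");
      return Opcode::JMP;
  }
}

// The three steps chained together.
inline Opcode invertSet(Opcode opc) {
  const CondCode cc = condFromSet(opc);
  assert(cc != CondCode::Invalid);
  return setFromCond(opposite(cc));
}
```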

llvm-svn: 159955

5f6fa428

Jul 07, 2012

I'm introducing a new machine model to simultaneously allow simple · 87255e34

Andrew Trick authored Jul 07, 2012

subtarget CPU descriptions and support new features of
MachineScheduler.

MachineModel has three categories of data:
1) Basic properties for coarse grained instruction cost model.
2) Scheduler Read/Write resources for simple per-opcode and operand cost model (TBD).
3) Instruction itineraries for detailed per-cycle reservation tables.

These will all live side-by-side. Any subtarget can use any
combination of them. Instruction itineraries will not change in the
near term. In the long run, I expect them to only be relevant for
in-order VLIW machines that have complex constraints and require a
precise scheduling/bundling model. Once itineraries are only actively
used by VLIW-ish targets, they could be replaced by something more
appropriate for those targets.

This tablegen backend rewrite sets things up for introducing
MachineModel type #2: per opcode/operand cost model.

llvm-svn: 159891

87255e34

X86: Fix optimizeCompare to correctly check safe condition. · bb360740

Manman Ren authored Jul 07, 2012

It is safe if EFLAGS is killed or re-defined.
When we are done with the basic block, check whether EFLAGS is live-out.
Do not optimize away cmp if EFLAGS is live-out.

llvm-svn: 159888

bb360740

Jul 06, 2012

X86: peephole optimization to remove cmp instruction · c9656737

Manman Ren authored Jul 06, 2012

For each Cmp, we check whether there is an earlier Sub which makes Cmp
redundant. We handle the case where SUB operates on the same source operands as
Cmp, including the case where the two source operands are swapped.
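
For the swapped-operand case, the users of the flags presumably have to test the mirrored condition. The sketch below models that idea on plain integers. `Cond`, `holds` and `swapped` are hypothetical names, and the code is an illustration of the condition mapping rather than the peephole itself.

```cpp
#include <cassert>
#include <cstdint>

enum class Cond { E, NE, L, LE, G, GE, B, BE, A, AE };

// Evaluate a condition as it would hold after CMP lhs, rhs
// (signed for L/LE/G/GE, unsigned for B/BE/A/AE).
inline bool holds(Cond c, std::uint32_t lhs, std::uint32_t rhs) {
  const auto sl = static_cast<std::int32_t>(lhs);
  const auto sr = static_cast<std::int32_t>(rhs);
  switch (c) {
    case Cond::E:  return lhs == rhs;
    case Cond::NE: return lhs != rhs;
    case Cond::L:  return sl < sr;
    case Cond::LE: return sl <= sr;
    case Cond::G:  return sl > sr;
    case Cond::GE: return sl >= sr;
    case Cond::B:  return lhs < rhs;
    case Cond::BE: return lhs <= rhs;
    case Cond::A:  return lhs > rhs;
    case Cond::AE: return lhs >= rhs;
  }
  assert(false && "unknown condition");
  return false;
}

// Condition to test when the flags come from the operands in the
// opposite order, e.g. flags of SUB b, a reused for CMP a, b.
inline Cond swapped(Cond c) {
  switch (c) {
    case Cond::L:  return Cond::G;
    case Cond::LE: return Cond::GE;
    case Cond::G:  return Cond::L;
    case Cond::GE: return Cond::LE;
    case Cond::B:  return Cond::A;
    case Cond::BE: return Cond::AE;
    case Cond::A:  return Cond::B;
    case Cond::AE: return Cond::BE;
    default:       return c;  // E and NE are symmetric
  }
}
```

For any operands, `holds(c, a, b)` equals `holds(swapped(c), b, a)`, which is the property a rewrite of this kind relies on.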

llvm-svn: 159838

c9656737

Jul 05, 2012

Make X86 call and return instructions non-variadic. · d14101e0

Jakob Stoklund Olesen authored Jul 04, 2012

Function argument and return value registers aren't part of the
encoding, so they should be implicit operands.

llvm-svn: 159728

d14101e0

Jul 04, 2012

Ensure CopyToReg nodes are always glued to the call instruction. · 2dee8124

Jakob Stoklund Olesen authored Jul 04, 2012

The CopyToReg nodes that set up the argument registers before a call
must be glued to the call instruction. Otherwise, the scheduler may emit
the physreg copies long before the call, causing long live ranges for
the fixed registers.

Besides disabling good register allocation, that can also expose
problems when EmitInstrWithCustomInserter() splits a basic block during
the live range of a physreg.

llvm-svn: 159721

2dee8124

Add early if-conversion support to X86. · 49e4d4b3

Jakob Stoklund Olesen authored Jul 04, 2012

Implement the TII hooks needed by EarlyIfConversion to create cmov
instructions and estimate their latency.

Early if-conversion is still not enabled by default.

llvm-svn: 159695

49e4d4b3

Jul 03, 2012
- Remove extra space. · 85c938f4
  Craig Topper authored Jul 03, 2012
```
llvm-svn: 159647
```
  85c938f4
- Change i128mem/i256mem to f128mem/f256mem on some floating point vector instructions. · f067f9aa
  Craig Topper authored Jul 03, 2012
```
llvm-svn: 159646
```
  f067f9aa
- Add aliases for pblendvb, blendvpd, and blendvps instructions with the... · 676dcd8c
  Craig Topper authored Jul 03, 2012
```
Add aliases for pblendvb, blendvpd, and blendvps instructions with the implicit xmm0 operand specified. Fixes PR13252.

llvm-svn: 159644
```
  676dcd8c