Commits · fc93281c07e993f7b48d055afd5b38409ae172e8 · Roger Ferrer / llvm-epi-0.8

Jul 28, 2012
- Fold patterns for some of the SSE/AVX convert instructions into their instruction definitions. · fc93281c
  Craig Topper authored Jul 28, 2012
  
  llvm-svn: 160922
  fc93281c
- Mark some of the SSE/AVX convert instructions as mayLoad/neverHasSideEffects. · 024797b9
  Craig Topper authored Jul 28, 2012
  
  llvm-svn: 160921
  024797b9
- X86 Peephole: fold loads to the source register operand if possible. · 0fa3ab88
  Manman Ren authored Jul 28, 2012
  
  Machine CSE and other optimizations can remove instructions so folding is possible at peephole while not possible at ISel. rdar://10554090 and rdar://11873276 llvm-svn: 160919
  0fa3ab88
- Make CVTSS2SI instruction definition consistent with CVTSD2SI. · 44f9b534
  Craig Topper authored Jul 28, 2012
  
  llvm-svn: 160914
  44f9b534
- Fix up memory load types for SSE scalar convert intrinsic patterns. · 1c1aef07
  Craig Topper authored Jul 28, 2012
  
  llvm-svn: 160913
  1c1aef07
- X86 Peephole: fix PR13475 in optimizeCompare. · 32367c06
  Manman Ren authored Jul 28, 2012
  
  It is possible that an instruction can use and update EFLAGS. When checking the safety, we should check the usage of EFLAGS first before declaring it is safe to optimize due to the update. llvm-svn: 160912
  32367c06
Jul 27, 2012

Remove the X86 sub_ss and sub_sd sub-register indexes completely. · 7cd08536
Jakob Stoklund Olesen authored Jul 26, 2012
```
llvm-svn: 160833
```
7cd08536
Remove the last mentions of sub_ss and sub_sd from patterns. · 77cd55b4
Jakob Stoklund Olesen authored Jul 26, 2012
```
I'll remove these two sub-register indexes shortly.

llvm-svn: 160831
```
77cd55b4

Eliminate sub_ss, sub_sd from broadcast patterns. · b96d0b4e

Jakob Stoklund Olesen authored Jul 26, 2012

The (COPY_TO_REGCLASS GR32:$src, VR128) pattern looks odd, but
copyPhysReg does the right thing with it. (The old pattern would
eventually produce the same cross-class copy).

llvm-svn: 160830

b96d0b4e

Eliminate more sub_ss / sub_sd patterns. · 206b825f

Jakob Stoklund Olesen authored Jul 26, 2012

This gets rid of some more INSERT_SUBREG - IMPLICIT_DEF patterns,
simplifying the emitted code a bit.

llvm-svn: 160820

206b825f

Eliminate some SUBREG_TO_REG patterns with sub_ss and sub_sd. · 75d17b05

Jakob Stoklund Olesen authored Jul 26, 2012

The SUBREG_TO_REG instruction has magic semantics asserting that the
source value was defined by an instruction that cleared the high half of
the register. Those semantics are never actually exploited for xmm
registers.

llvm-svn: 160818

75d17b05

Jul 26, 2012

Eliminate a batch of uses of sub_ss and sub_sd in the X86 target. · ceee4a9d

Jakob Stoklund Olesen authored Jul 26, 2012

These idempotent sub-register indices don't do anything. They simply
map XMM registers to themselves. They no longer affect register classes
either since the SubRegClasses field has been removed from Target.td.

This patch replaces XMM->XMM EXTRACT_SUBREG and INSERT_SUBREG patterns
with COPY_TO_REGCLASS patterns which simply become COPY instructions.

The number of IMPLICIT_DEF instructions before register allocation is
reduced, and that is the cause of the test case changes.

llvm-svn: 160816

ceee4a9d

Make l/q suffixes on AVX forms of scalar convert instructions consistent with their non-AVX forms. · c7690ac7
Craig Topper authored Jul 26, 2012
```
llvm-svn: 160775
```
c7690ac7

Jul 25, 2012
- Fix typos. Thanks to Matt Beaumont-Gay for noticing them. · 73173c55
  Rafael Espindola authored Jul 25, 2012
  
  llvm-svn: 160731
  73173c55
- When a return struct pointer is passed in registers, the callee has nothing · 11c38b96
  Rafael Espindola authored Jul 25, 2012
  
  to pop. llvm-svn: 160725
  11c38b96
- Factor a long list of conditions into a predicate function. No functionality · 2caee7f4
  Rafael Espindola authored Jul 25, 2012
  
  change. llvm-svn: 160724
  2caee7f4
Jul 24, 2012

Fix a bug in the x86 disassembler's symbolic disassembly support for Jcc-Jump · 216ac319

Kevin Enderby authored Jul 24, 2012

if Condition Is Met instructions that was not correctly determining the target
instruction.

So for a jne rel32 instruction:

% cat x.s
.byte 0x0f, 0x85, 0x09, 0x00, 0x00, 0x00
% as x.s

it was incorrectly determining the target:

% otool -q -tv a.out 
a.out:
(__TEXT,__text) section
0000000000000000	jne	0xd

and with the fix it gets this correct as:

% otool -q -tv a.out
a.out:
(__TEXT,__text) section
0000000000000000	jne	0xf

rdar://11505997

llvm-svn: 160694

216ac319
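
The corrected target in the jne example of 216ac319 can be checked by hand. The following is a minimal sketch, assuming the usual x86 rule that a relative branch displacement is applied to the address of the next instruction. The helper name `relBranchTarget` is hypothetical and is not part of the LLVM disassembler.

```cpp
#include <cstdint>

// Hypothetical helper, not an LLVM API: target of a PC-relative branch.
// The displacement is relative to the address of the *next* instruction,
// so the instruction's own size has to be added first.
constexpr std::uint64_t relBranchTarget(std::uint64_t instAddr,
                                        std::uint64_t instSize,
                                        std::int32_t rel32) {
  return instAddr + instSize +
         static_cast<std::uint64_t>(static_cast<std::int64_t>(rel32));
}

// 0f 85 09 00 00 00 at address 0: two opcode bytes plus a 4-byte
// displacement of 9, so the instruction is 6 bytes long.
static_assert(relBranchTarget(0x0, 6, 0x9) == 0xf,
              "jne rel32 at 0 with displacement 9 targets 0xf");
```

With a 6-byte instruction at address 0 and a displacement of 9, the target is 0 + 6 + 9 = 0xf, which matches the fixed otool output.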

ELF does not imply GNU/Linux. Do not assume GNU conventions just because we · 5b8c1680

David Chisnall authored Jul 24, 2012

are targeting an ELF platform.  Only fold gs-relative (and fs-relative) loads
if it is actually sensible to do so for the target platform.

This fixes PR13438.

llvm-svn: 160687

5b8c1680

Jul 23, 2012
- Fix a typo (the the => the) · 35521e23
  Sylvestre Ledru authored Jul 23, 2012
  
  llvm-svn: 160621
  35521e23
Jul 20, 2012

Don't use implicit register operands to calculate L-bit for AVX instructions.... · 0b94e46c

Craig Topper authored Jul 20, 2012

Don't use implicit register operands to calculate L-bit for AVX instructions. Needed because super reg defs and kills are added as implicit operands on 128-bit instructions. Fixes PR13349. Patch by Jose Fonseca.

llvm-svn: 160543

0b94e46c

Jul 19, 2012
- Adds the family codes for the Midview Atom processors so that the · 8e082688
  Preston Gurd authored Jul 19, 2012
  
  Atom buildbot will auto-detect Atom. llvm-svn: 160521
  8e082688
- Remove tabs. · 318f03f5
  Bill Wendling authored Jul 19, 2012
  
  llvm-svn: 160479
  318f03f5
- Remove tabs. · ea6397f6
  Bill Wendling authored Jul 19, 2012
  
  llvm-svn: 160477
  ea6397f6
Jul 18, 2012

X86: remove redundant cmp against zero. · d0a4ee84

Manman Ren authored Jul 18, 2012

Updated OptimizeCompare in peephole to remove redundant cmp against zero.
We only remove Compare if CF and OF are not used.

rdar://11855129

llvm-svn: 160454

d0a4ee84

This patch fixes 8 out of 20 unexpected failures in "make check" · f0a48ec8

Preston Gurd authored Jul 18, 2012

when run on an Intel Atom processor. The failures have arisen due
to changes elsewhere in the trunk over the past 8 weeks or so.

These failures were not detected by the Atom buildbot because the
CPU on the Atom buildbot was not being detected as an Atom CPU.
The fix for this problem is in Host.cpp and X86Subtarget.cpp, but
shall remain commented out until the current set of Atom test failures
is fixed.

Patch by Andy Zhang and Tyler Nowicki!

llvm-svn: 160451

f0a48ec8

The vbroadcast family of instructions has 'fallback patterns' for cases where the · 4c12245b

Nadav Rotem authored Jul 18, 2012

load source operand is used by multiple nodes. The v2i64 broadcast was emulated
by shuffling the two lower i32 elements to the upper two.
We had a bug in the immediate used for the broadcast.
This patch replaces 0 with 0x44.
0x44 means [01|00|01|00], which corresponds to the correct lanes.

Patch by Michael Kuperstein.

llvm-svn: 160430

4c12245b
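
The difference between the two immediates in 4c12245b can be seen by decoding them. This is a sketch that assumes the shuffle uses the PSHUFD-style encoding, where each destination i32 element is chosen by two bits of the immediate, lowest element first. The helper name `shuffleSelector` is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: index of the source i32 element that a PSHUFD-style
// imm8 selects for destination element dstIdx (two selector bits each).
constexpr unsigned shuffleSelector(std::uint8_t imm, unsigned dstIdx) {
  return (imm >> (2 * dstIdx)) & 0x3;
}

// 0x44 = [01|00|01|00]: destination elements 0..3 read source 0, 1, 0, 1,
// so the low 64 bits (elements 0 and 1) are repeated in the upper half.
static_assert(shuffleSelector(0x44, 0) == 0, "dst0 <- src0");
static_assert(shuffleSelector(0x44, 1) == 1, "dst1 <- src1");
static_assert(shuffleSelector(0x44, 2) == 0, "dst2 <- src0");
static_assert(shuffleSelector(0x44, 3) == 1, "dst3 <- src1");

// An immediate of 0 reads source element 0 everywhere, which broadcasts
// only the low i32 and is not a v2i64 broadcast.
static_assert(shuffleSelector(0x00, 1) == 0, "dst1 <- src0 with imm 0");
```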

Remove tab characters. · 6bf3ed45
Craig Topper authored Jul 18, 2012
```
llvm-svn: 160425
```
6bf3ed45
Fix typo in error message and remove some tab characters. · 85324232
Craig Topper authored Jul 18, 2012
```
llvm-svn: 160423
```
85324232

Make the x86 asm parser check for xmm vs ymm for index register in gather... · 01deb5f2

Craig Topper authored Jul 18, 2012

Make the x86 asm parser check for xmm vs ymm for the index register in gather instructions. Also fix Intel syntax for gather instructions to use 'DWORD PTR' or 'QWORD PTR' to match gas.

llvm-svn: 160420

01deb5f2

Jul 17, 2012

Back out r160101 and instead implement a dag combine to recover from instcombine transformation. · e6a3b03e
Evan Cheng authored Jul 17, 2012
```
llvm-svn: 160387
```
e6a3b03e
Implement r160312 as target-independent dag combine. · 780f9b5f
Evan Cheng authored Jul 17, 2012
```
llvm-svn: 160354
```
780f9b5f

This is another case where instcombine demanded bits optimization created · f579beca

Evan Cheng authored Jul 17, 2012

large immediates. Add dag combine logic to recover in case a large
immediate doesn't fit in the cmp immediate operand field.

int foo(unsigned long l) {
  return (l>> 47) == 1;
}

we produce

  %shr.mask = and i64 %l, -140737488355328
  %cmp = icmp eq i64 %shr.mask, 140737488355328
  %conv = zext i1 %cmp to i32
  ret i32 %conv

which codegens to

movq    $0xffff800000000000,%rax
andq    %rdi,%rax
movq    $0x0000800000000000,%rcx
cmpq    %rcx,%rax
sete    %al
movzbl    %al,%eax
ret

TargetLowering::SimplifySetCC would transform
(X & -256) == 256 -> (X >> 8) == 1
if the immediate fails the isLegalICmpImmediate() test. For x86,
that means immediates that do not fit in a signed 32-bit immediate field.

Based on a patch by Eli Friedman.

PR10328
rdar://9758774

llvm-svn: 160346

f579beca
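
The equivalence that f579beca relies on can be spot-checked. This is a sketch only: `fitsSignedImm32`, `maskedForm` and `shiftedForm` are hypothetical names, and `fitsSignedImm32` merely models the x86 rule stated in the commit message rather than calling the real isLegalICmpImmediate() hook.

```cpp
#include <cstdint>

// Models the x86 rule from the commit message: a cmp immediate must fit in
// a sign-extended 32-bit field. Hypothetical stand-in, not the LLVM hook.
constexpr bool fitsSignedImm32(std::int64_t imm) {
  return imm >= INT32_MIN && imm <= INT32_MAX;
}

// What instcombine produced: (l & -2^47) == 2^47.
constexpr bool maskedForm(std::uint64_t l) {
  return (l & 0xffff800000000000ULL) == 0x0000800000000000ULL;
}

// What the source code said, and what the dag combine recovers.
constexpr bool shiftedForm(std::uint64_t l) {
  return (l >> 47) == 1;
}

// 2^47 does not fit in 32 bits, so the compare needs an extra movq.
static_assert(!fitsSignedImm32(140737488355328LL), "2^47 is too large");
static_assert(fitsSignedImm32(1), "the shifted immediate is small");

// Both forms agree on values around the boundary.
static_assert(maskedForm(1ULL << 47) && shiftedForm(1ULL << 47), "2^47");
static_assert(!maskedForm(1ULL << 48) && !shiftedForm(1ULL << 48), "2^48");
static_assert(!maskedForm(3ULL << 47) && !shiftedForm(3ULL << 47), "3*2^47");
```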

Jul 16, 2012

For something like · 75315b87

Evan Cheng authored Jul 16, 2012

uint32_t hi(uint64_t res)
{
        uint32_t hi = res >> 32;
        return !hi;
}

llvm IR looks like this:
define i32 @hi(i64 %res) nounwind uwtable ssp {
entry:
  %lnot = icmp ult i64 %res, 4294967296
  %lnot.ext = zext i1 %lnot to i32
  ret i32 %lnot.ext
}

The optimizer has optimized away the right shift and truncate but the resulting
constant is too large to fit in the 32-bit immediate field. The resulting x86
code is worse as a result:
        movabsq $4294967296, %rax       ## imm = 0x100000000
        cmpq    %rax, %rdi
        sbbl    %eax, %eax
        andl    $1, %eax

This patch teaches the x86 lowering code to handle ult against a large immediate
with trailing zeros. It will issue a right shift and a truncate followed by
a comparison against a shifted immediate.
        shrq    $32, %rdi
        testl   %edi, %edi
        sete    %al
        movzbl  %al, %eax

It also handles a ugt comparison against a large immediate with trailing bits
set. i.e. X >  0x0ffffffff -> (X >> 32) >= 1

rdar://11866926

llvm-svn: 160312

75315b87
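
The two rewrites described in 75315b87 can be written as plain integer identities. This is a sketch under the assumption that the immediate is K shifted left by S (trailing zeros) for the ult case, and the same value with the low S bits set (trailing ones) for the ugt case. The function names are hypothetical and S must be less than 64.

```cpp
#include <cstdint>

// X <u (K << S) is equivalent to (X >> S) <u K.
constexpr bool ultViaShift(std::uint64_t x, std::uint64_t k, unsigned s) {
  return (x >> s) < k;
}

// X >u ((K << S) | ((1 << S) - 1)) is equivalent to (X >> S) >u K.
constexpr bool ugtViaShift(std::uint64_t x, std::uint64_t k, unsigned s) {
  return (x >> s) > k;
}

// ult against 0x100000000 (K = 1, S = 32), checked on both sides of 2^32.
static_assert(ultViaShift(0xffffffffULL, 1, 32) ==
                  (0xffffffffULL < 0x100000000ULL), "just below 2^32");
static_assert(ultViaShift(0x100000000ULL, 1, 32) ==
                  (0x100000000ULL < 0x100000000ULL), "exactly 2^32");

// ugt against 0x0ffffffff (K = 0, S = 32): (X >> 32) > 0, i.e. >= 1.
static_assert(ugtViaShift(0xffffffffULL, 0, 32) ==
                  (0xffffffffULL > 0x0ffffffffULL), "just below 2^32");
static_assert(ugtViaShift(0x100000000ULL, 0, 32) ==
                  (0x100000000ULL > 0x0ffffffffULL), "exactly 2^32");
```

After the shift, the remaining immediate (1 or 0 here) fits in a 32-bit field, which is why the movabsq disappears from the generated code.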

With r160248 in place this code is no longer needed. · 10e8207c
Chad Rosier authored Jul 16, 2012
```
llvm-svn: 160293
```
10e8207c

Fix a bug in the 3-address conversion of LEA when one of the operands is an · 4968e45b

Nadav Rotem authored Jul 16, 2012

undef virtual register. The problem is that ProcessImplicitDefs removes the
definition of the register and marks all uses as undef. If we lose the undef
marker then we get a register which has no def and is not marked as undef. The
live interval analysis does not collect information for these virtual
registers and we crash in later passes.

Together with Michael Kuperstein <michael.m.kuperstein@intel.com>

llvm-svn: 160260

4968e45b

This CL changes the function prologue and epilogue emitted on X86 when stack needs realignment. · dcc1291d

Alexey Samsonov authored Jul 16, 2012

It is intended to fix PR11468.

Old prologue and epilogue looked like this:
push %rbp
mov %rsp, %rbp
and $alignment, %rsp
push %r14
push %r15
...
pop %r15
pop %r14
mov %rbp, %rsp
pop %rbp

The problem was referencing the locations of callee-saved registers in exception handling:
the locations of callee-saved registers had to be recalculated to account for the stack alignment
operation. It would take some effort to implement this in LLVM, as currently MachineLocation can
only have the form "Register + Offset". Function prologue and epilogue are now changed to:

push %rbp
mov %rsp, %rbp
push %r14
push %r15
and $alignment, %rsp
...
lea -$size_of_saved_registers(%rbp), %rsp
pop %r15
pop %r14
pop %rbp

Reviewed by Chad Rosier.

llvm-svn: 160248

dcc1291d
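
Why the new prologue in dcc1291d gives callee-saved registers a "Register + Offset" location can be seen by modelling the stack pointer. This is a sketch that assumes 8-byte pushes and a power-of-two alignment applied as a mask of ~(align - 1). All function names are hypothetical.

```cpp
#include <cstdint>

// %rbp after "push %rbp; mov %rsp, %rbp".
constexpr std::uint64_t rbpAfterPrologue(std::uint64_t rspOnEntry) {
  return rspOnEntry - 8;
}

// Old order: realign %rsp first, then push %r14.
constexpr std::uint64_t oldR14Slot(std::uint64_t rspOnEntry,
                                   std::uint64_t align) {
  return (rbpAfterPrologue(rspOnEntry) & ~(align - 1)) - 8;
}

// New order: push %r14 right after setting up %rbp, realign afterwards.
constexpr std::uint64_t newR14Slot(std::uint64_t rspOnEntry) {
  return rbpAfterPrologue(rspOnEntry) - 8;
}

// Old scheme: the distance from %rbp to the slot depends on the incoming %rsp.
static_assert(rbpAfterPrologue(0x1008) - oldR14Slot(0x1008, 32) == 8, "");
static_assert(rbpAfterPrologue(0x1018) - oldR14Slot(0x1018, 32) == 24, "");

// New scheme: the slot is always at %rbp - 8.
static_assert(rbpAfterPrologue(0x1008) - newR14Slot(0x1008) == 8, "");
static_assert(rbpAfterPrologue(0x1018) - newR14Slot(0x1018) == 8, "");
```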

Jul 15, 2012

Teach getTargetVShiftNode about TargetConstant nodes. · eec74c72
Nadav Rotem authored Jul 15, 2012
```
llvm-svn: 160234
```
eec74c72

Rename VBROADCASTSDrm into VBROADCASTSDYrm to match the naming convention. · ee3552f8

Nadav Rotem authored Jul 15, 2012

Allow the folding of vbroadcastRR to vbroadcastRM, where the memory operand is a spill slot.

PR12782.

Together with Michael Kuperstein <michael.m.kuperstein@intel.com>

llvm-svn: 160230

ee3552f8

AVX: Fix a bug in getTargetVShiftNode. The shift amount has to be a 128bit... · 9466e81d

Nadav Rotem authored Jul 14, 2012

AVX: Fix a bug in getTargetVShiftNode. The shift amount has to be a 128bit vector with the same element type as the input vector.
This is needed because of the patterns we have for the VP[SLL/SRA/SRL][W/D/Q] instructions.

llvm-svn: 160222

9466e81d

Jul 13, 2012
- Make helper functions static. · abbfe693
  Benjamin Kramer authored Jul 13, 2012
  
  llvm-svn: 160173
  abbfe693