Commits · 761447cd98eb6547f815f08fb1593733c26de2a8 · Roger Ferrer / llvm-epi-0.8

Jul 18, 2012

Mips specific inline asm operand modifier 'M': · a62ba828

Jack Carter authored Jul 18, 2012

Print the high order register of a double word register operand.

In 32 bit mode, a 64 bit double word integer will be represented
by 2 32 bit registers. This modifier causes the high order register
to be used in the asm expression. It is useful if you are using
double words in assembler and want to keep control of the
register-to-variable relationships.

This patch also fixes a related bug in a previous patch:

    case 'D': // Second part of a double word register operand
    case 'L': // Low order register of a double word register operand
    case 'M': // High order register of a double word register operand

I got 'D' and 'M' confused. The second part of a double word operand
will only match 'M' for one of the endiannesses. I had treated 'L' and 'D'
as the opposite pair, when in fact 'L' and 'M' are.
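
As a rough illustration of the relationship described above, the following C sketch models which 32-bit half each modifier would select. The type and function names are hypothetical, and the register-pair convention (the first register holds the low word on little-endian targets and the high word on big-endian targets) is an assumption made for this example, not code taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit operand held in two 32-bit registers. */
typedef struct {
    uint32_t first;   /* lower-numbered register of the pair */
    uint32_t second;  /* higher-numbered register of the pair */
} RegPair;

/* Assumption: the pair mirrors memory order, so the first register holds
   the low word on little-endian and the high word on big-endian. */
RegPair split_double_word(uint64_t value, bool big_endian)
{
    uint32_t lo = (uint32_t)value;
    uint32_t hi = (uint32_t)(value >> 32);
    RegPair p;
    p.first = big_endian ? hi : lo;
    p.second = big_endian ? lo : hi;
    return p;
}

/* Returns the 32-bit half selected by modifier 'D', 'L' or 'M'
   (0 for any other character). */
uint32_t select_half(uint64_t value, bool big_endian, char modifier)
{
    RegPair p = split_double_word(value, big_endian);
    switch (modifier) {
    case 'D': return p.second;                         /* second register */
    case 'L': return big_endian ? p.second : p.first;  /* low order word */
    case 'M': return big_endian ? p.first : p.second;  /* high order word */
    default:  return 0;
    }
}
```

Under this model 'L' and 'M' always select opposite halves, while 'D' coincides with 'M' on little-endian and with 'L' on big-endian.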

llvm-svn: 160429

a62ba828

More replacing of target-dependent intrinsics with target-independent · b84f7bea

Joel Jones authored Jul 18, 2012

intrinsics.  The second instruction(s) to be handled are the vector versions 
of count set bits (ctpop).

The changes here are to clang so that it generates a target independent 
vector ctpop when it sees an ARM dependent vector bits set count.  The changes 
in llvm are to match the target independent vector ctpop and in 
VMCore/AutoUpgrade.cpp to update any existing bc files containing ARM 
dependent vector pop counts with target-independent ctpops.  There are also 
changes to an existing test case in llvm for ARM vector count instructions and 
to a test for the bitcode upgrade.

<rdar://problem/11892519>

There is deliberately no test for the change to clang, as so far as I know, no
consensus has been reached regarding how to test neon instructions in clang;
q.v. <rdar://problem/8762292>

llvm-svn: 160410

b84f7bea

Jul 17, 2012

Add test case for r160387 · f73d7553
Evan Cheng authored Jul 17, 2012
```
llvm-svn: 160389
```
f73d7553

Fix a crash in the legalization of large vectors. · 277a40bc

Nadav Rotem authored Jul 17, 2012

When truncating a result of a vector that is split we need
to use the result of the split vector, and not re-split the dead node.

llvm-svn: 160357

277a40bc

Implement r160312 as a target-independent dag combine. · 780f9b5f
Evan Cheng authored Jul 17, 2012
```
llvm-svn: 160354
```
780f9b5f

This is another case where instcombine demanded bits optimization created · f579beca

Evan Cheng authored Jul 17, 2012

large immediates. Add dag combine logic to recover in case a large
immediate doesn't fit in the cmp immediate operand field.

int foo(unsigned long l) {
  return (l>> 47) == 1;
}

we produce

  %shr.mask = and i64 %l, -140737488355328
  %cmp = icmp eq i64 %shr.mask, 140737488355328
  %conv = zext i1 %cmp to i32
  ret i32 %conv

which codegens to

movq    $0xffff800000000000,%rax
andq    %rdi,%rax
movq    $0x0000800000000000,%rcx
cmpq    %rcx,%rax
sete    %al
movzbl    %al,%eax
ret

TargetLowering::SimplifySetCC would transform
(X & -256) == 256 -> (X >> 8) == 1
if the immediate fails the isLegalICmpImmediate() test. For x86,
that means any immediate that does not fit in a signed 32-bit field.
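
The equivalence behind this transform can be checked with a small C sketch. The function names and the generalization to an arbitrary shift amount k (0 to 63) are assumptions for illustration; this is not the SimplifySetCC code itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Masked form: (x & -(2^k)) == 2^k. Both immediates grow with k. */
bool masked_compare(uint64_t x, unsigned k)
{
    uint64_t bit = (uint64_t)1 << k;
    return (x & (0 - bit)) == bit;
}

/* Shifted form: (x >> k) == 1. The compare immediate is always 1. */
bool shifted_compare(uint64_t x, unsigned k)
{
    return (x >> k) == 1;
}
```

Both forms test that bit k is set and every higher bit is clear, so they agree for every x; with k = 47 this corresponds to the foo() example above.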

Based on a patch by Eli Friedman.

PR10328
rdar://9758774

llvm-svn: 160346

f579beca

Fix function select_cc_f32 in test/CodeGen/Mips/selectcc.ll. · 04674446
Akira Hatanaka authored Jul 16, 2012
```
llvm-svn: 160329
```
04674446

Jul 16, 2012

For something like · 75315b87

Evan Cheng authored Jul 16, 2012

uint32_t hi(uint64_t res)
{
        uint32_t hi = res >> 32;
        return !hi;
}

llvm IR looks like this:
define i32 @hi(i64 %res) nounwind uwtable ssp {
entry:
  %lnot = icmp ult i64 %res, 4294967296
  %lnot.ext = zext i1 %lnot to i32
  ret i32 %lnot.ext
}

The optimizer has optimized away the right shift and truncate, but the resulting
constant is too large to fit in the 32-bit immediate field. The resulting x86
code is worse as a result:
        movabsq $4294967296, %rax       ## imm = 0x100000000
        cmpq    %rax, %rdi
        sbbl    %eax, %eax
        andl    $1, %eax

This patch teaches the x86 lowering code to handle ult against a large immediate
with trailing zeros. It will issue a right shift and a truncate followed by
a comparison against a shifted immediate.
        shrq    $32, %rdi
        testl   %edi, %edi
        sete    %al
        movzbl  %al, %eax

It also handles a ugt comparison against a large immediate with trailing bits
set, i.e. X > 0x0ffffffff -> (X >> 32) >= 1
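
A minimal C sketch of the two rewrites, assuming a 32-bit shift as in the example; the function names are made up for illustration and do not come from the x86 lowering code.

```c
#include <stdbool.h>
#include <stdint.h>

/* ult against 2^32, as the optimizer leaves it: needs a 64-bit immediate. */
bool ult_large_imm(uint64_t x) { return x < 0x100000000ULL; }

/* Shifted form: the immediate has 32 trailing zeros, so shift them out,
   truncate, and compare against the shifted immediate (1), i.e. test for 0. */
bool ult_shifted(uint64_t x) { return (uint32_t)(x >> 32) == 0; }

/* ugt against an immediate whose trailing 32 bits are all set. */
bool ugt_large_imm(uint64_t x) { return x > 0x0ffffffffULL; }

/* Shifted form of the ugt comparison: (x >> 32) >= 1. */
bool ugt_shifted(uint64_t x) { return (uint32_t)(x >> 32) >= 1; }
```

The shifted forms only need small immediates, which is what lets the backend avoid the movabsq.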

rdar://11866926

llvm-svn: 160312

75315b87

· 839a06e9

Nadav Rotem authored Jul 16, 2012

Make ComputeDemandedBits return a deterministic result when computing an AssertZext value.
In the added testcase the constant 55 was behind an AssertZext of type i1, and ComputeDemandedBits
reported that some of the bits were both known to be one and known to be zero.
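
The contradiction described above can be reproduced with a toy known-bits model in C. The struct, the function names, and the "known zero wins" resolution are assumptions for illustration only; the message does not say how the patch resolves the conflict.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model: a bit set in `zero` is known to be 0, in `one` known to be 1. */
typedef struct {
    uint64_t zero;
    uint64_t one;
} KnownBits;

/* Every bit of a constant is known. */
KnownBits known_bits_of_constant(uint64_t c)
{
    KnownBits k;
    k.zero = ~c;
    k.one = c;
    return k;
}

/* AssertZext to `width` bits claims that all bits at or above `width` are 0.
   This naive version keeps the operand's known ones, so the masks can overlap. */
KnownBits apply_assert_zext(KnownBits operand, unsigned width)
{
    uint64_t high = width >= 64 ? 0 : ~(uint64_t)0 << width;
    KnownBits r;
    r.zero = operand.zero | high;
    r.one = operand.one;
    return r;
}

/* A bit cannot be both known zero and known one. */
bool has_conflict(KnownBits k) { return (k.zero & k.one) != 0; }

/* One possible resolution (an assumption): let the known-zero bits win. */
KnownBits resolve_conflict(KnownBits k)
{
    k.one &= ~k.zero;
    return k;
}
```

With the constant 55 behind an i1 AssertZext, the naive version marks bits 1, 2, 4 and 5 as both zero and one.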

Together with Michael Kuperstein <michael.m.kuperstein@intel.com>

llvm-svn: 160305

839a06e9

Revert "test/CodeGen/R600: Add some basic tests v6" · fc3db614
Tom Stellard authored Jul 16, 2012
```
This reverts commit 11d3457afcda7848448dd7f11b2ede6552ffb9ea.

llvm-svn: 160300
```
fc3db614

Fix tests that failed on i686-win32 after r160248: · 893d3d33

Alexey Samsonov authored Jul 16, 2012

1. FileCheck-ize epilogue.ll and allow another asm instruction to restore %rsp.
2. Remove check in widen_arith-3.ll that was hitting instruction in epilogue instead of
vector add.

llvm-svn: 160274

893d3d33

test/CodeGen/R600: Add some basic tests v6 · 6693fbe3
Tom Stellard authored Jul 16, 2012
```
llvm-svn: 160273
```
6693fbe3

Fix a bug in the 3-address conversion of LEA when one of the operands is an · 4968e45b

Nadav Rotem authored Jul 16, 2012

undef virtual register. The problem is that ProcessImplicitDefs removes the
definition of the register and marks all uses as undef. If we lose the undef
marker then we get a register which has no def and is not marked as undef. The
live interval analysis does not collect information for these virtual
registers and we crash in later passes.

Together with Michael Kuperstein <michael.m.kuperstein@intel.com>

llvm-svn: 160260

4968e45b

This CL changes the function prologue and epilogue emitted on X86 when stack needs realignment. · dcc1291d

Alexey Samsonov authored Jul 16, 2012

It is intended to fix PR11468.

Old prologue and epilogue looked like this:
push %rbp
mov %rsp, %rbp
and $alignment, %rsp
push %r14
push %r15
...
pop %r15
pop %r14
mov %rbp, %rsp
pop %rbp

The problem was referencing the locations of callee-saved registers in exception handling:
the locations of callee-saved registers had to be re-calculated to account for the stack alignment
operation. It would take some effort to implement this in LLVM, as currently MachineLocation can only
have the form "Register + Offset". The function prologue and epilogue are now changed to:

push %rbp
mov %rsp, %rbp
push %r14
push %r15
and $alignment, %rsp
...
lea -$size_of_saved_registers(%rbp), %rsp
pop %r15
pop %r14
pop %rbp
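
To see why the new ordering helps, here is a small C simulation of the stack pointer arithmetic for both prologues. The struct and function names are hypothetical, and the alignment is modeled as a power of two applied with `rsp & ~(alignment - 1)`, which is an assumption about what `and $alignment, %rsp` stands for.

```c
#include <stdint.h>

typedef struct {
    uint64_t rbp;       /* frame pointer after the prologue */
    uint64_t rsp;       /* stack pointer after the prologue */
    uint64_t r14_slot;  /* address where %r14 was saved */
    uint64_t r15_slot;  /* address where %r15 was saved */
} Frame;

/* Old order: realign first, then push the callee-saved registers. */
Frame old_prologue(uint64_t rsp, uint64_t alignment)
{
    Frame f;
    rsp -= 8; f.rbp = rsp;        /* push %rbp; mov %rsp, %rbp */
    rsp &= ~(alignment - 1);      /* and $alignment, %rsp */
    rsp -= 8; f.r14_slot = rsp;   /* push %r14 */
    rsp -= 8; f.r15_slot = rsp;   /* push %r15 */
    f.rsp = rsp;
    return f;
}

/* New order: push the callee-saved registers, then realign. */
Frame new_prologue(uint64_t rsp, uint64_t alignment)
{
    Frame f;
    rsp -= 8; f.rbp = rsp;        /* push %rbp; mov %rsp, %rbp */
    rsp -= 8; f.r14_slot = rsp;   /* push %r14 */
    rsp -= 8; f.r15_slot = rsp;   /* push %r15 */
    rsp &= ~(alignment - 1);      /* and $alignment, %rsp */
    f.rsp = rsp;
    return f;
}
```

In the new layout the saved registers always sit at %rbp - 8 and %rbp - 16, which fits the "Register + Offset" form; in the old layout the distance from %rbp depends on the incoming stack pointer.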

Reviewed by Chad Rosier.

llvm-svn: 160248

dcc1291d

Jul 15, 2012

Fix a bug in the scalarization of BUILD_VECTOR. BUILD_VECTOR elements may be... · 3050e071

Nadav Rotem authored Jul 15, 2012

Fix a bug in the scalarization of BUILD_VECTOR. BUILD_VECTOR elements may be wider than the output element type. Make sure to trunc them if needed.

Together with Michael Kuperstein <michael.m.kuperstein@intel.com>

llvm-svn: 160235

3050e071

Teach getTargetVShiftNode about TargetConstant nodes. · eec74c72
Nadav Rotem authored Jul 15, 2012
```
llvm-svn: 160234
```
eec74c72
llvm/test/CodeGen/X86/2012-07-15-broadcastfold.ll: Rewrite expressions to fit various targets. · 032dc0a0
NAKAMURA Takumi authored Jul 15, 2012
```
  - Make sure existence of "barrier".
  - Confirm reload corresponding to spill.

llvm-svn: 160232
```
032dc0a0

Rename VBROADCASTSDrm into VBROADCASTSDYrm to match the naming convention. · ee3552f8

Nadav Rotem authored Jul 15, 2012

Allow the folding of vbroadcastRR to vbroadcastRM, where the memory operand is a spill slot.

PR12782.

Together with Michael Kuperstein <michael.m.kuperstein@intel.com>

llvm-svn: 160230

ee3552f8

AVX: Fix a bug in getTargetVShiftNode. The shift amount has to be a 128bit... · 9466e81d

Nadav Rotem authored Jul 14, 2012

AVX: Fix a bug in getTargetVShiftNode. The shift amount has to be a 128bit vector with the same element type as the input vector.
This is needed because of the patterns we have for the VP[SLL/SRA/SRL][W/D/Q] instructions.

llvm-svn: 160222

9466e81d

Jul 14, 2012

Add a dagcombine optimization to convert concat_vectors of undefs into a single undef. · 01892100
Nadav Rotem authored Jul 14, 2012
```
The unoptimized concat_vectors isd prevented the canonicalization of the vector_shuffle node.

llvm-svn: 160221
```
01892100

This is one of the first steps at moving to replace target-dependent · 43cb8783

Joel Jones authored Jul 13, 2012

intrinsics with target-independent intrinsics.  The first instruction(s) to be
handled are the vector versions of count leading zeros (ctlz).

The changes here are to clang so that it generates a target independent 
vector ctlz when it sees an ARM dependent vector ctlz.  The changes in llvm 
are to match the target independent vector ctlz and in VMCore/AutoUpgrade.cpp 
to update any existing bc files containing ARM dependent vector ctlzs with 
target-independent ctlzs.  There are also changes to an existing test case in 
llvm for ARM vector count instructions and a new test for the bitcode upgrade.

<rdar://problem/11831778>

There is deliberately no test for the change to clang, as so far as I know, no
consensus has been reached regarding how to test neon instructions in clang;
q.v. <rdar://problem/8762292>

llvm-svn: 160200

43cb8783

Jul 13, 2012
Restrict this to x86, hopefully fixing ARM buildbots. · a9c373e4
Duncan Sands authored Jul 13, 2012
```
llvm-svn: 160163
```
a9c373e4
Jul 12, 2012

Give the rdrand instructions a SideEffect flag and a chain so MachineCSE and... · 4d091678

Benjamin Kramer authored Jul 12, 2012

Give the rdrand instructions a SideEffect flag and a chain so MachineCSE and MachineLICM don't touch it.

I already had the necessary things in place for IR-level passes but missed the machine passes.

llvm-svn: 160137

4d091678

The LIT tests below do not specify the exact cpu model and fail on AVX2... · fdce33a4

Nadav Rotem authored Jul 12, 2012

The LIT tests below do not specify the exact cpu model and fail on AVX2 machines, because we select different instructions such as vbroadcast, new shuffles, etc.
Patch by Michael Liao.

llvm-svn: 160129

fdce33a4

llvm/test/CodeGen/X86/rdrand.ll: Relax expression corresponding to Win64 CC. · f415fe70
NAKAMURA Takumi authored Jul 12, 2012
```
llvm-svn: 160124
```
f415fe70
Use %s instead of the explicit name, the latter doesn't work in out-of-tree builds. · cbac2f3b
Benjamin Kramer authored Jul 12, 2012
```
llvm-svn: 160120
```
cbac2f3b

Add intrinsics for Ivy Bridge's rdrand instruction. · 0ab2794e

Benjamin Kramer authored Jul 12, 2012

The rdrand/cmov sequence is the same that is emitted by both
GCC and ICC.

Fixes PR13284.

llvm-svn: 160117

0ab2794e

The result type of EXTRACT_VECTOR_ELT doesn't have to match the element type of · 671cc257

Duncan Sands authored Jul 12, 2012

the input vector, it can be bigger (this is helpful for powerpc where <2 x i16>
is a legal vector type but i16 isn't a legal type, IIRC).  However this wasn't
being taken into account by ExpandRes_EXTRACT_VECTOR_ELT, causing PR13220.
Lightly tweaked version of a patch by Michael Liao.

llvm-svn: 160116

671cc257

Update GATHER instructions to support 2 read-write operands. Patch from myself and Manman Ren. · f7755df7
Craig Topper authored Jul 12, 2012
```
llvm-svn: 160110
```
f7755df7

ARM: Fix optimizeCompare to correctly check safe condition. · 34cb93e1

Manman Ren authored Jul 11, 2012

It is safe if CPSR is killed or re-defined.
When we are done with the basic block, check whether CPSR is live-out.
Do not optimize away cmp if CPSR is live-out.

llvm-svn: 160090

34cb93e1

Jul 11, 2012

Test case for r160036. · 20dced4d
Akira Hatanaka authored Jul 11, 2012
```
llvm-svn: 160067
```
20dced4d
X86: Update to peephole optimization to move Movr0 before (Sub, Cmp) pair. · 1553ce0e
Manman Ren authored Jul 11, 2012
```
When Movr0 is between sub and cmp, we move Movr0 before sub if it enables
removal of Cmp.

llvm-svn: 160066
```
1553ce0e
Implement MipsTargetLowering::LowerSELECT_CC to custom lower SELECT_CC. · 24cf4e36
Akira Hatanaka authored Jul 11, 2012
```
llvm-svn: 160064
```
24cf4e36
PR13326: Fix a subtle edge case in the udiv -> magic multiply generator. · 3aab6a86
Benjamin Kramer authored Jul 11, 2012
```
This caused 6 of 65k possible 8 bit udivs to be wrong.

llvm-svn: 160058
```
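
For readers unfamiliar with the technique, the following C sketch shows a textbook round-up variant of division by multiplication for 8-bit operands, together with an exhaustive check over every dividend and nonzero divisor. It is not the LLVM generator that this commit fixes; the function names and the use of 32-bit intermediates are choices made for this example.

```c
#include <stdint.h>

/* Divide an 8-bit x by a nonzero 8-bit d using a multiply and a shift. */
uint8_t udiv8_magic(uint8_t x, uint8_t d)
{
    /* l = ceil(log2(d)): the smallest l with 2^l >= d. */
    unsigned l = 0;
    while ((1u << l) < d)
        l++;
    /* Round-up magic number m = floor(2^(8+l) / d) + 1. It can need 9 bits,
       so the product is formed in 32 bits. A compiler would compute m at
       compile time; the division here only keeps the sketch short. */
    uint32_t m = ((uint32_t)1 << (8 + l)) / d + 1;
    return (uint8_t)(((uint32_t)x * m) >> (8 + l));
}

/* Exhaustively compare against real division; returns the mismatch count. */
unsigned count_udiv8_mismatches(void)
{
    unsigned bad = 0;
    for (unsigned d = 1; d < 256; d++)
        for (unsigned x = 0; x < 256; x++)
            if (udiv8_magic((uint8_t)x, (uint8_t)d) != x / d)
                bad++;
    return bad;
}
```

An exhaustive loop like `count_udiv8_mismatches` is cheap at this width and is the kind of check that exposes edge cases such as the one this commit describes.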
3aab6a86

· d2bdcebb

Nadav Rotem authored Jul 11, 2012

When ext-loading and trunc-storing vectors to memory, on x86 32bit systems, allow loads/stores of 64bit values from xmm registers.

llvm-svn: 160044

d2bdcebb

Lower RETURNADDR node in Mips backend. · 878ad8b2
Akira Hatanaka authored Jul 11, 2012
```
Patch by Sasa Stankovic.

llvm-svn: 160031
```
878ad8b2

Mips specific inline asm operand modifier 'L'. · e8cb2fc6

Jack Carter authored Jul 10, 2012

   
   Low order register of a double word register operand. Operands 
   are defined by the name of the variable they are marked with in
   the inline assembler code. This is a way to specify that the 
   operand just refers to the low order register for that variable.
   
   It is the opposite of modifier 'D' which specifies the high order
   register.
   
   Example:
   
int main()
{
    long long ll_input = 0x1111222233334444LL;
    long long ll_val = 3;
    int i_result = 0;

    __asm__ __volatile__(
        "or	%0, %L1, %2"
        : "=r" (i_result)
        : "r" (ll_input), "r" (ll_val));
}

   Which results in:
   
   	lui	$2, %hi(_gp_disp)
	addiu	$2, $2, %lo(_gp_disp)
	addiu	$sp, $sp, -8
	addu	$2, $2, $25
	sw	$2, 0($sp)
	lui	$2, 13107
	ori	$3, $2, 17476     <-- Low 32 bits of ll_input
	lui	$2, 4369
	ori	$4, $2, 8738      <-- High 32 bits of ll_input
	addiu	$5, $zero, 3  <-- Low 32 bits of ll_val
	addiu	$2, $zero, 0  <-- High 32 bits of ll_val
	#APP
	or	$3, $4, $5        <-- or i_result, high 32 ll_input, low 32 of ll_val
	#NO_APP
	addiu	$sp, $sp, 8
	jr	$ra

If no modifier is specified for a long long operand in 32 bit mode, the
low 32 bits are used, as ll_val shows.

There is an existing bug if 'L' or 'D' is used for the destination register
for 32 bit long longs in that the target value will be updated incorrectly
for the non-specified part unless explicitly set within the inline asm code.

llvm-svn: 160028

e8cb2fc6

Jul 10, 2012

Add newline. · 3ee9a4c2
Chad Rosier authored Jul 10, 2012
```
llvm-svn: 160006
```
3ee9a4c2
Add test case accidentally omitted from r160002. · 579b1fee
Chad Rosier authored Jul 10, 2012
```
llvm-svn: 160004
```
579b1fee

Add support for dynamic stack realignment in the presence of dynamic allocas on · bdb08ac5

Chad Rosier authored Jul 10, 2012

X86.  Basically, this is a reapplication of r158087 with a few fixes.

Specifically, (1) the stack pointer is restored from the base pointer before
popping callee-saved registers and (2) in obscure cases (see comments in patch)
we must cache the value of the original stack adjustment in the prologue and
apply it in the epilogue.

rdar://11496434

llvm-svn: 160002

bdb08ac5