- Jul 18, 2012
-
-
Jack Carter authored
Print the high order register of a double word register operand. In 32 bit mode, a 64 bit double word integer will be represented by 2 32 bit registers. This modifier causes the high order register to be used in the asm expression. It is useful if you are using doubles in assembler and continue to control register to variable relationships. This patch also fixes a related bug in a previous patch: case 'D': // Second part of a double word register operand case 'L': // Low order register of a double word register operand case 'M': // High order register of a double word register operand I got 'D' and 'M' confused. The second part of a double word operand will only match 'M' for one of the endianesses. I had 'L' and 'D' be the opposite twins when 'L' and 'M' are. llvm-svn: 160429
-
Joel Jones authored
intrinsics. The second instruction(s) to be handled are the vector versions of count set bits (ctpop). The changes here are to clang so that it generates a target independent vector ctpop when it sees an ARM dependent vector bits set count. The changes in llvm are to match the target independent vector ctpop and in VMCore/AutoUpgrade.cpp to update any existing bc files containing ARM dependent vector pop counts with target-independent ctpops. There are also changes to an existing test case in llvm for ARM vector count instructions and to a test for the bitcode upgrade. <rdar://problem/11892519> There is deliberately no test for the change to clang, as so far as I know, no consensus has been reached regarding how to test neon instructions in clang; q.v. <rdar://problem/8762292> llvm-svn: 160410
-
- Jul 17, 2012
-
-
Evan Cheng authored
llvm-svn: 160389
-
Nadav Rotem authored
When truncating a result of a vector that is split we need to use the result of the split vector, and not re-split the dead node. llvm-svn: 160357
-
Evan Cheng authored
llvm-svn: 160354
-
Evan Cheng authored
large immediates. Add dag combine logic to recover in case the large immediates doesn't fit in cmp immediate operand field. int foo(unsigned long l) { return (l>> 47) == 1; } we produce %shr.mask = and i64 %l, -140737488355328 %cmp = icmp eq i64 %shr.mask, 140737488355328 %conv = zext i1 %cmp to i32 ret i32 %conv which codegens to movq $0xffff800000000000,%rax andq %rdi,%rax movq $0x0000800000000000,%rcx cmpq %rcx,%rax sete %al movzbl %al,%eax ret TargetLowering::SimplifySetCC would transform (X & -256) == 256 -> (X >> 8) == 1 if the immediate fails the isLegalICmpImmediate() test. For x86, that's immediates which are not a signed 32-bit immediate. Based on a patch by Eli Friedman. PR10328 rdar://9758774 llvm-svn: 160346
-
Akira Hatanaka authored
llvm-svn: 160329
-
- Jul 16, 2012
-
-
Evan Cheng authored
uint32_t hi(uint64_t res) { uint_32t hi = res >> 32; return !hi; } llvm IR looks like this: define i32 @hi(i64 %res) nounwind uwtable ssp { entry: %lnot = icmp ult i64 %res, 4294967296 %lnot.ext = zext i1 %lnot to i32 ret i32 %lnot.ext } The optimizer has optimize away the right shift and truncate but the resulting constant is too large to fit in the 32-bit immediate field. The resulting x86 code is worse as a result: movabsq $4294967296, %rax ## imm = 0x100000000 cmpq %rax, %rdi sbbl %eax, %eax andl $1, %eax This patch teaches the x86 lowering code to handle ult against a large immediate with trailing zeros. It will issue a right shift and a truncate followed by a comparison against a shifted immediate. shrq $32, %rdi testl %edi, %edi sete %al movzbl %al, %eax It also handles a ugt comparison against a large immediate with trailing bits set. i.e. X > 0x0ffffffff -> (X >> 32) >= 1 rdar://11866926 llvm-svn: 160312
-
Nadav Rotem authored
Make ComputeDemandedBits return a deterministic result when computing an AssertZext value. In the added testcase the constant 55 was behind an AssertZext of type i1, and ComputeDemandedBits reported that some of the bits were both known to be one and known to be zero. Together with Michael Kuperstein <michael.m.kuperstein@intel.com> llvm-svn: 160305
-
Tom Stellard authored
This reverts commit 11d3457afcda7848448dd7f11b2ede6552ffb9ea. llvm-svn: 160300
-
Alexey Samsonov authored
1. FileCheck-ize epilogue.ll and allow another asm instruction to restore %rsp. 2. Remove check in widen_arith-3.ll that was hitting instruction in epilogue instead of vector add. llvm-svn: 160274
-
Tom Stellard authored
llvm-svn: 160273
-
Nadav Rotem authored
undef virtual register. The problem is that ProcessImplicitDefs removes the definition of the register and marks all uses as undef. If we lose the undef marker then we get a register which has no def, is not marked as undef. The live interval analysis does not collect information for these virtual registers and we crash in later passes. Together with Michael Kuperstein <michael.m.kuperstein@intel.com> llvm-svn: 160260
-
Alexey Samsonov authored
It is intended to fix PR11468. Old prologue and epilogue looked like this: push %rbp mov %rsp, %rbp and $alignment, %rsp push %r14 push %r15 ... pop %r15 pop %r14 mov %rbp, %rsp pop %rbp The problem was to reference the locations of callee-saved registers in exception handling: locations of callee-saved had to be re-calculated regarding the stack alignment operation. It would take some effort to implement this in LLVM, as currently MachineLocation can only have the form "Register + Offset". Funciton prologue and epilogue are now changed to: push %rbp mov %rsp, %rbp push %14 push %15 and $alignment, %rsp ... lea -$size_of_saved_registers(%rbp), %rsp pop %r15 pop %r14 pop %rbp Reviewed by Chad Rosier. llvm-svn: 160248
-
- Jul 15, 2012
-
-
Nadav Rotem authored
Fix a bug in the scalarization of BUILD_VECTOR. BUILD_VECTOR elements may be wider than the output element type. Make sure to trunc them if needed. Together with Michael Kuperstein <michael.m.kuperstein@intel.com> llvm-svn: 160235
-
Nadav Rotem authored
llvm-svn: 160234
-
NAKAMURA Takumi authored
- Make sure existence of "barrier". - Confirm reload corresponding to spill. llvm-svn: 160232
-
Nadav Rotem authored
Allow the folding of vbroadcastRR to vbroadcastRM, where the memory operand is a spill slot. PR12782. Together with Michael Kuperstein <michael.m.kuperstein@intel.com> llvm-svn: 160230
-
Nadav Rotem authored
AVX: Fix a bug in getTargetVShiftNode. The shift amount has to be a 128bit vector with the same element type as the input vector. This is needed because of the patterns we have for the VP[SLL/SRA/SRL][W/D/Q] instructions. llvm-svn: 160222
-
- Jul 14, 2012
-
-
Nadav Rotem authored
The unoptimized concat_vectors isd prevented the canonicalization of the vector_shuffle node. llvm-svn: 160221
-
Joel Jones authored
intrinsics with target-indepdent intrinsics. The first instruction(s) to be handled are the vector versions of count leading zeros (ctlz). The changes here are to clang so that it generates a target independent vector ctlz when it sees an ARM dependent vector ctlz. The changes in llvm are to match the target independent vector ctlz and in VMCore/AutoUpgrade.cpp to update any existing bc files containing ARM dependent vector ctlzs with target-independent ctlzs. There are also changes to an existing test case in llvm for ARM vector count instructions and a new test for the bitcode upgrade. <rdar://problem/11831778> There is deliberately no test for the change to clang, as so far as I know, no consensus has been reached regarding how to test neon instructions in clang; q.v. <rdar://problem/8762292> llvm-svn: 160200
-
- Jul 13, 2012
-
-
Duncan Sands authored
llvm-svn: 160163
-
- Jul 12, 2012
-
-
Benjamin Kramer authored
Give the rdrand instructions a SideEffect flag and a chain so MachineCSE and MachineLICM don't touch it. I already had the necessary things in place for IR-level passes but missed the machine passes. llvm-svn: 160137
-
Nadav Rotem authored
The LIT tests below do not specify the exact cpu model and fail on AVX2 machines, because we select different instructions such as vbroadcast, new shuffles, etc. Patch by Michael Liao. llvm-svn: 160129
-
NAKAMURA Takumi authored
llvm-svn: 160124
-
Benjamin Kramer authored
llvm-svn: 160120
-
Benjamin Kramer authored
The rdrand/cmov sequence is the same that is emitted by both GCC and ICC. Fixes PR13284. llvm-svn: 160117
-
Duncan Sands authored
the input vector, it can be bigger (this is helpful for powerpc where <2 x i16> is a legal vector type but i16 isn't a legal type, IIRC). However this wasn't being taken into account by ExpandRes_EXTRACT_VECTOR_ELT, causing PR13220. Lightly tweaked version of a patch by Michael Liao. llvm-svn: 160116
-
Craig Topper authored
llvm-svn: 160110
-
Manman Ren authored
It is safe if CPSR is killed or re-defined. When we are done with the basic block, check whether CPSR is live-out. Do not optimize away cmp if CPSR is live-out. llvm-svn: 160090
-
- Jul 11, 2012
-
-
Akira Hatanaka authored
llvm-svn: 160067
-
Manman Ren authored
When Movr0 is between sub and cmp, we move Movr0 before sub if it enables removal of Cmp. llvm-svn: 160066
-
Akira Hatanaka authored
llvm-svn: 160064
-
Benjamin Kramer authored
This caused 6 of 65k possible 8 bit udivs to be wrong. llvm-svn: 160058
-
Nadav Rotem authored
When ext-loading and trunc-storing vectors to memory, on x86 32bit systems, allow loads/stores of 64bit values from xmm registers. llvm-svn: 160044
-
Akira Hatanaka authored
Patch by Sasa Stankovic. llvm-svn: 160031
-
Jack Carter authored
Low order register of a double word register operand. Operands are defined by the name of the variable they are marked with in the inline assembler code. This is a way to specify that the operand just refers to the low order register for that variable. It is the opposite of modifier 'D' which specifies the high order register. Example: main() { long long ll_input = 0x1111222233334444LL; long long ll_val = 3; int i_result = 0; __asm__ __volatile__( "or %0, %L1, %2" : "=r" (i_result) : "r" (ll_input), "r" (ll_val)); } Which results in: lui $2, %hi(_gp_disp) addiu $2, $2, %lo(_gp_disp) addiu $sp, $sp, -8 addu $2, $2, $25 sw $2, 0($sp) lui $2, 13107 ori $3, $2, 17476 <-- Low 32 bits of ll_input lui $2, 4369 ori $4, $2, 8738 <-- High 32 bits of ll_input addiu $5, $zero, 3 <-- Low 32 bits of ll_val addiu $2, $zero, 0 <-- High 32 bits of ll_val #APP or $3, $4, $5 <-- or i_result, high 32 ll_input, low 32 of ll_val #NO_APP addiu $sp, $sp, 8 jr $ra If not direction is done for the long long for 32 bit variables results in using the low 32 bits as ll_val shows. There is an existing bug if 'L' or 'D' is used for the destination register for 32 bit long longs in that the target value will be updated incorrectly for the non-specified part unless explicitly set within the inline asm code. llvm-svn: 160028
-
- Jul 10, 2012
-
-
Chad Rosier authored
llvm-svn: 160006
-
Chad Rosier authored
llvm-svn: 160004
-
Chad Rosier authored
X86. Basically, this is a reapplication of r158087 with a few fixes. Specifically, (1) the stack pointer is restored from the base pointer before popping callee-saved registers and (2) in obscure cases (see comments in patch) we must cache the value of the original stack adjustment in the prologue and apply it in the epilogue. rdar://11496434 llvm-svn: 160002
-