- Mar 26, 2009
-
Bill Wendling authored
llvm-svn: 67742
-
Bill Wendling authored
%a = ...
%b = and i32 %a, 2
%c = srl i32 %b, 1
%d = br i32 %c,

into

%a = ...
%b = and %a, 2
%c = X86ISD::CMP %b, 0
%d = X86ISD::BRCOND %c
...

This applies only when the AND constant value has one bit set and the SRL
constant is equal to the log2 of the AND constant. The back-end is smart enough
to convert the result into a TEST/JMP sequence.
llvm-svn: 67728
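A minimal IR function that exposes this pattern (an illustrative sketch, not a
test case from the commit; at the IR level the logical shift is spelled lshr
and the branch takes an i1):

    define i32 @and_srl_br(i32 %a) {
    entry:
      %b = and i32 %a, 2                 ; exactly one bit set
      %c = lshr i32 %b, 1                ; shift amount == log2(2)
      %cond = icmp ne i32 %c, 0
      br i1 %cond, label %taken, label %not_taken

    taken:
      ret i32 1

    not_taken:
      ret i32 0
    }

After the combine, the and feeds the compare against zero directly, which can
then be emitted as a test against the mask plus a conditional jump.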
-
- Mar 13, 2009
-
Bill Wendling authored
instructions. Prevent that if we don't want implicit uses of SSE.
llvm-svn: 66877
-
Evan Cheng authored
Fix some significant problems with constant pools that resulted in unnecessary
padding between constant pool entries, larger than necessary alignments (e.g.
8 byte alignment for .literal4 sections), and potentially other issues.

1. The ConstantPoolSDNode alignment field is the log2 value of the alignment
   requirement. This is not consistent with other SDNode variants.
2. The MachineConstantPool alignment field is also a log2 value.
3. However, some places are creating ConstantPoolSDNodes with alignment values
   rather than log2 values. This creates entries with artificially large
   alignments, e.g. 256 for SSE vector values.
4. Constant pool entry offsets are computed when they are created. However, the
   asm printer groups them by section. That means the offsets are no longer
   valid. Nevertheless, the asm printer uses them to determine the size of
   padding between entries.
5. The asm printer uses the expensive data structure multimap to track constant
   pool entries by section.
6. The asm printer iterates over a SmallPtrSet when emitting constant pool
   entries. This is non-deterministic.

Solutions:
1. The ConstantPoolSDNode alignment field is changed to keep a non-log2 value.
2. The MachineConstantPool alignment field is also changed to keep a non-log2
   value.
3. Functions that create ConstantPool nodes now pass in non-log2 alignments.
4. MachineConstantPoolEntry no longer keeps an offset field. It's replaced with
   an alignment field. Offsets are not computed when constant pool entries are
   created; they are computed on the fly in the asm printer and JIT.
5. The asm printer uses a cheaper data structure to group constant pool
   entries.
6. The asm printer computes entry offsets after grouping is done.
7. Change JIT code to compute entry offsets on the fly.

llvm-svn: 66875
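As a worked example of the on-the-fly offset computation (illustrative numbers,
not from the commit): each entry starts at the previous entry's end rounded up
to its own alignment,

    offset = (end + align - 1) & ~(align - 1)

so a 4-byte float (align 4) at offset 0 followed by a 16-byte SSE vector with
its true alignment of 16 lands at offset 16, with 12 bytes of padding; an
artificially large alignment of 256 (a raw value stored where a log2 value was
expected, or vice versa) would push the same vector out to offset 256.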
-
Chris Lattner authored
for i32/i64 expressions (we could also do i16 on cpus where i16 lea is fast,
but I didn't add this). On the example, we now generate:

_test:
        movl    4(%esp), %eax
        cmpl    $42, (%eax)
        setl    %al
        movzbl  %al, %eax
        leal    4(%eax,%eax,8), %eax
        ret

instead of:

_test:
        movl    4(%esp), %eax
        cmpl    $41, (%eax)
        movl    $4, %ecx
        movl    $13, %eax
        cmovg   %ecx, %eax
        ret

llvm-svn: 66869
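The kind of source that triggers this (a hypothetical reconstruction; the
commit's own test isn't shown) is a two-constant select whose values differ by
a small lea-encodable multiple, here 13 = 4 + 9*1:

    define i32 @test(i32* %p) {
    entry:
      %v = load i32* %p                    ; 2009-era typed-pointer syntax
      %cmp = icmp slt i32 %v, 42
      %r = select i1 %cmp, i32 13, i32 4   ; setl+movzbl+lea: 4 + 9*zext(%cmp)
      ret i32 %r
    }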
-
Chris Lattner authored
example to:

_test:
        movl    4(%esp), %eax
        cmpl    $41, (%eax)
        setg    %al
        movzbl  %al, %eax
        orl     $4294967294, %eax
        ret

instead of:

        movl    4(%esp), %eax
        cmpl    $41, (%eax)
        movl    $4294967294, %ecx
        movl    $4294967295, %eax
        cmova   %ecx, %eax
        ret

which is smaller in code size and faster. rdar://6668608
llvm-svn: 66868
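The trick is that the two select arms differ only in the low bit, so the
compare result can simply be or'ed in. In IR terms (an illustrative sketch,
not the commit's test):

    define i32 @sel_or(i32 %v) {
    entry:
      %cmp = icmp sgt i32 %v, 41
      %z = zext i1 %cmp to i32
      %r = or i32 %z, -2        ; select(%cmp, i32 -1, i32 -2) folded to an or
      ret i32 %r
    }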
-
- Mar 12, 2009
-
Chris Lattner authored
related transformations out of target-specific dag combine into the ARM
backend. These were added by Evan in r37685 with no testcases and only seem to
help ARM (e.g. test/CodeGen/ARM/select_xform.ll).

Add some simple X86-specific (for now) DAG combines that turn things like
cond ? 8 : 0 -> (zext(cond) << 3). This happens frequently with the recently
added cp constant select optimization, but is a very general xform. For
example, we now compile the second example in const-select.ll to:

_test:
        movsd   LCPI2_0, %xmm0
        ucomisd 8(%esp), %xmm0
        seta    %al
        movzbl  %al, %eax
        movl    4(%esp), %ecx
        movsbl  (%ecx,%eax,4), %eax
        ret

instead of:

_test:
        movl    4(%esp), %eax
        leal    4(%eax), %ecx
        movsd   LCPI2_0, %xmm0
        ucomisd 8(%esp), %xmm0
        cmovbe  %eax, %ecx
        movsbl  (%ecx), %eax
        ret

This passes multisource and dejagnu.
llvm-svn: 66779
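In isolation, the power-of-two case looks like this (a minimal sketch of the
xform, not a test from the commit):

    define i32 @sel_pow2(i1 %cond) {
    entry:
      ; cond ? 8 : 0, combined into:
      ;   %z = zext i1 %cond to i32
      ;   %r = shl i32 %z, 3
      %sel = select i1 %cond, i32 8, i32 0
      ret i32 %sel
    }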
-
Evan Cheng authored
On x86, if the only use of an i64 load is an i64 store, generate a pair of
double load and store instead.
llvm-svn: 66776
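The shape being matched, sketched in IR (hypothetical; 2009-era typed-pointer
syntax):

    define void @copy64(i64* %src, i64* %dst) {
    entry:
      %v = load i64* %src       ; the load's only use is the store below, so on
      store i64 %v, i64* %dst   ; x86-32 it may move through an FP/SSE register
      ret void
    }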
-
- Mar 11, 2009
-
Bill Wendling authored
floating point instructions that are explicitly specified by the user.
llvm-svn: 66719
-
Mon P Wang authored
llvm-svn: 66684
-
Mon P Wang authored
llvm-svn: 66645
-
Chris Lattner authored
llvm-svn: 66642
-
- Mar 07, 2009
-
Dan Gohman authored
the same way the "test" instruction does in overflow cases, so eliminating the
test is only safe when those bits aren't needed, as is the case for COND_E and
COND_NE, or if it can be proven that no overflow will occur. For now, just
restrict the optimization to COND_E and COND_NE and don't do any overflow
analysis.
llvm-svn: 66318
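A case the restricted optimization still handles (an illustrative sketch):
equality against zero needs only ZF, which add/sub/inc/dec set the same way a
test of the result would:

    define i1 @dec_is_zero(i32 %x) {
    entry:
      %d = add i32 %x, -1       ; sets ZF like a test would; OF/CF may differ
      %z = icmp eq i32 %d, 0    ; COND_E consumes only ZF, so the separate
      ret i1 %z                 ; test can be eliminated
    }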
-
- Mar 05, 2009
-
Dan Gohman authored
The extra operand didn't appear to cause any trouble, but it was erroneous
regardless.
llvm-svn: 66206
-
Dan Gohman authored
negative one, as subtracts of immediates are canonicalized to adds.
llvm-svn: 66180
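That is (a minimal illustration), a source-level subtract by one reaches the
selector in add form, so a dec pattern must look for an add of -1:

    define i32 @decrement(i32 %x) {
    entry:
      ; written as "sub i32 %x, 1", but canonicalized to the add below:
      %y = add i32 %x, -1
      ret i32 %y
    }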
-
- Mar 04, 2009
-
Dan Gohman authored
llvm-svn: 66058
-
Dan Gohman authored
llvm-svn: 66008
-
Dan Gohman authored
result from add, sub, inc, and dec instructions in simple cases.
llvm-svn: 66004
-
- Feb 27, 2009
-
Rafael Espindola authored
 pic | declaration  | linkage  | visibility |

!pic | declaration  | external | default    | tls1.ll tls2.ll         | local exec
 pic | declaration  | external | default    | tls1-pic.ll tls2-pic.ll | general dynamic
!pic | !declaration | external | default    | tls3.ll tls4.ll         | initial exec
 pic | !declaration | external | default    | tls3-pic.ll tls4-pic.ll | general dynamic
!pic | declaration  | external | hidden     | tls7.ll tls8.ll         | local exec
 pic | declaration  | external | hidden     | X                       | local dynamic
!pic | !declaration | external | hidden     | tls9.ll tls10.ll        | local exec
 pic | !declaration | external | hidden     | X                       | local dynamic
!pic | declaration  | internal | default    | tls5.ll tls6.ll         | local exec
 pic | declaration  | internal | default    | X                       | local dynamic

The ones marked with an X have not been implemented since local dynamic is not
implemented.
llvm-svn: 65632
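For reference, the kind of input these tests cover (an illustrative sketch;
2009-era typed-pointer syntax): a thread-local external declaration whose
access model then depends on -relocation-model=pic and visibility as tabulated
above:

    @x = external thread_local global i32

    define i32* @get_x() {
    entry:
      ret i32* @x    ; per the table: non-pic -> local exec,
    }                ;                pic     -> general dynamic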
-
- Feb 25, 2009
-
Evan Cheng authored
llvm-svn: 65482
-
- Feb 23, 2009
-
Evan Cheng authored
llvm-svn: 65313
-
Nate Begeman authored
Generate better code for v16i8 shuffles on SSE2 (avoids stack)
Generate pshufb for v8i16 and v16i8 shuffles on SSSE3 where it is fewer uops.
Document the shuffle matching logic and add some FIXMEs for later further
cleanups.
New tests that test the above.

Examples:

New:
_shuf2:
        pextrw  $7, %xmm0, %eax
        punpcklqdq      %xmm1, %xmm0
        pshuflw $128, %xmm0, %xmm0
        pinsrw  $2, %eax, %xmm0

Old:
_shuf2:
        pextrw  $2, %xmm0, %eax
        pextrw  $7, %xmm0, %ecx
        pinsrw  $2, %ecx, %xmm0
        pinsrw  $3, %eax, %xmm0
        movd    %xmm1, %eax
        pinsrw  $4, %eax, %xmm0
        ret

=========

New:
_shuf4:
        punpcklqdq      %xmm1, %xmm0
        pshufb  LCPI1_0, %xmm0

Old:
_shuf4:
        pextrw  $3, %xmm0, %eax
        movsd   %xmm1, %xmm0
        pextrw  $3, %xmm1, %ecx
        pinsrw  $4, %ecx, %xmm0
        pinsrw  $5, %eax, %xmm0

========

New:
_shuf1:
        pushl   %ebx
        pushl   %edi
        pushl   %esi
        pextrw  $1, %xmm0, %eax
        rolw    $8, %ax
        movd    %xmm0, %ecx
        rolw    $8, %cx
        pextrw  $5, %xmm0, %edx
        pextrw  $4, %xmm0, %esi
        pextrw  $3, %xmm0, %edi
        pextrw  $2, %xmm0, %ebx
        movaps  %xmm0, %xmm1
        pinsrw  $0, %ecx, %xmm1
        pinsrw  $1, %eax, %xmm1
        rolw    $8, %bx
        pinsrw  $2, %ebx, %xmm1
        rolw    $8, %di
        pinsrw  $3, %edi, %xmm1
        rolw    $8, %si
        pinsrw  $4, %esi, %xmm1
        rolw    $8, %dx
        pinsrw  $5, %edx, %xmm1
        pextrw  $7, %xmm0, %eax
        rolw    $8, %ax
        movaps  %xmm1, %xmm0
        pinsrw  $7, %eax, %xmm0
        popl    %esi
        popl    %edi
        popl    %ebx
        ret

Old:
_shuf1:
        subl    $252, %esp
        movaps  %xmm0, (%esp)
        movaps  %xmm0, 16(%esp)
        movaps  %xmm0, 32(%esp)
        movaps  %xmm0, 48(%esp)
        movaps  %xmm0, 64(%esp)
        movaps  %xmm0, 80(%esp)
        movaps  %xmm0, 96(%esp)
        movaps  %xmm0, 224(%esp)
        movaps  %xmm0, 208(%esp)
        movaps  %xmm0, 192(%esp)
        movaps  %xmm0, 176(%esp)
        movaps  %xmm0, 160(%esp)
        movaps  %xmm0, 144(%esp)
        movaps  %xmm0, 128(%esp)
        movaps  %xmm0, 112(%esp)
        movzbl  14(%esp), %eax
        movd    %eax, %xmm1
        movzbl  22(%esp), %eax
        movd    %eax, %xmm2
        punpcklbw       %xmm1, %xmm2
        movzbl  42(%esp), %eax
        movd    %eax, %xmm1
        movzbl  50(%esp), %eax
        movd    %eax, %xmm3
        punpcklbw       %xmm1, %xmm3
        punpcklbw       %xmm2, %xmm3
        movzbl  77(%esp), %eax
        movd    %eax, %xmm1
        movzbl  84(%esp), %eax
        movd    %eax, %xmm2
        punpcklbw       %xmm1, %xmm2
        movzbl  104(%esp), %eax
        movd    %eax, %xmm1
        punpcklbw       %xmm1, %xmm0
        punpcklbw       %xmm2, %xmm0
        movaps  %xmm0, %xmm1
        punpcklbw       %xmm3, %xmm1
        movzbl  127(%esp), %eax
        movd    %eax, %xmm0
        movzbl  135(%esp), %eax
        movd    %eax, %xmm2
        punpcklbw       %xmm0, %xmm2
        movzbl  155(%esp), %eax
        movd    %eax, %xmm0
        movzbl  163(%esp), %eax
        movd    %eax, %xmm3
        punpcklbw       %xmm0, %xmm3
        punpcklbw       %xmm2, %xmm3
        movzbl  188(%esp), %eax
        movd    %eax, %xmm0
        movzbl  197(%esp), %eax
        movd    %eax, %xmm2
        punpcklbw       %xmm0, %xmm2
        movzbl  217(%esp), %eax
        movd    %eax, %xmm4
        movzbl  225(%esp), %eax
        movd    %eax, %xmm0
        punpcklbw       %xmm4, %xmm0
        punpcklbw       %xmm2, %xmm0
        punpcklbw       %xmm3, %xmm0
        punpcklbw       %xmm1, %xmm0
        addl    $252, %esp
        ret

llvm-svn: 65311
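For orientation, shuffles of this sort are expressed in IR as shufflevector
with a constant mask (the masks behind the commit's tests aren't shown; this
one is hypothetical):

    define <8 x i16> @shuf_even(<8 x i16> %a, <8 x i16> %b) {
    entry:
      ; take the even elements of both inputs; masks like this, which no
      ; single unpack covers, are candidates for pshufb on SSSE3
      %s = shufflevector <8 x i16> %a, <8 x i16> %b, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
      ret <8 x i16> %s
    }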
-
Scott Michel authored
instruction. The class also consolidates the code for detecting constant splats
that's shared across the PowerPC and CellSPU backends (and might be useful for
other backends.) Also introduces SelectionDAG::getBUILD_VECTOR() for generating
new BUILD_VECTOR nodes.
llvm-svn: 65296
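A constant splat of the kind the shared detection logic recognizes (an
illustrative sketch):

    define <4 x i32> @splat7() {
    entry:
      ; every element identical: a BUILD_VECTOR that splat detection reduces
      ; to a single value plus an element count
      ret <4 x i32> <i32 7, i32 7, i32 7, i32 7>
    }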
-
- Feb 22, 2009
-
Evan Cheng authored
llvm-svn: 65274
-
- Feb 20, 2009
-
Evan Cheng authored
llvm-svn: 65152
-
- Feb 17, 2009
-
Scott Michel authored
(Note: Eventually, commits like this will be handled via a pre-commit hook that
does this automagically, as well as expand tabs to spaces and look for 80-col
violations.)
llvm-svn: 64827
-
- Feb 13, 2009
-
Evan Cheng authored
llvm-svn: 64496
-
- Feb 12, 2009
-
Dale Johannesen authored
in inline asm as signed (what gcc does). Add partial support for x86-specific
"e" and "Z" constraints, with appropriate signedness for printing.
llvm-svn: 64400
-
- Feb 07, 2009
-
Dale Johannesen authored
No functional change.
llvm-svn: 64026
-
Dan Gohman authored
ScheduleDAG's TLI member to use const.
llvm-svn: 64018
-
Dale Johannesen authored
Many targets build placeholder nodes for special operands, e.g. GlobalBaseReg
on X86 and PPC for the PIC base. There's no sensible way to associate debug
info with these. I've left them built with getNode calls with explicit
DebugLoc::getUnknownLoc operands. I'm not too happy about this but don't see a
good improvement; I considered adding a getPseudoOperand or something, but it
seems to me that'll just make it harder to read.
llvm-svn: 63992
-
Dale Johannesen authored
getCALLSEQ_{END,START} to permit passing no DebugLoc there. UNDEF doesn't
logically have a DebugLoc; add getUNDEF to encapsulate this.
llvm-svn: 63978
-
- Feb 06, 2009
-
Dale Johannesen authored
llvm-svn: 63969
-
Dale Johannesen authored
its corresponding getTargetNode. Lots of caller changes.
llvm-svn: 63904
-
- Feb 04, 2009
-
Dale Johannesen authored
Adjust the many callers of those versions.
llvm-svn: 63767
-
Dan Gohman authored
llvm-svn: 63740
-
Mon P Wang authored
of undefs and incorrectly determining if we have punpckldq.
llvm-svn: 63702
-
Dale Johannesen authored
llvm-svn: 63693
-
- Feb 03, 2009
-
Dale Johannesen authored
have it yet. More coming.
llvm-svn: 63673
-
Dale Johannesen authored
llvm-svn: 63656
-