- Feb 23, 2009
Nate Begeman authored
Generate better code for v16i8 shuffles on SSE2 (avoids stack).
Generate pshufb for v8i16 and v16i8 shuffles on SSSE3 where it is fewer uops.
Document the shuffle matching logic and add some FIXMEs for later further cleanups.
New tests that test the above.

Examples:

New:
_shuf2:
  pextrw  $7, %xmm0, %eax
  punpcklqdq  %xmm1, %xmm0
  pshuflw $128, %xmm0, %xmm0
  pinsrw  $2, %eax, %xmm0

Old:
_shuf2:
  pextrw  $2, %xmm0, %eax
  pextrw  $7, %xmm0, %ecx
  pinsrw  $2, %ecx, %xmm0
  pinsrw  $3, %eax, %xmm0
  movd    %xmm1, %eax
  pinsrw  $4, %eax, %xmm0
  ret

=========

New:
_shuf4:
  punpcklqdq  %xmm1, %xmm0
  pshufb  LCPI1_0, %xmm0

Old:
_shuf4:
  pextrw  $3, %xmm0, %eax
  movsd   %xmm1, %xmm0
  pextrw  $3, %xmm1, %ecx
  pinsrw  $4, %ecx, %xmm0
  pinsrw  $5, %eax, %xmm0

========

New:
_shuf1:
  pushl   %ebx
  pushl   %edi
  pushl   %esi
  pextrw  $1, %xmm0, %eax
  rolw    $8, %ax
  movd    %xmm0, %ecx
  rolw    $8, %cx
  pextrw  $5, %xmm0, %edx
  pextrw  $4, %xmm0, %esi
  pextrw  $3, %xmm0, %edi
  pextrw  $2, %xmm0, %ebx
  movaps  %xmm0, %xmm1
  pinsrw  $0, %ecx, %xmm1
  pinsrw  $1, %eax, %xmm1
  rolw    $8, %bx
  pinsrw  $2, %ebx, %xmm1
  rolw    $8, %di
  pinsrw  $3, %edi, %xmm1
  rolw    $8, %si
  pinsrw  $4, %esi, %xmm1
  rolw    $8, %dx
  pinsrw  $5, %edx, %xmm1
  pextrw  $7, %xmm0, %eax
  rolw    $8, %ax
  movaps  %xmm1, %xmm0
  pinsrw  $7, %eax, %xmm0
  popl    %esi
  popl    %edi
  popl    %ebx
  ret

Old:
_shuf1:
  subl    $252, %esp
  movaps  %xmm0, (%esp)
  movaps  %xmm0, 16(%esp)
  movaps  %xmm0, 32(%esp)
  movaps  %xmm0, 48(%esp)
  movaps  %xmm0, 64(%esp)
  movaps  %xmm0, 80(%esp)
  movaps  %xmm0, 96(%esp)
  movaps  %xmm0, 224(%esp)
  movaps  %xmm0, 208(%esp)
  movaps  %xmm0, 192(%esp)
  movaps  %xmm0, 176(%esp)
  movaps  %xmm0, 160(%esp)
  movaps  %xmm0, 144(%esp)
  movaps  %xmm0, 128(%esp)
  movaps  %xmm0, 112(%esp)
  movzbl  14(%esp), %eax
  movd    %eax, %xmm1
  movzbl  22(%esp), %eax
  movd    %eax, %xmm2
  punpcklbw %xmm1, %xmm2
  movzbl  42(%esp), %eax
  movd    %eax, %xmm1
  movzbl  50(%esp), %eax
  movd    %eax, %xmm3
  punpcklbw %xmm1, %xmm3
  punpcklbw %xmm2, %xmm3
  movzbl  77(%esp), %eax
  movd    %eax, %xmm1
  movzbl  84(%esp), %eax
  movd    %eax, %xmm2
  punpcklbw %xmm1, %xmm2
  movzbl  104(%esp), %eax
  movd    %eax, %xmm1
  punpcklbw %xmm1, %xmm0
  punpcklbw %xmm2, %xmm0
  movaps  %xmm0, %xmm1
  punpcklbw %xmm3, %xmm1
  movzbl  127(%esp), %eax
  movd    %eax, %xmm0
  movzbl  135(%esp), %eax
  movd    %eax, %xmm2
  punpcklbw %xmm0, %xmm2
  movzbl  155(%esp), %eax
  movd    %eax, %xmm0
  movzbl  163(%esp), %eax
  movd    %eax, %xmm3
  punpcklbw %xmm0, %xmm3
  punpcklbw %xmm2, %xmm3
  movzbl  188(%esp), %eax
  movd    %eax, %xmm0
  movzbl  197(%esp), %eax
  movd    %eax, %xmm2
  punpcklbw %xmm0, %xmm2
  movzbl  217(%esp), %eax
  movd    %eax, %xmm4
  movzbl  225(%esp), %eax
  movd    %eax, %xmm0
  punpcklbw %xmm4, %xmm0
  punpcklbw %xmm2, %xmm0
  punpcklbw %xmm3, %xmm0
  punpcklbw %xmm1, %xmm0
  addl    $252, %esp
  ret

llvm-svn: 65311
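For illustration, a minimal sketch (not from the commit; the function name and the use of Clang's vector extensions are mine) of the kind of shuffle these lowerings target. Byte-swapping every 16-bit lane is a pure v16i8 shuffle, the same operation as _shuf1 above, where each rolw $8 swaps the bytes of one lane:

    // Hypothetical example using Clang's vector extensions; not from the commit.
    typedef unsigned char v16i8 __attribute__((vector_size(16)));

    // Swap the two bytes of each 16-bit lane: a pure v16i8 shuffle.
    v16i8 bswap_lanes(v16i8 x) {
        // On SSSE3 this whole function can become one pshufb with a
        // constant-pool index vector; on SSE2 it can now avoid the stack.
        return __builtin_shufflevector(x, x,
            1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
    }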
Scott Michel authored
Introduce the BuildVectorSDNode class that encapsulates the ISD::BUILD_VECTOR instruction. The class also consolidates the code for detecting constant splats that's shared across the PowerPC and CellSPU backends (and might be useful for other backends). Also introduces SelectionDAG::getBUILD_VECTOR() for generating new BUILD_VECTOR nodes. llvm-svn: 65296
- Feb 22, 2009
Evan Cheng authored
llvm-svn: 65274
- Feb 20, 2009
Evan Cheng authored
llvm-svn: 65152
- Feb 17, 2009
Scott Michel authored
(Note: Eventually, commits like this will be handled via a pre-commit hook that does this automagically, as well as expanding tabs to spaces and looking for 80-col violations.) llvm-svn: 64827
- Feb 13, 2009
Evan Cheng authored
llvm-svn: 64496
- Feb 12, 2009
Dale Johannesen authored
in inline asm as signed (what gcc does). Add partial support for x86-specific "e" and "Z" constraints, with appropriate signedness for printing. llvm-svn: 64400
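For reference, a hedged sketch (mine, not from the commit) of one of the constraints mentioned: on x86-64, "e" accepts a sign-extended 32-bit immediate (and "Z" a zero-extended one), which is why each must be printed with the matching signedness:

    // Hypothetical illustration of the "e" constraint; not from the commit.
    long add_imm(long x) {
        // -12345 fits a sign-extended 32-bit immediate, so "e" accepts it.
        asm("addq %1, %0" : "+r"(x) : "e"(-12345L));
        return x;
    }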
- Feb 07, 2009
Dale Johannesen authored
No functional change. llvm-svn: 64026
Dan Gohman authored
ScheduleDAG's TLI member to use const. llvm-svn: 64018
Dale Johannesen authored
Many targets build placeholder nodes for special operands, e.g. GlobalBaseReg on X86 and PPC for the PIC base. There's no sensible way to associate debug info with these. I've left them built with getNode calls with explicit DebugLoc::getUnknownLoc operands. I'm not too happy about this but don't see a good improvement; I considered adding a getPseudoOperand or something, but it seems to me that'll just make it harder to read. llvm-svn: 63992
Dale Johannesen authored
getCALLSEQ_{END,START} to permit passing no DebugLoc there. UNDEF doesn't logically have DebugLoc; add getUNDEF to encapsulate this. llvm-svn: 63978
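A simplified sketch (the exact signature here is assumed, not the verbatim patch) of the encapsulation described: UNDEF carries no meaningful source location, so getUNDEF supplies the unknown DebugLoc itself rather than making every caller do it:

    // Assumed, simplified shape of the new helper.
    SDValue SelectionDAG::getUNDEF(MVT VT) {
      // UNDEF doesn't logically have a DebugLoc; hide the placeholder here.
      return getNode(ISD::UNDEF, DebugLoc::getUnknownLoc(), VT);
    }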
- Feb 06, 2009
Dale Johannesen authored
llvm-svn: 63969
Dale Johannesen authored
its corresponding getTargetNode. Lots of caller changes. llvm-svn: 63904
- Feb 04, 2009
Dale Johannesen authored
Adjust the many callers of those versions. llvm-svn: 63767
Dan Gohman authored
llvm-svn: 63740
Mon P Wang authored
of undefs and incorrectly determining if we have punpckldq. llvm-svn: 63702
Dale Johannesen authored
llvm-svn: 63693
- Feb 03, 2009
Dale Johannesen authored
have it yet. More coming. llvm-svn: 63673
Dale Johannesen authored
llvm-svn: 63656
Dale Johannesen authored
llvm-svn: 63650
- Feb 02, 2009
Evan Cheng authored
llvm-svn: 63509
Evan Cheng authored
llvm-svn: 63506
Evan Cheng authored
Teach LowerBRCOND to recognize (xor (setcc x), 1). The xor inverts the condition. It's normally transformed by the dag combiner, unless the condition is set by an arithmetic op with overflow. llvm-svn: 63505
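For illustration, a sketch (mine; the builtin postdates this commit) of source code that produces the pattern: branching on the inverse of an overflow flag yields (xor (setcc x), 1), and because the setcc comes from an arithmetic-with-overflow node, the dag combiner leaves it for LowerBRCOND:

    // Hypothetical example; __builtin_sadd_overflow is a Clang/GCC builtin.
    bool checked_add(int a, int b, int *out) {
        int sum;
        if (__builtin_sadd_overflow(a, b, &sum))  // setcc on the overflow bit
            return false;
        *out = sum;  // reached when !overflow, i.e. (xor (setcc x), 1)
        return true;
    }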
- Feb 01, 2009
Torok Edwin authored
var-args, and don't allow FP return values. llvm-svn: 63495
Duncan Sands authored
crashes or wrong code with codegen of large integers: eliminate the legacy getIntegerVTBitMask and getIntegerVTSignBit methods, which returned their value as a uint64_t and so couldn't handle huge types. llvm-svn: 63494
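A sketch (mine, not from the commit) of why the uint64_t-returning helpers had to go: a mask for an integer type wider than 64 bits can't fit in a uint64_t, while APInt scales to any width:

    // Hypothetical replacement shape; APInt sizes itself to the bit width.
    #include "llvm/ADT/APInt.h"

    llvm::APInt bitMaskFor(unsigned BitWidth) {
        // Correct even for i256, where a uint64_t mask would silently truncate.
        return llvm::APInt::getAllOnesValue(BitWidth);
    }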
- Jan 31, 2009
Dale Johannesen authored
argument. Adjust all callers and overloaded versions. llvm-svn: 63444
Bill Wendling authored
llvm-svn: 63442
- Jan 30, 2009
Mon P Wang authored
an illegal type. llvm-svn: 63380
- Jan 29, 2009
Dan Gohman authored
dagcombines that help it match in several more cases. Add several more cases to test/CodeGen/X86/bt.ll. This doesn't yet include matching for BT with an immediate operand; it just covers more register+register cases. llvm-svn: 63266
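For illustration, a sketch (mine, not taken verbatim from bt.ll) of the register+register bit tests the new dagcombines help match; both forms can select to a bt plus a setcc/jcc instead of a variable shift:

    // Hypothetical examples of patterns that can now match BT reg,reg.
    bool test_bit_shl(unsigned word, unsigned idx) {
        return (word & (1u << idx)) != 0;  // bt %idx_reg, %word_reg
    }
    bool test_bit_shr(unsigned word, unsigned idx) {
        return (word >> idx) & 1;          // same test, shifted form
    }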
Mon P Wang authored
llvm-svn: 63252
- Jan 28, 2009
Mon P Wang authored
llvm-svn: 63193
- Jan 26, 2009
Dan Gohman authored
tidy up SDUse and related code.
- Replace the operator= member functions with a set method, like LLVM Use has, and variants setInitial and setNode, which take care of updating use lists, like LLVM Use's does. This simplifies code that calls these functions.
- getSDValue() is renamed to get(), as in LLVM Use, though most places can either use the implicit conversion to SDValue or the convenience functions instead.
- Fix some more node vs. value terminology issues.
Also, eliminate the one remaining use of SDOperandPtr, and SDOperandPtr itself.
llvm-svn: 62995
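A simplified sketch (member shapes assumed, not the verbatim class) of the reworked SDUse interface this describes:

    // Assumed, simplified shape of SDUse after this change.
    class SDUse {
      SDValue Val;   // the value being used
      SDNode *User;  // the node that uses it
    public:
      const SDValue &get() const { return Val; }        // was getSDValue()
      operator const SDValue &() const { return Val; }  // implicit conversion
      void set(const SDValue &V);         // unlink from old use list, link new
      void setInitial(const SDValue &V);  // first assignment: link only
      void setNode(SDNode *N);            // swap node, keep the result number
    };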
Nate Begeman authored
llvm-svn: 62988
Nate Begeman authored
llvm-svn: 62979
- Jan 22, 2009
Bob Wilson authored
corresponding to the "not" and "vnot" PatFrags. Use the new method in some places where it seems appropriate. llvm-svn: 62768
- Jan 19, 2009
Evan Cheng authored
Minor tweak to LowerUINT_TO_FP_i32. Bias (after scalar_to_vector) has two uses, so we should make it the second source operand of ISD::OR so the 2-address pass won't have to be smart about commuting.

%reg1024<def> = MOVSDrm %reg0, 1, %reg0, <cp#0>, Mem:LD(8,8) [ConstantPool + 0]
%reg1025<def> = MOVSD2PDrr %reg1024
%reg1026<def> = MOVDI2PDIrm <fi#-1>, 1, %reg0, 0, Mem:LD(4,16) [FixedStack-1 + 0]
%reg1027<def> = ORPSrr %reg1025<kill>, %reg1026<kill>
%reg1028<def> = MOVPD2SDrr %reg1027<kill>
%reg1029<def> = SUBSDrr %reg1028<kill>, %reg1024<kill>
%reg1030<def> = CVTSD2SSrr %reg1029<kill>
MOVSSmr <fi#0>, 1, %reg0, 0, %reg1030<kill>, Mem:ST(4,4) [FixedStack0 + 0]
%reg1031<def> = LD_Fp32m80 <fi#0>, 1, %reg0, 0, Mem:LD(4,16) [FixedStack0 + 0]
RET %reg1031<kill>, %ST0<imp-use,kill>

The reason the 2-addr pass isn't smart enough to commute the ORPSrr is that it can't look past the MOVSD2PDrr instruction.
llvm-svn: 62505
Evan Cheng authored
optimize it to a SINT_TO_FP when the sign bit is known zero. X86 isel should perform the optimization itself. llvm-svn: 62504
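A sketch (mine) of the case in question: when the sign bit of the source is known zero, unsigned and signed conversion agree, and the signed form is directly supported by x86:

    // Hypothetical example; the shift makes the sign bit provably zero.
    double to_double(unsigned x) {
        return (double)(x >> 1);  // top bit clear, so cvtsi2sd (signed) is safe
    }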
- Jan 17, 2009
Bill Wendling authored
llvm-svn: 62415
Bill Wendling authored
llvm-svn: 62405
Bill Wendling authored
X86. This code:

  void f() {
    uint32_t x;
    float y = (float)x;
  }

used to be:

  movl  %eax, -8(%ebp)
  movl  [2^52 double], -4(%ebp)
  movsd -8(%ebp), %xmm0
  subsd [2^52 double], %xmm0
  cvtsd2ss %xmm0, %xmm0

Is now:

  movsd [2^52 double], %xmm0
  movsd %xmm0, %xmm1
  movd  %ecx, %xmm2
  orps  %xmm2, %xmm1
  subsd %xmm0, %xmm1
  cvtsd2ss %xmm1, %xmm0

This is faster on X86. Note that there's an extra load of %xmm0 into %xmm1. That will be fixed in a later coalescer fix.
llvm-svn: 62404
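A portable sketch (mine; function and variable names assumed) of the algorithm the new sequence implements: OR the 32-bit value into the mantissa of 2^52, which yields 2^52 + x exactly, then subtract 2^52 and round to float:

    // Hypothetical C++ rendering of the uint32->float trick shown above.
    #include <cstdint>
    #include <cstring>

    float uint32_to_float(uint32_t x) {
        const double two52 = 4503599627370496.0;  // 2^52: mantissa ulp == 1
        uint64_t bits;
        std::memcpy(&bits, &two52, sizeof bits);
        bits |= x;                        // the orps step: x < 2^32 fits the mantissa
        double d;
        std::memcpy(&d, &bits, sizeof d); // d == 2^52 + x, exactly
        return (float)(d - two52);        // the subsd + cvtsd2ss steps
    }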