- Nov 16, 2004
Chris Lattner authored
hold your nose!) llvm-svn: 17869
Chris Lattner authored
already been emitted, we don't have to remember it and deal with it later; just emit it directly. llvm-svn: 17868
Chris Lattner authored
* Get rid of "emitMaybePCRelativeValue", either we want to emit a PC relative value or not: drop the maybe BS. As it turns out, the only places where the bool was a variable coming in, the bool was a dynamic constant. llvm-svn: 17867
Chris Lattner authored
set up. llvm-svn: 17862
Chris Lattner authored
llvm-svn: 17861
- Nov 14, 2004
Misha Brukman authored
llvm-svn: 17750
Chris Lattner authored
llvm-svn: 17714
- Nov 13, 2004
Chris Lattner authored
shld is a very high latency operation. Instead of emitting it for shifts of two or three, open code the equivalent operation which is faster on athlon and P4 (by a substantial margin). For example, instead of compiling this:

long long X2(long long Y) { return Y << 2; }

to:

X3_2:
        movl 4(%esp), %eax
        movl 8(%esp), %edx
        shldl $2, %eax, %edx
        shll $2, %eax
        ret

Compile it to:

X2:
        movl 4(%esp), %eax
        movl 8(%esp), %ecx
        movl %eax, %edx
        shrl $30, %edx
        leal (%edx,%ecx,4), %edx
        shll $2, %eax
        ret

Likewise, for << 3, compile to:

X3:
        movl 4(%esp), %eax
        movl 8(%esp), %ecx
        movl %eax, %edx
        shrl $29, %edx
        leal (%edx,%ecx,8), %edx
        shll $3, %eax
        ret

This matches icc, except that icc open codes the shifts as adds on the P4.

llvm-svn: 17707
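For reference, the open-coded form above corresponds to this C-level decomposition of a 64-bit shift into 32-bit halves; a minimal sketch (the helper name and the constraint on n are illustrative, not from the commit):

    #include <stdint.h>

    /* 64-bit Y << n for a small constant n (0 < n < 32), without shld:
       the new high word is the old high word scaled by 2^n (the lea)
       combined with the bits that spill out of the low word (the shr). */
    static uint64_t shl64(uint32_t lo, uint32_t hi, unsigned n) {
        uint32_t spill  = lo >> (32 - n);       /* shrl $(32-n), %edx */
        uint32_t new_hi = (hi << n) | spill;    /* leal (%edx,%ecx,2^n), %edx */
        uint32_t new_lo = lo << n;              /* shll $n, %eax */
        return ((uint64_t)new_hi << 32) | new_lo;
    }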
Chris Lattner authored
llvm-svn: 17706
Chris Lattner authored
long long X3_2(long long Y) { return Y+Y; }
int X(int Y) { return Y+Y; }

into:

X3_2:
        movl 4(%esp), %eax
        movl 8(%esp), %edx
        addl %eax, %eax
        adcl %edx, %edx
        ret
X:
        movl 4(%esp), %eax
        addl %eax, %eax
        ret

instead of:

X3_2:
        movl 4(%esp), %eax
        movl 8(%esp), %edx
        shldl $1, %eax, %edx
        shll $1, %eax
        ret
X:
        movl 4(%esp), %eax
        shll $1, %eax
        ret

llvm-svn: 17705
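The add/adc form works because shifting left by one is just adding the value to itself, with the carry out of the low word folded into the high word. A C sketch of the same idea (the helper name is hypothetical):

    #include <stdint.h>

    /* 64-bit Y+Y using 32-bit adds, as addl/adcl do. */
    static uint64_t dbl64(uint32_t lo, uint32_t hi) {
        uint32_t new_lo = lo + lo;           /* addl %eax, %eax */
        uint32_t carry  = new_lo < lo;       /* carry out of the low word */
        uint32_t new_hi = hi + hi + carry;   /* adcl %edx, %edx */
        return ((uint64_t)new_hi << 32) | new_lo;
    }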
- Nov 10, 2004
John Criswell authored
It's stosl (l for long == 32 bit). llvm-svn: 17658
- Nov 05, 2004
John Criswell authored
llvm-svn: 17488
Chris Lattner authored
llvm-svn: 17484
- Nov 02, 2004
Chris Lattner authored
llvm-svn: 17431
- Nov 01, 2004
Chris Lattner authored
llvm-svn: 17406
- Oct 28, 2004
Reid Spencer authored
llvm-svn: 17286
- Oct 22, 2004
Reid Spencer authored
llvm-svn: 17167
Reid Spencer authored
llvm-svn: 17155
- Oct 19, 2004
Reid Spencer authored
llvm-svn: 17136
- Oct 18, 2004
Chris Lattner authored
llvm-svn: 17126
- Oct 17, 2004
Chris Lattner authored
last night. :) bork! llvm-svn: 17093
Chris Lattner authored
double %test(uint %X) {
        %tmp.1 = cast uint %X to double         ; <double> [#uses=1]
        ret double %tmp.1
}

into:

test:
        sub %ESP, 8
        mov %EAX, DWORD PTR [%ESP + 12]
        mov %ECX, 0
        mov DWORD PTR [%ESP], %EAX
        mov DWORD PTR [%ESP + 4], %ECX
        fild QWORD PTR [%ESP]
        add %ESP, 8
        ret

... which basically zero extends to 8 bytes, then does an fild for an 8-byte signed int.

Now we generate this:

test:
        sub %ESP, 4
        mov %EAX, DWORD PTR [%ESP + 8]
        mov DWORD PTR [%ESP], %EAX
        fild DWORD PTR [%ESP]
        shr %EAX, 31
        fadd DWORD PTR [.CPItest_0 + 4*%EAX]
        add %ESP, 4
        ret

        .section .rodata
        .align 4
.CPItest_0:
        .quad 5728578726015270912

This does a 32-bit signed integer load, then adds in an offset if the sign bit of the integer was set. It turns out that this is substantially faster than the preceding sequence.

Consider this testcase:

unsigned a[2]={1,2};
volatile double G;

void main() {
    int i;
    for (i=0; i<100000000; ++i )
        G += a[i&1];
}

On zion (a P4 Xeon, 3GHz), this patch speeds up the testcase from 2.140s to 0.94s. On apoc, an athlon MP 2100+, this patch speeds up the testcase from 1.72s to 1.34s.

Note that the program takes 2.5s/1.97s on zion/apoc with GCC 3.3 -O3 -fomit-frame-pointer.

llvm-svn: 17083
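In C terms, the new sequence loads the value as if it were signed, then adds back 2^32 exactly when the sign bit was set; the two-entry constant pool holds the 0.0f / 2^32 correction table. A sketch under those assumptions (the function name is hypothetical):

    #include <stdint.h>

    /* unsigned -> double via a signed 32-bit fild plus a table-selected
       correction, matching the sequence above. */
    static double uint_to_double(uint32_t x) {
        static const float adjust[2] = { 0.0f, 4294967296.0f }; /* .CPItest_0 */
        double d = (double)(int32_t)x;   /* fild of a 32-bit signed int */
        return d + adjust[x >> 31];      /* fadd DWORD PTR [table + 4*sign] */
    }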
Chris Lattner authored
us to use index registers for CPI's llvm-svn: 17082
Chris Lattner authored
index reg and scale llvm-svn: 17081
Chris Lattner authored
%X = and Y, constantint
%Z = setcc %X, 0

instead of emitting:

        and %EAX, 3
        test %EAX, %EAX
        je .LBBfoo2_2   # UnifiedReturnBlock

We now emit:

        test %EAX, 3
        je .LBBfoo2_2   # UnifiedReturnBlock

This triggers 581 times on 176.gcc for example.

llvm-svn: 17080
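For reference, source like the following produces that pattern; a sketch (the function name is hypothetical):

    /* A masked equality test: only the flags of (x & 3) are needed,
       so the and+test pair collapses into a single test instruction. */
    int low_bits_clear(unsigned x) {
        return (x & 3) == 0;
    }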
- Oct 16, 2004
Chris Lattner authored
now compile: 'foo() {}' into "ret" instead of "mov EAX, 0; ret" llvm-svn: 17049
- Oct 15, 2004
Chris Lattner authored
case:

int C[100];
int foo() { return C[4]; }

We now codegen:

foo:
        mov %EAX, DWORD PTR [C + 16]
        ret

instead of:

foo:
        mov %EAX, OFFSET C
        mov %EAX, DWORD PTR [%EAX + 16]
        ret

Other impressive features may be coming later.

This patch is contributed by Jeff Cohen!

llvm-svn: 17011
Chris Lattner authored
contributed by Jeff Cohen! llvm-svn: 17010
Chris Lattner authored
constant displacements from global variables. Patch by Jeff Cohen! llvm-svn: 17009
Chris Lattner authored
by Jeff Cohen! llvm-svn: 17008
- Oct 13, 2004
Reid Spencer authored
llvm-svn: 16950
- Oct 11, 2004
Reid Spencer authored
llvm-svn: 16893
- Oct 09, 2004
Chris Lattner authored
the -sse* options (to avoid misleading people). Also, the stack alignment of the target doesn't depend on whether SSE is eventually implemented, so remove a comment. llvm-svn: 16860
Chris Lattner authored
which prevented setcc's from being folded into branches. It appears that conditional branchinst's CC operand is actually operand(2), not operand(0) as we might expect. :( llvm-svn: 16859
- Oct 08, 2004
Chris Lattner authored
instcombine xform, which is why we didn't notice it before. llvm-svn: 16840
- Oct 06, 2004
Chris Lattner authored
the JIT had last night. llvm-svn: 16766
Chris Lattner authored
t:
        mov %EDX, DWORD PTR [%ESP + 4]
        mov %ECX, 2
        mov %EAX, %EDX
        sar %EDX, 31
        idiv %ECX
        mov %EAX, %EDX
        ret

Generate:

t:
        mov %ECX, DWORD PTR [%ESP + 4]
***     mov %EAX, %ECX
        cdq
        and %ECX, 1
        xor %ECX, %EDX
        sub %ECX, %EDX
***     mov %EAX, %ECX
        ret

Note that the two marked moves are redundant, and should be eliminated by the register allocator, but aren't.

Compare this to GCC, which generates:

t:
        mov %eax, DWORD PTR [%esp+4]
        mov %edx, %eax
        shr %edx, 31
        lea %ecx, [%edx+%eax]
        and %ecx, -2
        sub %eax, %ecx
        ret

or ICC 8.0, which generates:

t:
        movl 4(%esp), %ecx            #3.5
        movl $-2147483647, %eax       #3.25
        imull %ecx                    #3.25
        movl %ecx, %eax               #3.25
        sarl $31, %eax                #3.25
        addl %ecx, %edx               #3.25
        subl %edx, %eax               #3.25
        addl %eax, %eax               #3.25
        negl %eax                     #3.25
        subl %eax, %ecx               #3.25
        movl %ecx, %eax               #3.25
        ret                           #3.25

We would be in great shape if not for the moves.

llvm-svn: 16763
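The branchless sequence computes a truncated signed remainder by two: take the low bit, then conditionally negate it with the sign mask that cdq produces. A C sketch of that identity (the function name is hypothetical; arithmetic right shift on signed values is assumed):

    #include <stdint.h>

    /* x % 2 without idiv: s is 0 or -1 (the cdq sign mask), and
       (b ^ s) - s negates b exactly when x is negative. */
    static int32_t srem2(int32_t x) {
        int32_t s = x >> 31;    /* cdq: all zeros or all ones */
        int32_t b = x & 1;      /* and %ecx, 1 */
        return (b ^ s) - s;     /* xor + sub: conditional negate */
    }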
Chris Lattner authored
s:      ;; X / 4
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, %EAX
        sar %ECX, 1
        shr %ECX, 30
        mov %EDX, %EAX
        add %EDX, %ECX
        sar %EAX, 2
        ret

When we really meant:

s:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, %EAX
        sar %ECX, 1
        shr %ECX, 30
        add %EAX, %ECX
        sar %EAX, 2
        ret

Hey, this also reduces register pressure too :)

llvm-svn: 16761
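Both sequences implement the standard round-toward-zero correction for signed division by four: add a bias of 3 before the arithmetic shift when the dividend is negative. In C (a sketch; the function name is hypothetical):

    #include <stdint.h>

    /* x / 4 without idiv: the sar/shr pair materializes the bias
       (3 if x < 0, else 0) so the final shift truncates toward zero. */
    static int32_t sdiv4(int32_t x) {
        int32_t bias = (uint32_t)(x >> 1) >> 30;  /* sar 1; shr 30 */
        return (x + bias) >> 2;                   /* sar %eax, 2 */
    }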
Chris Lattner authored
instead of:

s:      ;; X / 2
        movl 4(%esp), %eax
        movl %eax, %ecx
        shrl $31, %ecx
        movl %eax, %edx
        addl %ecx, %edx
        sarl $1, %eax
        ret

t:      ;; X / -2
        movl 4(%esp), %eax
        movl %eax, %ecx
        shrl $31, %ecx
        movl %eax, %edx
        addl %ecx, %edx
        sarl $1, %eax
        negl %eax
        ret

Emit:

s:
        movl 4(%esp), %eax
        cmpl $-2147483648, %eax
        sbbl $-1, %eax
        sarl $1, %eax
        ret

t:
        movl 4(%esp), %eax
        cmpl $-2147483648, %eax
        sbbl $-1, %eax
        sarl $1, %eax
        negl %eax
        ret

llvm-svn: 16760
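The cmp/sbb pair is a branchless "add 1 if negative": the compare sets the carry flag exactly for non-negative inputs, and sbb of -1 then computes x + 1 - CF. A C sketch of the identity (the function name is hypothetical):

    #include <stdint.h>

    /* x / 2 rounding toward zero: negative dividends get a +1 bias
       before the arithmetic shift, which is what cmp/sbb computes
       without a branch. */
    static int32_t sdiv2(int32_t x) {
        return (x + (x < 0)) >> 1;  /* sbbl $-1; sarl $1 */
    }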
Chris Lattner authored
llvm-svn: 16759