- Jan 19, 2005
-
Chris Lattner authored
well as all of the other stuff in livevar. This fixes the compiler crash on fourinarow last night. llvm-svn: 19695
-
Chris Lattner authored
llvm-svn: 19694
-
Chris Lattner authored
llvm-svn: 19693
-
Chris Lattner authored
typically cost 1 cycle) instead of shld/shrd instructions (which are typically 6 or more cycles). This also saves code space. For example, instead of emitting:

    rotr:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %CL, BYTE PTR [%ESP + 8]
        shrd %EAX, %EAX, %CL
        ret
    rotli:
        mov %EAX, DWORD PTR [%ESP + 4]
        shrd %EAX, %EAX, 27
        ret

Emit:

    rotr32:
        mov %CL, BYTE PTR [%ESP + 8]
        mov %EAX, DWORD PTR [%ESP + 4]
        ror %EAX, %CL
        ret
    rotli32:
        mov %EAX, DWORD PTR [%ESP + 4]
        ror %EAX, 27
        ret

We also emit byte rotate instructions, which do not have a sh[lr]d counterpart at all. llvm-svn: 19692
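For reference, a minimal C++ sketch (mine, not part of the commit) of the rotate idiom that a single 32-bit ror/rol instruction implements; the names rotr32/rotl32 merely echo the labels in the assembly above, and the explicit masking keeps every shift well defined:

    #include <cstdint>

    uint32_t rotr32(uint32_t x, uint32_t n) {
        n &= 31;                                   // keep the amount in [0, 31]
        return (x >> n) | (x << ((32 - n) & 31));  // the whole expression is one right rotate
    }

    uint32_t rotl32(uint32_t x, uint32_t n) {
        n &= 31;
        return (x << n) | (x >> ((32 - n) & 31));  // the whole expression is one left rotate
    }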
-
Chris Lattner authored
llvm-svn: 19690
-
Chris Lattner authored
llvm-svn: 19689
-
Chris Lattner authored
llvm-svn: 19687
-
Chris Lattner authored
This allows us to generate this:

    foo:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %EDX, DWORD PTR [%ESP + 8]
        shld %EDX, %EDX, 2
        shl %EAX, 2
        ret

instead of this:

    foo:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
        mov %EDX, %EAX
        shrd %EDX, %ECX, 30
        shl %EAX, 2
        ret

Note the magically transmogrifying immediate. llvm-svn: 19686
-
Chris Lattner authored
instead of doing it manually. llvm-svn: 19685
-
Chris Lattner authored
Add default impl of commuteInstruction. Add notes about ugly V9 code. llvm-svn: 19684
-
Chris Lattner authored
    foo:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %EDX, DWORD PTR [%ESP + 8]
        shrd %EAX, %EDX, 2
        sar %EDX, 2
        ret

instead of this:

    test1:
        mov %ECX, DWORD PTR [%ESP + 4]
        shr %ECX, 2
        mov %EDX, DWORD PTR [%ESP + 8]
        mov %EAX, %EDX
        shl %EAX, 30
        or %EAX, %ECX
        sar %EDX, 2
        ret

and long << 2 to this:

    foo:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
***     mov %EDX, %EAX
        shrd %EDX, %ECX, 30
        shl %EAX, 2
        ret

instead of this:

    foo:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, %EAX
        shr %ECX, 30
        mov %EDX, DWORD PTR [%ESP + 8]
        shl %EDX, 2
        or %EDX, %ECX
        shl %EAX, 2
        ret

The extra copy (marked ***) can be eliminated when I teach the code generator that shrd32rri8 is really commutative. llvm-svn: 19681
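A hedged C++ illustration (mine, not from the commit) of the kind of source that produces the 64-bit shifts above when targeting 32-bit x86; the function names are made up:

    #include <cstdint>

    int64_t shift_right_2(int64_t x) { return x >> 2; }  // low half: shrd; high half: sar
    int64_t shift_left_2(int64_t x)  { return x << 2; }  // high half: shld; low half: shl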
-
Chris Lattner authored
select operations or to shifts that are by a constant. This automatically implements (with no special code) all of the special cases for shift by 32, shift by < 32 and shift by > 32. llvm-svn: 19679
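As a hedged sketch of the expansion strategy (my own C++ rendering, not the legalizer's code), a variable 64-bit left shift can be built from 32-bit shifts plus selects on the shift amount; the single structure below covers amounts below, equal to, and above 32 with no special cases (it assumes amt < 64):

    #include <cstdint>

    uint64_t shl64(uint32_t lo, uint32_t hi, uint32_t amt) {
        uint32_t s = amt & 31;
        // Funnel the low half into the high half; the inner select keeps s == 0 well defined.
        uint32_t hi_small = (hi << s) | (s ? lo >> (32 - s) : 0u);
        uint32_t lo_small = lo << s;
        // Select on bit 5 of the amount: for amt >= 32 the shifted low half becomes
        // the high half and the low half becomes zero.
        uint32_t out_hi = (amt & 32) ? lo_small : hi_small;
        uint32_t out_lo = (amt & 32) ? 0u : lo_small;
        return ((uint64_t)out_hi << 32) | out_lo;
    }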
-
Chris Lattner authored
llvm-svn: 19678
-
Chris Lattner authored
range. Either they are undefined (the default), they mask the shift amount to the size of the register (X86, Alpha, etc), or they extend the shift (PPC). This defaults to undefined, which is conservatively correct. llvm-svn: 19677
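To make the three behaviors concrete, a small C++ illustration (mine, not from the patch) of what a 32-bit left shift by an out-of-range amount such as 33 means under each rule: if it is undefined, the legalizer may assume it never happens; a masking target (X86, Alpha) computes x << (33 & 31), i.e. x << 1; an extending target (PPC) shifts everything out and produces 0:

    #include <cstdint>

    uint32_t shl_masked(uint32_t x, uint32_t amt)   { return x << (amt & 31); }          // X86/Alpha-style
    uint32_t shl_extended(uint32_t x, uint32_t amt) { return amt >= 32 ? 0 : x << amt; } // PPC-style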
-
- Jan 18, 2005
-
Chris Lattner authored
llvm-svn: 19675
-
Chris Lattner authored
FP_EXTEND from! llvm-svn: 19674
-
Chris Lattner authored
llvm-svn: 19673
-
Chris Lattner authored
don't need to even think about F32 in the X86 code anymore. llvm-svn: 19672
-
Chris Lattner authored
of zero and sign extends. llvm-svn: 19671
-
Chris Lattner authored
llvm-svn: 19670
-
Chris Lattner authored
do it. This results in better code on X86 for floats (because if strict precision is not required, we can elide some more expensive double -> float conversions like the old isel did), and allows other targets to emit CopyFromRegs that are not legal for arguments. llvm-svn: 19668
-
Chris Lattner authored
llvm-svn: 19667
-
Chris Lattner authored
llvm-svn: 19661
-
Tanya Lattner authored
llvm-svn: 19660
-
Chris Lattner authored
llvm-svn: 19659
-
Chris Lattner authored
* Insert some really pedantic assertions that will notice when we emit the same loads more than one time, exposing bugs. This turns a miscompilation in bzip2 into a compile-fail. yaay. llvm-svn: 19658
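The general idea behind such a check, as a hedged C++ sketch (the real assertions live inside the instruction selector and use its own node types, which are only stand-ins here):

    #include <cassert>
    #include <set>

    struct Node;  // stand-in for a selection-DAG load node

    std::set<const Node *> EmittedLoads;

    // Record that a load has been emitted; fire an assertion if the same load is
    // ever emitted a second time.
    void noteLoadEmitted(const Node *N) {
        bool Inserted = EmittedLoads.insert(N).second;
        assert(Inserted && "emitted the same load more than once!");
        (void)Inserted;
    }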
-
Chris Lattner authored
llvm-svn: 19657
-
Chris Lattner authored
llvm-svn: 19656
-
Chris Lattner authored
match (X+Y)+(Z << 1), because we match the X+Y first, consuming the index register, and then there is no place to put the Z. llvm-svn: 19652
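A hypothetical source pattern (not taken from the commit) showing why the order matters: the address below has three register-sized pieces, X, Y, and Z*2, but an x86 address supplies only one base and one scaled index, so the selector should claim the index slot for Z << 1 and add X + Y separately:

    char load_elt(char *X, long Y, long Z) {
        // Address is (X + Y) + (Z << 1): compute X + Y with an explicit add,
        // then address the byte as [tmp + Z*2].
        return X[Y + 2 * Z];
    }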
-
Chris Lattner authored
llvm-svn: 19651
-
Chris Lattner authored
emitted too early. In particular, this fixes Regression/CodeGen/X86/regpressure.ll:regpressure3. This also improves the 2nd basic block in 164.gzip:flush_block, which went from

    .LBBflush_block_1:  # loopentry.1.i
        movzx %EAX, WORD PTR [dyn_ltree + 20]
        movzx %ECX, WORD PTR [dyn_ltree + 16]
        mov DWORD PTR [%ESP + 32], %ECX
        movzx %ECX, WORD PTR [dyn_ltree + 12]
        movzx %EDX, WORD PTR [dyn_ltree + 8]
        movzx %EBX, WORD PTR [dyn_ltree + 4]
        mov DWORD PTR [%ESP + 36], %EBX
        movzx %EBX, WORD PTR [dyn_ltree]
        add DWORD PTR [%ESP + 36], %EBX
        add %EDX, DWORD PTR [%ESP + 36]
        add %ECX, %EDX
        add DWORD PTR [%ESP + 32], %ECX
        add %EAX, DWORD PTR [%ESP + 32]
        movzx %ECX, WORD PTR [dyn_ltree + 24]
        add %EAX, %ECX
        mov %ECX, 0
        mov %EDX, %ECX

to

    .LBBflush_block_1:  # loopentry.1.i
        movzx %EAX, WORD PTR [dyn_ltree]
        movzx %ECX, WORD PTR [dyn_ltree + 4]
        add %ECX, %EAX
        movzx %EAX, WORD PTR [dyn_ltree + 8]
        add %EAX, %ECX
        movzx %ECX, WORD PTR [dyn_ltree + 12]
        add %ECX, %EAX
        movzx %EAX, WORD PTR [dyn_ltree + 16]
        add %EAX, %ECX
        movzx %ECX, WORD PTR [dyn_ltree + 20]
        add %ECX, %EAX
        movzx %EAX, WORD PTR [dyn_ltree + 24]
        add %ECX, %EAX
        mov %EAX, 0
        mov %EDX, %EAX

... which results in less spilling in the function. This change alone speeds up 164.gzip from 37.23s to 36.24s on apoc. The default isel takes 37.31s. llvm-svn: 19650
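For readers following along, a hypothetical C++ reduction (not gzip's actual source; the struct and names are made up) with the same shape as the block above: it sums a 16-bit field from seven consecutive 4-byte array elements, which is where the seven movzx loads at offsets 0 through 24 come from:

    #include <cstdint>

    struct ct_data { uint16_t freq; uint16_t code; };  // 4 bytes per element
    ct_data dyn_ltree[7] = {};

    int sum_freqs() {
        int s = 0;
        for (int i = 0; i < 7; ++i)
            s += dyn_ltree[i].freq;  // one movzx per element once unrolled
        return s;
    }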
-
Chris Lattner authored
llvm-svn: 19649
-
Chris Lattner authored
llvm-svn: 19647
-
- Jan 17, 2005
-
Chris Lattner authored
llvm-svn: 19645
-
Chris Lattner authored
X86/reg-pressure.ll again, and allows us to do nice things in other cases. For example, we now codegen this sort of thing:

    int %loadload(int *%X, int* %Y) {
        %Z = load int* %Y
        %Y = load int* %X     ;; load between %Z and store
        %Q = add int %Z, 1
        store int %Q, int* %Y
        ret int %Y
    }

Into this:

    loadload:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %EAX, DWORD PTR [%EAX]
        mov %ECX, DWORD PTR [%ESP + 8]
        inc DWORD PTR [%ECX]
        ret

where we weren't able to form the 'inc [mem]' before. This also lets the instruction selector emit loads in any order it wants to, which can be good for register pressure as well. llvm-svn: 19644
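Roughly the same example in C (my paraphrase of the 2005-era IR above, for readability): the load through X sits between the load through Y and the store back through Y, and the value loaded through X is what gets returned:

    int loadload(int *X, int *Y) {
        int Z = *Y;
        int V = *X;   // load between the load of *Y and the store through Y
        *Y = Z + 1;   // becomes 'inc DWORD PTR [mem]'
        return V;
    }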
-
Chris Lattner authored
1. Fold [mem] += (1|-1) into inc [mem]/dec [mem] to save some icache space.
2. Do not let token factor nodes prevent forming '[mem] op= val' folds.
llvm-svn: 19643
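A minimal illustration (mine, not a test from the commit) of the first fold: a read-modify-write increment that can become a single inc on the memory operand instead of a load, an add of 1, and a store:

    void bump(int *counter) {
        *counter += 1;   // foldable to: inc DWORD PTR [counter]
    }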
-
Chris Lattner authored
llvm-svn: 19642
-
Chris Lattner authored
llvm-svn: 19641
-
Chris Lattner authored
the basic block that uses them if possible. This is a big win on X86, as it lets us fold the argument loads into instructions and reduce register pressure (by not loading all of the arguments in the entry block). For this (contrived to show the optimization) testcase:

    int %argtest(int %A, int %B) {
        %X = sub int 12345, %A
        br label %L
    L:
        %Y = add int %X, %B
        ret int %Y
    }

we used to produce:

    argtest:
        mov %ECX, DWORD PTR [%ESP + 4]
        mov %EAX, 12345
        sub %EAX, %ECX
        mov %EDX, DWORD PTR [%ESP + 8]
    .LBBargtest_1:  # L
        add %EAX, %EDX
        ret

now we produce:

    argtest:
        mov %EAX, 12345
        sub %EAX, DWORD PTR [%ESP + 4]
    .LBBargtest_1:  # L
        add %EAX, DWORD PTR [%ESP + 8]
        ret

This also fixes the FIXME in the code. BTW, this occurs in real code. 164.gzip shrinks from 8623 to 8608 lines of .s file. The stack frame in huft_build shrinks from 1644->1628 bytes, inflate_codes shrinks from 116->108 bytes, and inflate_block from 2620->2612, due to fewer spills. Take that alkis. :-) llvm-svn: 19639
-
Chris Lattner authored
operations. The body of the if is less indented but unmodified in this patch. llvm-svn: 19638
-