- Feb 27, 2004
-
-
Alkis Evlogimenos authored
instructions. llvm-svn: 11907
-
Alkis Evlogimenos authored
llvm-svn: 11905
-
Alkis Evlogimenos authored
llvm-svn: 11903
-
Alkis Evlogimenos authored
consistent with the rest and also pepare for the addition of their memory operand variants. llvm-svn: 11902
-
- Feb 26, 2004
-
-
John Criswell authored
Functions with linkonce linkage are declared with weak linkage. Global floating point constants used to represent unprintable values (such as NaN and infinity) are declared static so that they don't interfere with other CBE generated translation units. llvm-svn: 11884
-
Alkis Evlogimenos authored
MRegisterInfo::is{Physical,Virtual}Register. Apply appropriate fixes to relevant files. llvm-svn: 11882
-
Chris Lattner authored
llvm-svn: 11875
-
Chris Lattner authored
bugs. Thanks Brian! llvm-svn: 11859
-
Misha Brukman authored
llvm-svn: 11858
-
- Feb 25, 2004
-
-
Misha Brukman authored
llvm-svn: 11835
-
Misha Brukman authored
llvm-svn: 11834
-
Misha Brukman authored
llvm-svn: 11833
-
Misha Brukman authored
llvm-svn: 11832
-
Chris Lattner authored
where there did not used to be any before llvm-svn: 11829
-
Brian Gaeke authored
llvm-svn: 11828
-
Brian Gaeke authored
llvm-svn: 11827
-
Brian Gaeke authored
llvm-svn: 11826
-
Chris Lattner authored
scaled indexes. This allows us to compile GEP's like this: int* %test([10 x { int, { int } }]* %X, int %Idx) { %Idx = cast int %Idx to long %X = getelementptr [10 x { int, { int } }]* %X, long 0, long %Idx, ubyte 1, ubyte 0 ret int* %X } Into a single address computation: test: mov %EAX, DWORD PTR [%ESP + 4] mov %ECX, DWORD PTR [%ESP + 8] lea %EAX, DWORD PTR [%EAX + 8*%ECX + 4] ret Before it generated: test: mov %EAX, DWORD PTR [%ESP + 4] mov %ECX, DWORD PTR [%ESP + 8] shl %ECX, 3 add %EAX, %ECX lea %EAX, DWORD PTR [%EAX + 4] ret This is useful for things like int/float/double arrays, as the indexing can be folded into the loads&stores, reducing register pressure and decreasing the pressure on the decode unit. With these changes, I expect our performance on 256.bzip2 and gzip to improve a lot. On bzip2 for example, we go from this: 10665 asm-printer - Number of machine instrs printed 40 ra-local - Number of loads/stores folded into instructions 1708 ra-local - Number of loads added 1532 ra-local - Number of stores added 1354 twoaddressinstruction - Number of instructions added 1354 twoaddressinstruction - Number of two-address instructions 2794 x86-peephole - Number of peephole optimization performed to this: 9873 asm-printer - Number of machine instrs printed 41 ra-local - Number of loads/stores folded into instructions 1710 ra-local - Number of loads added 1521 ra-local - Number of stores added 789 twoaddressinstruction - Number of instructions added 789 twoaddressinstruction - Number of two-address instructions 2142 x86-peephole - Number of peephole optimization performed ... and these types of instructions are often in tight loops. Linear scan is also helped, but not as much. It goes from: 8787 asm-printer - Number of machine instrs printed 2389 liveintervals - Number of identity moves eliminated after coalescing 2288 liveintervals - Number of interval joins performed 3522 liveintervals - Number of intervals after coalescing 5810 liveintervals - Number of original intervals 700 spiller - Number of loads added 487 spiller - Number of stores added 303 spiller - Number of register spills 1354 twoaddressinstruction - Number of instructions added 1354 twoaddressinstruction - Number of two-address instructions 363 x86-peephole - Number of peephole optimization performed to: 7982 asm-printer - Number of machine instrs printed 1759 liveintervals - Number of identity moves eliminated after coalescing 1658 liveintervals - Number of interval joins performed 3282 liveintervals - Number of intervals after coalescing 4940 liveintervals - Number of original intervals 635 spiller - Number of loads added 452 spiller - Number of stores added 288 spiller - Number of register spills 789 twoaddressinstruction - Number of instructions added 789 twoaddressinstruction - Number of two-address instructions 258 x86-peephole - Number of peephole optimization performed Though I'm not complaining about the drop in the number of intervals. :) llvm-svn: 11820
-
Chris Lattner authored
to do analysis. *** FOLD getelementptr instructions into loads and stores when possible, making use of some of the crazy X86 addressing modes. For example, the following C++ program fragment: struct complex { double re, im; complex(double r, double i) : re(r), im(i) {} }; inline complex operator+(const complex& a, const complex& b) { return complex(a.re+b.re, a.im+b.im); } complex addone(const complex& arg) { return arg + complex(1,0); } Used to be compiled to: _Z6addoneRK7complex: mov %EAX, DWORD PTR [%ESP + 4] mov %ECX, DWORD PTR [%ESP + 8] *** mov %EDX, %ECX fld QWORD PTR [%EDX] fld1 faddp %ST(1) *** add %ECX, 8 fld QWORD PTR [%ECX] fldz faddp %ST(1) *** mov %ECX, %EAX fxch %ST(1) fstp QWORD PTR [%ECX] *** add %EAX, 8 fstp QWORD PTR [%EAX] ret Now it is compiled to: _Z6addoneRK7complex: mov %EAX, DWORD PTR [%ESP + 4] mov %ECX, DWORD PTR [%ESP + 8] fld QWORD PTR [%ECX] fld1 faddp %ST(1) fld QWORD PTR [%ECX + 8] fldz faddp %ST(1) fxch %ST(1) fstp QWORD PTR [%EAX] fstp QWORD PTR [%EAX + 8] ret Other programs should see similar improvements, across the board. Note that in addition to reducing instruction count, this also reduces register pressure a lot, always a good thing on X86. :) llvm-svn: 11819
-
Chris Lattner authored
llvm-svn: 11818
-
Chris Lattner authored
into a single LEA instruction. This should improve the code generated for things like X->A.B.C[12].D. The bigger benefit is still coming though. Note that this uses an LEA instruction instead of an add, giving the register allocator more freedom. We should probably never generate ADDri32's. llvm-svn: 11817
-
Chris Lattner authored
an intermediate register. llvm-svn: 11816
-
- Feb 24, 2004
-
-
Brian Gaeke authored
llvm-svn: 11804
-
Chris Lattner authored
longer was getting this #include, it always fell back on the less precise floating point initializer values, causing some testsuite failures. llvm-svn: 11803
-
- Feb 23, 2004
-
-
Alkis Evlogimenos authored
block into MachineBasicBlock::getFirstTerminator(). This also fixes a bug in the implementation of the above in both RegAllocLocal and InstrSched, where instructions where added after the terminator if the basic block's only instruction was a terminator (it shouldn't matter for RegAllocLocal since this case never occurs in practice). llvm-svn: 11748
-
Chris Lattner authored
block we are in might be empty llvm-svn: 11744
-
Chris Lattner authored
eventually get an assignment due to elimination of PHIs. llvm-svn: 11743
-
Chris Lattner authored
llvm-svn: 11729
-
Chris Lattner authored
llvm-svn: 11728
-
Chris Lattner authored
Implement cast Type::ULongTy -> double llvm-svn: 11726
-
Chris Lattner authored
llvm-svn: 11722
-
- Feb 22, 2004
-
-
Chris Lattner authored
use FP instructions. This reduces the number of instructions inserted in 176.gcc (for example) from 58074 to 101 (it doesn't use much FP, which is typical). This reduction speeds up the entire code generator. In the case of 176.gcc, llc went from taking 31.38s to 24.78s. The passes that sped up the most are the register allocator and the 2 live variable analysis passes, which sped up 2.3, 1.3, and 1.5s respectively. The asmprinter pass also sped up because it doesn't print the instructions in comments :) Note that this patch is likely to expose latent bugs in machine code passes, because now basicblock can be empty, where they were never empty before. I cleaned out regalloclocal, but who knows about linscan :) llvm-svn: 11717
-
Alkis Evlogimenos authored
switch statements in the constructors and simplifies the implementation of the getUseType() member function. You will have to specify defs using MachineOperand::Def instead of MOTy::Def though (similarly for Use and UseAndDef). llvm-svn: 11715
-
Chris Lattner authored
Also, make an assertion actually fireable! llvm-svn: 11713
-
Chris Lattner authored
AFTER the GEP that was emitted. :( llvm-svn: 11712
-
Chris Lattner authored
(minor) benefits right now: 1. An extra dummy MOVrr32 is gone. This move would often be coallesced by both allocators anyway. 2. The code now uses the gep_type_iterator to walk the gep, which should future proof it a bit. It still assumes that array indexes are Longs though. These don't really justify rewriting the code. The big benefit will come later though. llvm-svn: 11710
-
Alkis Evlogimenos authored
leave register operands with the same use/def flags as the original instruction. llvm-svn: 11709
-
Chris Lattner authored
this should be folded into it. llvm-svn: 11705
-
Chris Lattner authored
value is a physreg and one is a virtreg. For this reason, disable copy folding entirely for physregs. Also, use the new isMoveInstr target hook which gives us folding of FP moves as well. llvm-svn: 11700
-
- Feb 20, 2004
-
-
Chris Lattner authored
compiling 129.compress... so don't! llvm-svn: 11649
-