Commits · e82c217b2fe86419e4b6883b75b74aeaa0e7fe08 · Roger Ferrer / llvm-epi-0.8

Feb 27, 2004
- Add memory operand folding support for the SETcc family of · 9476b7cb
  Alkis Evlogimenos authored Feb 27, 2004
```
instructions.

llvm-svn: 11907
```
  9476b7cb
- Add memory operand folding support for SHLD and SHRD instructions. · 8d99063b
  Alkis Evlogimenos authored Feb 27, 2004
```
llvm-svn: 11905
```
  8d99063b
- Add memory operand folding support for SHL, SHR and SAR, SHLD instructions. · 35374042
  Alkis Evlogimenos authored Feb 27, 2004
```
llvm-svn: 11903
```
  35374042
- Rename SHL, SHR, SAR, SHLD and SHRD instructions to make them · f020dfb4
  Alkis Evlogimenos authored Feb 27, 2004
```
consistent with the rest and also prepare for the addition of their
memory operand variants.

llvm-svn: 11902
```
  f020dfb4
Feb 26, 2004
- Fixes for PR258 and PR259. · feb7c49c
  John Criswell authored Feb 26, 2004
```
Functions with linkonce linkage are declared with weak linkage.
Global floating point constants used to represent unprintable values
(such as NaN and infinity) are declared static so that they don't interfere
with other CBE generated translation units.

llvm-svn: 11884
```
  feb7c49c
- Uncomment assertions that register# != 0 on calls to · 61719d48
  Alkis Evlogimenos authored Feb 26, 2004
```
MRegisterInfo::is{Physical,Virtual}Register. Apply appropriate fixes
to relevant files.

llvm-svn: 11882
```
  61719d48
- Use a map instead of annotations · 7140e469
  Chris Lattner authored Feb 26, 2004
```
llvm-svn: 11875
```
  7140e469
- Fix some warnings, some of which were spurious, and some of which were real · 9192bbda
  Chris Lattner authored Feb 26, 2004
```
bugs.  Thanks Brian!

llvm-svn: 11859
```
  9192bbda
- Instructions to call and return from functions. · 1743c409
  Misha Brukman authored Feb 26, 2004
```
llvm-svn: 11858
```
  1743c409
Feb 25, 2004

SparcV8 regs are really 32-bit, not 64! Thanks, Chris. · 564654d6
Misha Brukman authored Feb 25, 2004
```
llvm-svn: 11835
```
564654d6
Clean up the tablegen descriptions for SparcV8. · f8dcdcc8
Misha Brukman authored Feb 25, 2004
```
llvm-svn: 11834
```
f8dcdcc8
Fix the SparcV8 register definitions that were imported from PPC template. · 2122b969
Misha Brukman authored Feb 25, 2004
```
llvm-svn: 11833
```
2122b969
SparcV8 has different types of instructions, but F1 is only used for CALL. · 0e3a7ca5
Misha Brukman authored Feb 25, 2004
```
llvm-svn: 11832
```
0e3a7ca5
Fix failures in 099.go due to the cfgsimplify pass creating switch instructions · 64c9b223
Chris Lattner authored Feb 25, 2004
```
where there were none before.

llvm-svn: 11829
```
64c9b223
SparcV8 skeleton · 9a5bd7fc
Brian Gaeke authored Feb 25, 2004
```
llvm-svn: 11828
```
9a5bd7fc
Great renaming part II: Sparc --> SparcV9 (also includes command-line options and Makefiles) · 068b4596
Brian Gaeke authored Feb 25, 2004
```
llvm-svn: 11827
```
068b4596
Great renaming: Sparc --> SparcV9 · 94e95d2b
Brian Gaeke authored Feb 25, 2004
```
llvm-svn: 11826
```
94e95d2b

Teach the instruction selector how to transform 'array' GEP computations into X86 · 309327a4

Chris Lattner authored Feb 25, 2004

scaled indexes.  This allows us to compile GEP's like this:

int* %test([10 x { int, { int } }]* %X, int %Idx) {
        %Idx = cast int %Idx to long
        %X = getelementptr [10 x { int, { int } }]* %X, long 0, long %Idx, ubyte 1, ubyte 0
        ret int* %X
}

Into a single address computation:

test:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
        lea %EAX, DWORD PTR [%EAX + 8*%ECX + 4]
        ret

Before it generated:
test:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
        shl %ECX, 3
        add %EAX, %ECX
        lea %EAX, DWORD PTR [%EAX + 4]
        ret

This is useful for things like int/float/double arrays, as the indexing can be folded into
the loads and stores, reducing register pressure and decreasing the pressure on the decode unit.
With these changes, I expect our performance on 256.bzip2 and gzip to improve a lot.  On
bzip2 for example, we go from this:

10665 asm-printer           - Number of machine instrs printed
   40 ra-local              - Number of loads/stores folded into instructions
 1708 ra-local              - Number of loads added
 1532 ra-local              - Number of stores added
 1354 twoaddressinstruction - Number of instructions added
 1354 twoaddressinstruction - Number of two-address instructions
 2794 x86-peephole          - Number of peephole optimization performed

to this:
9873 asm-printer           - Number of machine instrs printed
  41 ra-local              - Number of loads/stores folded into instructions
1710 ra-local              - Number of loads added
1521 ra-local              - Number of stores added
 789 twoaddressinstruction - Number of instructions added
 789 twoaddressinstruction - Number of two-address instructions
2142 x86-peephole          - Number of peephole optimization performed

... and these types of instructions are often in tight loops.

Linear scan is also helped, but not as much.  It goes from:

8787 asm-printer           - Number of machine instrs printed
2389 liveintervals         - Number of identity moves eliminated after coalescing
2288 liveintervals         - Number of interval joins performed
3522 liveintervals         - Number of intervals after coalescing
5810 liveintervals         - Number of original intervals
 700 spiller               - Number of loads added
 487 spiller               - Number of stores added
 303 spiller               - Number of register spills
1354 twoaddressinstruction - Number of instructions added
1354 twoaddressinstruction - Number of two-address instructions
 363 x86-peephole          - Number of peephole optimization performed

to:

7982 asm-printer           - Number of machine instrs printed
1759 liveintervals         - Number of identity moves eliminated after coalescing
1658 liveintervals         - Number of interval joins performed
3282 liveintervals         - Number of intervals after coalescing
4940 liveintervals         - Number of original intervals
 635 spiller               - Number of loads added
 452 spiller               - Number of stores added
 288 spiller               - Number of register spills
 789 twoaddressinstruction - Number of instructions added
 789 twoaddressinstruction - Number of two-address instructions
 258 x86-peephole          - Number of peephole optimization performed

Though I'm not complaining about the drop in the number of intervals.  :)

llvm-svn: 11820

309327a4
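
The address arithmetic behind the scaled-index example in r11820 can be checked by hand. The following is a minimal sketch, not code from the patch: `Inner`, `Elem`, `scaledAddress` and `fieldAddress` are hypothetical names, and the layout assumes a 32-bit `int` with no padding, so that `{ int, { int } }` is 8 bytes wide and the selected field sits at offset 4.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical C++ mirror of the LLVM type { int, { int } } from the
// commit message, assuming a 32-bit int.
struct Inner { std::int32_t v; };
struct Elem  { std::int32_t a; Inner b; };

static_assert(sizeof(Elem) == 8, "sketch assumes an 8-byte element");

// Decomposed form of the address: base + scale*index + displacement.
// This is the shape a single x86 LEA (or a memory operand) can encode,
// since the scale of 8 is one of the legal factors 1, 2, 4, 8.
inline std::uintptr_t scaledAddress(std::uintptr_t base, std::size_t idx) {
  const std::size_t scale = sizeof(Elem);                            // 8
  const std::size_t disp  = offsetof(Elem, b) + offsetof(Inner, v);  // 4 + 0
  return base + scale * idx + disp;
}

// The same address written as an ordinary source-level expression,
// corresponding to the GEP indices (0, idx, 1, 0).
inline std::int32_t *fieldAddress(Elem (*x)[10], std::size_t idx) {
  return &(*x)[idx].b.v;
}
```

Both functions should agree for every valid index, which matches the `[%EAX + 8*%ECX + 4]` operand shown in the commit message.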

* Make the previous patch more efficient by not allocating a temporary MachineInstr · d1ee55d4

Chris Lattner authored Feb 25, 2004

  to do analysis.

*** FOLD getelementptr instructions into loads and stores when possible,
    making use of some of the crazy X86 addressing modes.

For example, the following C++ program fragment:

struct complex {
    double re, im;
    complex(double r, double i) : re(r), im(i) {}
};
inline complex operator+(const complex& a, const complex& b) {
    return complex(a.re+b.re, a.im+b.im);
}
complex addone(const complex& arg) {
    return arg + complex(1,0);
}

Used to be compiled to:
_Z6addoneRK7complex:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
***     mov %EDX, %ECX
        fld QWORD PTR [%EDX]
        fld1
        faddp %ST(1)
***     add %ECX, 8
        fld QWORD PTR [%ECX]
        fldz
        faddp %ST(1)
***     mov %ECX, %EAX
        fxch %ST(1)
        fstp QWORD PTR [%ECX]
***     add %EAX, 8
        fstp QWORD PTR [%EAX]
        ret

Now it is compiled to:
_Z6addoneRK7complex:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
        fld QWORD PTR [%ECX]
        fld1
        faddp %ST(1)
        fld QWORD PTR [%ECX + 8]
        fldz
        faddp %ST(1)
        fxch %ST(1)
        fstp QWORD PTR [%EAX]
        fstp QWORD PTR [%EAX + 8]
        ret

Other programs should see similar improvements, across the board.  Note that
in addition to reducing instruction count, this also reduces register pressure
a lot, always a good thing on X86.  :)

llvm-svn: 11819

d1ee55d4

Add a helper to create an addressing mode given all of the pieces. · 4b3514c1
Chris Lattner authored Feb 25, 2004
```
llvm-svn: 11818
```
4b3514c1

add an inefficient way of folding structure and constant array indexes together · d825d30f

Chris Lattner authored Feb 25, 2004

into a single LEA instruction.  This should improve the code generated for
things like X->A.B.C[12].D.

The bigger benefit is still coming though.  Note that this uses an LEA instruction
instead of an add, giving the register allocator more freedom.  We should probably
never generate ADDri32's.

llvm-svn: 11817

d825d30f
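
Why an access path like X->A.B.C[12].D from r11817 fits in one LEA: every step is a compile-time constant, so the steps sum to a single displacement from the base pointer. A minimal sketch under assumed layouts follows; the struct names and the `pad` fields are hypothetical and only the shape of the access path is taken from the commit message.

```cpp
#include <cstddef>

// Hypothetical layout standing in for X->A.B.C[12].D.
struct CElem { int pad; int D; };
struct BTy   { int pad; CElem C[16]; };
struct ATy   { int pad; BTy B; };
struct XTy   { int pad; ATy A; };

// Struct field offsets and the constant array index are all known at
// compile time, so the whole path folds into one constant displacement.
constexpr std::size_t foldedOffset() {
  return offsetof(XTy, A) + offsetof(ATy, B) + offsetof(BTy, C)
       + 12 * sizeof(CElem) + offsetof(CElem, D);
}

// Base pointer plus the single folded displacement.
inline int *viaFoldedOffset(XTy *x) {
  return reinterpret_cast<int *>(reinterpret_cast<char *>(x) + foldedOffset());
}
```

The result should be the same pointer as `&x->A.B.C[12].D`, computed with one addition instead of one per level.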

Implement special case for storing an immediate into memory so that we don't need · f85e33cd
Chris Lattner authored Feb 25, 2004
```
an intermediate register.

llvm-svn: 11816
```
f85e33cd

Feb 24, 2004
- FunctionLiveVarInfo.h moved: include/llvm/CodeGen -> lib/Target/Sparc/LiveVar · 10a32da3
  Brian Gaeke authored Feb 24, 2004
```
llvm-svn: 11804
```
  10a32da3
- Fix some unexpected fallout from the config.h changes. Because the CBE no · b471f018
  Chris Lattner authored Feb 24, 2004
```
longer was getting this #include, it always fell back on the less precise
floating point initializer values, causing some testsuite failures.

llvm-svn: 11803
```
  b471f018
Feb 23, 2004
- Refactor rewinding code for finding the first terminator of a basic · af2de484
  Alkis Evlogimenos authored Feb 23, 2004
```
block into MachineBasicBlock::getFirstTerminator().

This also fixes a bug in the implementation of the above in both
RegAllocLocal and InstrSched, where instructions were added after the
terminator if the basic block's only instruction was a terminator (it
shouldn't matter for RegAllocLocal since this case never occurs in
practice).

llvm-svn: 11748
```
  af2de484
- Simplify code a bit, don't go off the end of the block, now that the current · cb185a34
  Chris Lattner authored Feb 23, 2004
```
block we are in might be empty

llvm-svn: 11744
```
  cb185a34
- We were forgetting to add FP_REG_KILL instructions to basic blocks which will · 4ffd4443
  Chris Lattner authored Feb 23, 2004
```
eventually get an assignment due to elimination of PHIs.

llvm-svn: 11743
```
  4ffd4443
- Work around a gas bug. Print '-9223372036854775808' as unsigned. · abb91629
  Chris Lattner authored Feb 23, 2004
```
llvm-svn: 11729
```
  abb91629
- Implement cast fp -> bool · 7e90628a
  Chris Lattner authored Feb 23, 2004
```
llvm-svn: 11728
```
  7e90628a
- Stop passing iterators around by reference now that we have ilists! · 6590c299
  Chris Lattner authored Feb 23, 2004
```
Implement cast Type::ULongTy -> double

llvm-svn: 11726
```
  6590c299
- Add a new cmove instruction · 378157c3
  Chris Lattner authored Feb 23, 2004
```
llvm-svn: 11722
```
  378157c3
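
The rewinding logic described in r11748 (finding the first terminator of a basic block) can be sketched roughly as below. This is an illustration only, not the actual `MachineBasicBlock::getFirstTerminator()` implementation: `Instr` and `firstTerminator` are hypothetical stand-ins, and a block is modelled as a plain vector.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a machine instruction.
struct Instr { bool isTerminator; };

// Rewind from the end of the block over the trailing run of terminators.
// Returns the index of the first terminator, or block.size() if there is
// none. The i > 0 check covers both an empty block and a block whose only
// instruction is a terminator, the case the commit message calls out.
inline std::size_t firstTerminator(const std::vector<Instr> &block) {
  std::size_t i = block.size();
  while (i > 0 && block[i - 1].isTerminator)
    --i;
  return i;
}
```

Inserting new instructions at the returned position keeps them ahead of every terminator, including in a terminator-only block, where the position is 0.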
Feb 22, 2004

Only insert FP_REG_KILL instructions in MachineBasicBlocks that actually · cdd56634

Chris Lattner authored Feb 22, 2004

use FP instructions. This reduces the number of instructions inserted in
176.gcc (for example) from 58074 to 101 (it doesn't use much FP, which
is typical). This reduction speeds up the entire code generator. In the
case of 176.gcc, llc went from taking 31.38s to 24.78s. The passes that
sped up the most are the register allocator and the 2 live variable analysis
passes, which sped up 2.3, 1.3, and 1.5s respectively. The asmprinter
pass also sped up because it doesn't print the instructions in comments :)

Note that this patch is likely to expose latent bugs in machine code passes,
because now basic blocks can be empty, where they were never empty before. I
cleaned out regalloclocal, but who knows about linscan :)

llvm-svn: 11717

cdd56634

Move MOTy::UseType enum into MachineOperand. This eliminates the · 8358cc57

Alkis Evlogimenos authored Feb 22, 2004

switch statements in the constructors and simplifies the
implementation of the getUseType() member function. You will have to
specify defs using MachineOperand::Def instead of MOTy::Def though
(similarly for Use and UseAndDef).

llvm-svn: 11715

8358cc57
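
A rough sketch of the idea in r11715, storing the use type enum directly in the operand so that neither the constructor nor the accessor needs a switch. The class below is a hypothetical stand-in, not the real `MachineOperand`; only the enumerator names `Use`, `Def` and `UseAndDef` come from the commit message.

```cpp
// Hypothetical stand-in for MachineOperand.
class Operand {
public:
  enum UseType { Use, Def, UseAndDef };

  // The enum value is stored as-is, so no switch is needed to translate
  // it into separate flags.
  Operand(int reg, UseType type) : reg_(reg), useType_(type) {}

  UseType getUseType() const { return useType_; }  // plain accessor
  bool isUse() const { return useType_ != Def; }   // Use or UseAndDef
  bool isDef() const { return useType_ != Use; }   // Def or UseAndDef
  int getReg() const { return reg_; }

private:
  int reg_;
  UseType useType_;
};
```

Callers then name the enumerators through the operand class, for example `Operand(1, Operand::Def)`, which parallels the `MachineOperand::Def` spelling the commit message asks for.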

Reduce the number of pointless copies inserted due to constant pointer refs. · fae75640
Chris Lattner authored Feb 22, 2004
```
Also, make an assertion actually fireable!

llvm-svn: 11713
```
fae75640
Fix bug in previous checkin: leave the iterator at the first instruction · fa3ebd6a
Chris Lattner authored Feb 22, 2004
```
AFTER the GEP that was emitted.  :(

llvm-svn: 11712
```
fa3ebd6a

Completely rewrite how getelementptr instructions are expanded. This has two · 6536519f

Chris Lattner authored Feb 22, 2004

(minor) benefits right now:

1. An extra dummy MOVrr32 is gone.  This move would often be coalesced by
   both allocators anyway.
2. The code now uses the gep_type_iterator to walk the gep, which should future
   proof it a bit.  It still assumes that array indexes are Longs though.

These don't really justify rewriting the code.  The big benefit will come later
though.

llvm-svn: 11710

6536519f

When folding memory operands in machine instructions be careful to · de51c652
Alkis Evlogimenos authored Feb 22, 2004
```
leave register operands with the same use/def flags as the original
instruction.

llvm-svn: 11709
```
de51c652
Wow this is out of date. When we have _real_ code generator documentation, · 5fc6ae2b
Chris Lattner authored Feb 22, 2004
```
this should be folded into it.

llvm-svn: 11705
```
5fc6ae2b

The two address pass cannot handle two addr instructions where one incoming · 87d72eb2

Chris Lattner authored Feb 22, 2004

value is a physreg and one is a virtreg.  For this reason, disable copy folding
entirely for physregs.  Also, use the new isMoveInstr target hook which gives us
folding of FP moves as well.

llvm-svn: 11700

87d72eb2

Feb 20, 2004
- It is totally unacceptable to print out (literally) millions of zeros when · 73ffc88a
  Chris Lattner authored Feb 20, 2004
```
compiling 129.compress... so don't!

llvm-svn: 11649
```
  73ffc88a