Commits · 9a5bd7fca7b0453a4f409ab6c0c8e3069faedee3 · Roger Ferrer / llvm-epi-0.8

Feb 25, 2004

SparcV8 skeleton · 9a5bd7fc
Brian Gaeke authored Feb 25, 2004
```
llvm-svn: 11828
```
9a5bd7fc
Great renaming part II: Sparc --> SparcV9 (also includes command-line options and Makefiles) · 068b4596
Brian Gaeke authored Feb 25, 2004
```
llvm-svn: 11827
```
068b4596
Great renaming: Sparc --> SparcV9 · 94e95d2b
Brian Gaeke authored Feb 25, 2004
```
llvm-svn: 11826
```
94e95d2b
Add a bunch more functions used by perlbmk · 864c9014
Chris Lattner authored Feb 25, 2004
```
llvm-svn: 11824
```
864c9014
Updated to use llc to generate CBE code. · 9f547bce
John Criswell authored Feb 25, 2004
```
llvm-svn: 11823
```
9f547bce
Substantial improvements and cleanups for the release notes. We were missing · 8ebf2538
Chris Lattner authored Feb 25, 2004
```
a bunch of stuff!  :)

llvm-svn: 11822
```
8ebf2538
Fix incorrect debug code · 9c6833c5
Chris Lattner authored Feb 25, 2004
```
llvm-svn: 11821
```
9c6833c5

Teach the instruction selector how to transform 'array' GEP computations into X86 · 309327a4

Chris Lattner authored Feb 25, 2004

scaled indexes.  This allows us to compile GEP's like this:

int* %test([10 x { int, { int } }]* %X, int %Idx) {
        %Idx = cast int %Idx to long
        %X = getelementptr [10 x { int, { int } }]* %X, long 0, long %Idx, ubyte 1, ubyte 0
        ret int* %X
}

Into a single address computation:

test:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
        lea %EAX, DWORD PTR [%EAX + 8*%ECX + 4]
        ret

Before it generated:
test:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
        shl %ECX, 3
        add %EAX, %ECX
        lea %EAX, DWORD PTR [%EAX + 4]
        ret

This is useful for things like int/float/double arrays, as the indexing can be folded into
the loads and stores, reducing register pressure and decreasing the pressure on the decode unit.
With these changes, I expect our performance on 256.bzip2 and gzip to improve a lot.  On
bzip2 for example, we go from this:

10665 asm-printer           - Number of machine instrs printed
   40 ra-local              - Number of loads/stores folded into instructions
 1708 ra-local              - Number of loads added
 1532 ra-local              - Number of stores added
 1354 twoaddressinstruction - Number of instructions added
 1354 twoaddressinstruction - Number of two-address instructions
 2794 x86-peephole          - Number of peephole optimization performed

to this:
9873 asm-printer           - Number of machine instrs printed
  41 ra-local              - Number of loads/stores folded into instructions
1710 ra-local              - Number of loads added
1521 ra-local              - Number of stores added
 789 twoaddressinstruction - Number of instructions added
 789 twoaddressinstruction - Number of two-address instructions
2142 x86-peephole          - Number of peephole optimization performed

... and these types of instructions are often in tight loops.

Linear scan is also helped, but not as much.  It goes from:

8787 asm-printer           - Number of machine instrs printed
2389 liveintervals         - Number of identity moves eliminated after coalescing
2288 liveintervals         - Number of interval joins performed
3522 liveintervals         - Number of intervals after coalescing
5810 liveintervals         - Number of original intervals
 700 spiller               - Number of loads added
 487 spiller               - Number of stores added
 303 spiller               - Number of register spills
1354 twoaddressinstruction - Number of instructions added
1354 twoaddressinstruction - Number of two-address instructions
 363 x86-peephole          - Number of peephole optimization performed

to:

7982 asm-printer           - Number of machine instrs printed
1759 liveintervals         - Number of identity moves eliminated after coalescing
1658 liveintervals         - Number of interval joins performed
3282 liveintervals         - Number of intervals after coalescing
4940 liveintervals         - Number of original intervals
 635 spiller               - Number of loads added
 452 spiller               - Number of stores added
 288 spiller               - Number of register spills
 789 twoaddressinstruction - Number of instructions added
 789 twoaddressinstruction - Number of two-address instructions
 258 x86-peephole          - Number of peephole optimization performed

Though I'm not complaining about the drop in the number of intervals.  :)

llvm-svn: 11820

309327a4
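
The address arithmetic behind the single `lea` in the entry above can be checked in plain C++. This is a minimal sketch, assuming a 32-bit `int` and no padding, so that each `{ int, { int } }` element is 8 bytes and the inner field sits at byte offset 4. The names `Inner`, `Elem`, `gepAddress`, and `leaOffset` are hypothetical and only mirror the LLVM type from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical C++ mirror of the LLVM type [10 x { int, { int } }].
struct Inner { std::int32_t v; };
struct Elem  { std::int32_t a; Inner b; };

// The layout the scaled-index form relies on: scale 8, displacement 4.
static_assert(sizeof(Elem) == 8, "element size is the LEA scale");
static_assert(offsetof(Elem, b) == 4, "field offset is the LEA displacement");

// What the getelementptr computes: &X[idx].b.v
inline std::int32_t *gepAddress(Elem (*X)[10], long idx) {
  assert(idx >= 0 && idx < 10 && "index must stay inside the array");
  return &(*X)[idx].b.v;
}

// What `lea [base + 8*idx + 4]` computes, as a byte offset from the base.
inline std::ptrdiff_t leaOffset(long idx) {
  return 8 * static_cast<std::ptrdiff_t>(idx) + 4;
}
```

For every valid index, the pointer returned by `gepAddress` should sit exactly `leaOffset(idx)` bytes past the start of the array, which is the equivalence the instruction selector exploits.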

* Make the previous patch more efficient by not allocating a temporary MachineInstr · d1ee55d4

Chris Lattner authored Feb 25, 2004

  to do analysis.

*** FOLD getelementptr instructions into loads and stores when possible,
    making use of some of the crazy X86 addressing modes.

For example, the following C++ program fragment:

struct complex {
    double re, im;
    complex(double r, double i) : re(r), im(i) {}
};
inline complex operator+(const complex& a, const complex& b) {
    return complex(a.re+b.re, a.im+b.im);
}
complex addone(const complex& arg) {
    return arg + complex(1,0);
}

Used to be compiled to:
_Z6addoneRK7complex:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
***     mov %EDX, %ECX
        fld QWORD PTR [%EDX]
        fld1
        faddp %ST(1)
***     add %ECX, 8
        fld QWORD PTR [%ECX]
        fldz
        faddp %ST(1)
***     mov %ECX, %EAX
        fxch %ST(1)
        fstp QWORD PTR [%ECX]
***     add %EAX, 8
        fstp QWORD PTR [%EAX]
        ret

Now it is compiled to:
_Z6addoneRK7complex:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
        fld QWORD PTR [%ECX]
        fld1
        faddp %ST(1)
        fld QWORD PTR [%ECX + 8]
        fldz
        faddp %ST(1)
        fxch %ST(1)
        fstp QWORD PTR [%EAX]
        fstp QWORD PTR [%EAX + 8]
        ret

Other programs should see similar improvements, across the board.  Note that
in addition to reducing instruction count, this also reduces register pressure
a lot, always a good thing on X86.  :)

llvm-svn: 11819

d1ee55d4
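
The fold in the entry above works because an x86 memory operand can encode `base + scale*index + displacement` directly. The following is a rough model of that idea, not LLVM's own data structure: `AddressMode`, `effectiveAddress`, `addThenLoadAddress`, and `foldedLoadAddress` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of an x86 memory operand: [base + scale*index + disp].
struct AddressMode {
  std::uint32_t base;
  std::uint32_t scale;  // x86 allows 1, 2, 4, or 8
  std::uint32_t index;
  std::int32_t  disp;
};

// Effective address the hardware computes for the operand.
inline std::uint32_t effectiveAddress(const AddressMode &am) {
  assert(am.scale == 1 || am.scale == 2 || am.scale == 4 || am.scale == 8);
  return am.base + am.scale * am.index + static_cast<std::uint32_t>(am.disp);
}

// Unfolded form: a separate `add reg, imm`, then a load from [reg].
inline std::uint32_t addThenLoadAddress(std::uint32_t reg, std::int32_t imm) {
  std::uint32_t tmp = reg + static_cast<std::uint32_t>(imm);  // add %ECX, 8
  return effectiveAddress({tmp, 1, 0, 0});                     // fld [%ECX]
}

// Folded form: the constant rides along as the displacement.
inline std::uint32_t foldedLoadAddress(std::uint32_t reg, std::int32_t imm) {
  return effectiveAddress({reg, 1, 0, imm});                   // fld [%ECX + 8]
}
```

Both helpers reach the same address, but the folded form needs no `add` and no scratch copy of the register, which matches the `***` lines that disappear in the second listing.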

Add a helper to create an addressing mode given all of the pieces. · 4b3514c1
Chris Lattner authored Feb 25, 2004
```
llvm-svn: 11818
```
4b3514c1

add an inefficient way of folding structure and constant array indexes together · d825d30f

Chris Lattner authored Feb 25, 2004

into a single LEA instruction.  This should improve the code generated for
things like X->A.B.C[12].D.

The bigger benefit is still coming though.  Note that this uses an LEA instruction
instead of an add, giving the register allocator more freedom.  We should probably
never generate ADDri32's.

llvm-svn: 11817

d825d30f
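
A chain such as `X->A.B.C[12].D` adds up to one compile-time constant, which is why it can fit in a single LEA displacement. The sketch below illustrates this with hypothetical types (`XTy`, `ATy`, `BTy`, `CElem`) chosen only so that the expression is valid; the commit does not say what the real types look like.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical nested types so that X->A.B.C[12].D is a valid expression.
struct CElem { int pad; int D; };
struct BTy   { int pad; CElem C[16]; };
struct ATy   { int pad; BTy B; };
struct XTy   { int pad; ATy A; };

// Sum of the per-level offsets: the single constant an LEA can carry.
inline std::size_t foldedOffset() {
  return offsetof(XTy, A) + offsetof(ATy, B) + offsetof(BTy, C) +
         12 * sizeof(CElem) + offsetof(CElem, D);
}

// Byte distance from X to X->A.B.C[12].D, measured on a real object.
inline std::size_t measuredOffset(const XTy *X) {
  assert(X != nullptr);
  const char *base  = reinterpret_cast<const char *>(X);
  const char *field = reinterpret_cast<const char *>(&X->A.B.C[12].D);
  return static_cast<std::size_t>(field - base);
}
```

Since every term in `foldedOffset` is known at compile time, a code generator can emit one `lea` with that displacement instead of one add per level.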

Implement special case for storing an immediate into memory so that we don't need · f85e33cd
Chris Lattner authored Feb 25, 2004
```
an intermediate register.

llvm-svn: 11816
```
f85e33cd

Cygwin defines log2 as a macro. Undef it here IFF it has already been defined, · 04cff21c

Brian Gaeke authored Feb 25, 2004

so that we always get the inline function instead. Remember, kids, like it says
in the GCC manual, "An Inline Function is As Fast As a Macro."

llvm-svn: 11815

04cff21c
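
The guard described in the entry above can be sketched as follows. The function body and the `sketch` namespace are assumptions made for illustration; the namespace is there only to keep the example from clashing with the C library's own `log2`.

```cpp
#include <cassert>
#include <cstdint>

// Some platforms (Cygwin, per the commit above) define log2 as a macro.
// Remove it only if it is defined, so the function below is always used.
#ifdef log2
#undef log2
#endif

namespace sketch {
// Hypothetical inline replacement: floor(log2(C)) for C >= 1.
inline unsigned log2(std::uint64_t C) {
  assert(C != 0 && "log2(0) is undefined");
  unsigned Pow = 0;
  for (; C > 1; C >>= 1)
    ++Pow;
  return Pow;
}
} // namespace sketch
```

Wrapping the `#undef` in `#ifdef` keeps the header harmless on platforms where `log2` is an ordinary function rather than a macro.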

Feb 24, 2004
- small portability fix. · 01d92318
  Brian Gaeke authored Feb 24, 2004
```
llvm-svn: 11814
```
  01d92318
- Add support for 'rename' · 9ccb1af0
  Chris Lattner authored Feb 24, 2004
```
llvm-svn: 11813
```
  9ccb1af0
- Make the verifier a little more explicit about this problem. · d996e543
  Chris Lattner authored Feb 24, 2004
```
llvm-svn: 11811
```
  d996e543
- Add support for remove, fwrite, and fread · 396cdaf0
  Chris Lattner authored Feb 24, 2004
```
Also fix problem where we didn't check to see if a node pointer was null.
Though fclose(null) doesn't make a lot of sense, 300.twolf does it.

llvm-svn: 11810
```
  396cdaf0
- Added the VTune tests. · 47c5459c
  John Criswell authored Feb 24, 2004
```
llvm-svn: 11809
```
  47c5459c
- FunctionLiveVarInfo.h moved: include/llvm/CodeGen -> lib/Target/Sparc/LiveVar · 10a32da3
  Brian Gaeke authored Feb 24, 2004
```
llvm-svn: 11804
```
  10a32da3
- Fix some unexpected fallout from the config.h changes. Because the CBE no · b471f018
  Chris Lattner authored Feb 24, 2004
```
longer was getting this #include, it always fell back on the less precise
floating point initializer values, causing some testsuite failures.

llvm-svn: 11803
```
  b471f018
- Fix a faulty optimization on FP values · 8ee0593f
  Chris Lattner authored Feb 24, 2004
```
llvm-svn: 11801
```
  8ee0593f
- Fixed minor typos. · a92e5861
  John Criswell authored Feb 24, 2004
```
llvm-svn: 11800
```
  a92e5861
- If a block is made dead, make sure to promptly remove it. · 90ea78ed
  Chris Lattner authored Feb 24, 2004
```
llvm-svn: 11799
```
  90ea78ed
- Move machine code rewriter and spiller outside the register · 1dd872ce
  Alkis Evlogimenos authored Feb 24, 2004
```
allocator.

The implementation is completely rewritten and now employs several
optimizations not exercised before. For example, for 164.gzip we now have
997 loads and 699 stores vs. the 1221 loads and 880 stores we had
before.

llvm-svn: 11798
```
  1dd872ce
- Implement SimplifyCFG/switch_switch_fold.ll · a2ab4891
  Chris Lattner authored Feb 24, 2004
```
This case occurs many times in various benchmarks, especially when combined
with the previous patch.  This allows it to get stuff like:
  if (X == 4 || X == 3)
    if (X == 5 || X == 8)

and

switch (X) {
case 4: case 5: case 6:
  if (X == 4 || X == 5)

llvm-svn: 11797
```
  a2ab4891
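
The second pattern in the entry above, a `switch` whose case body tests `X` again, can be illustrated with a hand-folded equivalent. The commit message elides the bodies, so the return values 1, 2, and 0 below are hypothetical placeholders, and `nested`, `folded`, and `sameOnRange` are names invented for this sketch.

```cpp
#include <cassert>

// Source pattern: a switch on X whose case body dispatches on X again.
inline int nested(int X) {
  switch (X) {
  case 4: case 5: case 6:
    if (X == 4 || X == 5)
      return 1;  // hypothetical body
    return 2;    // hypothetical body
  default:
    return 0;    // hypothetical body
  }
}

// Hand-folded equivalent: one switch, no second dispatch on X.
inline int folded(int X) {
  switch (X) {
  case 4: case 5: return 1;
  case 6:         return 2;
  default:        return 0;
  }
}

// Exhaustively compare the two forms over a small range.
inline bool sameOnRange(int Lo, int Hi) {
  assert(Lo <= Hi && "empty range");
  for (int X = Lo; X <= Hi; ++X)
    if (nested(X) != folded(X))
      return false;
  return true;
}
```

The folded form is only a source-level picture of the result; the pass itself operates on LLVM switch instructions, not on C++.
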
- New testcase. Switch instructions that go to switch instructions should be · fe7a92fe
  Chris Lattner authored Feb 24, 2004
```
merged.

llvm-svn: 11796
```
  fe7a92fe
- Add predicates for checking if a virtual register has a physical · 63aea0b6
  Alkis Evlogimenos authored Feb 24, 2004
```
register mapping or a stack slot mapping.

llvm-svn: 11795
```
  63aea0b6
- Add some helpful methods for dealing with switch instructions · c7f8ba9f
  Chris Lattner authored Feb 24, 2004
```
llvm-svn: 11794
```
  c7f8ba9f
- Rearrange code a bit · 3cd98f05
  Chris Lattner authored Feb 24, 2004
```
llvm-svn: 11793
```
  3cd98f05
- Implement: test/Regression/Transforms/SimplifyCFG/switch_create.ll · 6f4b45ac
  Chris Lattner authored Feb 24, 2004
```
This turns code like this:
  if (X == 4 | X == 7)
and
  if (X != 4 & X != 7)
into switch instructions.

llvm-svn: 11792
```
  6f4b45ac
- The simplifycfg pass should be able to turn stuff like: · ae509325
  Chris Lattner authored Feb 24, 2004
```
  if (X == 4 || X == 7)
and
  if (X != 4 && X != 7)

into switch instructions.

llvm-svn: 11791
```
  ae509325
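
The two entries above (llvm-svn 11792 and 11791) describe turning chains of equality tests into switch instructions. A source-level picture of that equivalence, with hypothetical function names, might look like this:

```cpp
#include <cassert>

// Source form: a chain of equality tests on one value.
inline bool viaIf(int X) { return X == 4 || X == 7; }

// Equivalent switch form the pass is described as producing.
inline bool viaSwitch(int X) {
  switch (X) {
  case 4:
  case 7:  return true;
  default: return false;
  }
}

// The != / && variant is the same switch with the targets swapped.
inline bool viaIfNot(int X) { return X != 4 && X != 7; }
inline bool viaSwitchNot(int X) {
  switch (X) {
  case 4:
  case 7:  return false;
  default: return true;
  }
}

// Check that both pairs agree on every value in [Lo, Hi].
inline bool agreeOn(int Lo, int Hi) {
  assert(Lo <= Hi && "empty range");
  for (int X = Lo; X <= Hi; ++X)
    if (viaIf(X) != viaSwitch(X) || viaIfNot(X) != viaSwitchNot(X))
      return false;
  return true;
}
```

Once the tests are in switch form, later passes can treat the whole comparison as a single multi-way branch.
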
- Wow, the description of the 'switch' instruction was out of date. · cf96c6ca
  Chris Lattner authored Feb 24, 2004
```
llvm-svn: 11790
```
  cf96c6ca
- we no longer include boost · 9f8bf00a
  Chris Lattner authored Feb 24, 2004
```
llvm-svn: 11789
```
  9f8bf00a
- Hrm, my find must have been faulty. It didn't remove these as well. · 291ebdbf
  Chris Lattner authored Feb 24, 2004
```
llvm-svn: 11788
```
  291ebdbf
- Boost is now unneeded, thanks to the fix for PR253, contributed by Reid Spencer! · 0da4862a
  Chris Lattner authored Feb 24, 2004
```
llvm-svn: 11787
```
  0da4862a
- Now that's a new feature! · 7479e1eb
  Chris Lattner authored Feb 24, 2004
```
llvm-svn: 11786
```
  7479e1eb
- Use the new LLVM is_class template instead of the boost one, allowing us to · 101e704b
  Chris Lattner authored Feb 24, 2004
```
remove our dependency on boost!  Thanks to Reid Spencer for making this possible!

llvm-svn: 11785
```
  101e704b
- Check in a new type_traits header which provides the mysterious is_class · 78eed17a
  Chris Lattner authored Feb 24, 2004
```
template.  Thanks go out to Reid Spencer for skillfully extracting this
from boost!

llvm-svn: 11784
```
  78eed17a
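
For readers who have not met `is_class` before, one common way to write such a trait is the pointer-to-member SFINAE trick. The version below is a hypothetical sketch, not the header checked in by this commit; it lives in a `sketch` namespace, and `Widget` is an invented test type. Note that this technique also reports `true` for unions.

```cpp
#include <cassert>

namespace sketch {
// Only class types (and unions) can form the pointer-to-member type
// `int U::*`, so the first overload is viable exactly for those types.
template <typename T>
struct is_class {
private:
  typedef char Yes;
  struct No { char pad[2]; };
  template <typename U> static Yes test(int U::*);
  template <typename U> static No  test(...);
public:
  static const bool value = sizeof(test<T>(0)) == sizeof(Yes);
};
} // namespace sketch

// Hypothetical class type used to exercise the trait.
struct Widget { int x; };

// Small self-check of the trait on a class, a builtin, and a pointer.
inline void selfCheck() {
  assert(sketch::is_class<Widget>::value);
  assert(!sketch::is_class<int>::value);
  assert(!sketch::is_class<Widget *>::value);
}
```

Because the overloads are only ever named inside `sizeof`, they need declarations but no definitions.
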
- No one cares about similarity to boost · 1302e3ac
  Chris Lattner authored Feb 24, 2004
```
llvm-svn: 11783
```
  1302e3ac
- Make enum private as it is an implementation detail. · 8b571a64
  Alkis Evlogimenos authored Feb 23, 2004
```
llvm-svn: 11782
```
  8b571a64