Commits · 61719d48d2cec6fbfcd68a4a7aabf12a0a681b38 · Roger Ferrer / llvm-epi-0.8

Feb 26, 2004

Uncomment assertions that register# != 0 on calls to · 61719d48
Alkis Evlogimenos authored Feb 26, 2004
```
MRegisterInfo::is{Physical,Virtual}Register. Apply appropriate fixes
to relevant files.

llvm-svn: 11882
```
61719d48
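The assertions concern register number 0, which is not a real register, so asking whether it is physical or virtual is a caller bug. A minimal sketch of the idea follows; the namespace, the boundary value 1024, and the helper `isRealPhysReg` are assumptions made for illustration and are not taken from the MRegisterInfo header.

```cpp
#include <cassert>

// Hypothetical stand-in for the register-number convention. Names mirror the
// commit message; the boundary value is an assumption.
namespace sketch {
constexpr unsigned FirstVirtualRegister = 1024;

// Register 0 means "no register", so classifying it is rejected outright.
inline bool isPhysicalRegister(unsigned Reg) {
  assert(Reg != 0 && "register 0 is not a real register");
  return Reg < FirstVirtualRegister;
}

inline bool isVirtualRegister(unsigned Reg) {
  assert(Reg != 0 && "register 0 is not a real register");
  return Reg >= FirstVirtualRegister;
}

// Illustrative helper: callers that may see an unset operand test for 0
// first, so the assertion above never fires for them.
inline bool isRealPhysReg(unsigned Reg) {
  return Reg != 0 && isPhysicalRegister(Reg);
}
} // namespace sketch
```

The "appropriate fixes" in callers would then amount to guards of the `isRealPhysReg` kind wherever an operand may legitimately hold 0.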
Since LLVM uses structure type equivalence, it isn't useful to keep around · 79636d7c
Chris Lattner authored Feb 26, 2004
```
multiple type names for the same structural type.  Make DTE eliminate all
but one of the type names

llvm-svn: 11879
```
79636d7c
Use a map instead of annotations · 7140e469
Chris Lattner authored Feb 26, 2004
```
llvm-svn: 11875
```
7140e469
remove obsolete comment · 234a2d4f
Chris Lattner authored Feb 26, 2004
```
llvm-svn: 11872
```
234a2d4f
Make sure that at least one virtual method is defined in a .cpp file to avoid · 12003589
Chris Lattner authored Feb 26, 2004
```
having the compiler emit RTTI and vtables to EVERY translation unit.

llvm-svn: 11871
```
12003589
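The pattern described here can be sketched as follows. Under the Itanium C++ ABI, a class whose virtual functions are all inline has its vtable and RTTI emitted in every translation unit that needs them; defining one virtual function out of line gives the vtable a single home. The class names and the header/source split are hypothetical.

```cpp
#include <cassert>

// --- would live in a header (hypothetical Shape.h) ---
class Shape {
public:
  virtual ~Shape();              // declared here, defined out of line below
  virtual int sides() const = 0;
};

class Triangle : public Shape {
public:
  int sides() const override;    // first non-inline virtual: anchors the vtable
};

// --- would live in exactly one .cpp file (hypothetical Shape.cpp) ---
Shape::~Shape() {}
int Triangle::sides() const { return 3; }

// Any translation unit can now use the classes without emitting their vtables.
int countSides(const Shape &S) { return S.sides(); }
```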

Chris Lattner authored Feb 26, 2004

   if (X == 0 || X == 2)

...where the comparisons and branches are in different blocks... into a switch
instruction.  This comes up a lot in various programs, and works well with
the switch/switch merging code I checked in earlier.  For example, this testcase:

int switchtest(int C) {
  return C == 0 ? f(123) :
         C == 1 ? f(3123) :
         C == 4 ? f(312) :
         C == 5 ? f(1234): f(444);
}

is converted into this:
        switch int %C, label %cond_false.3 [
                 int 0, label %cond_true.0
                 int 1, label %cond_true.1
                 int 4, label %cond_true.2
                 int 5, label %cond_true.3
        ]

instead of a whole bunch of conditional branches.

Admittedly the code is ugly, and incomplete.  To be complete, we need to add
br -> switch merging and switch -> br merging.  For example, this testcase:

struct foo { int Q, R, Z; };
#define A (X->Q+X->R * 123)
int test(struct foo *X) {
  return A  == 123 ? X1() :
        A == 12321 ? X2():
        (A == 111 || A == 222) ? X3() :
        A == 875 ? X4() : X5();
}

Gets compiled to this:
        switch int %tmp.7, label %cond_false.2 [
                 int 123, label %cond_true.0
                 int 12321, label %cond_true.1
                 int 111, label %cond_true.2
                 int 222, label %cond_true.2
        ]
...
cond_false.2:           ; preds = %entry
        %tmp.52 = seteq int %tmp.7, 875         ; <bool> [#uses=1]
        br bool %tmp.52, label %cond_true.3, label %cond_false.3

where the branch could be folded into the switch.

This kind of thing occurs *ALL OF THE TIME*, especially in programs like
176.gcc, which is a horrible mess of code.  It contains stuff like *shudder*:

#define SWITCH_TAKES_ARG(CHAR) \
  (   (CHAR) == 'D' \
   || (CHAR) == 'U' \
   || (CHAR) == 'o' \
   || (CHAR) == 'e' \
   || (CHAR) == 'u' \
   || (CHAR) == 'I' \
   || (CHAR) == 'm' \
   || (CHAR) == 'L' \
   || (CHAR) == 'A' \
   || (CHAR) == 'h' \
   || (CHAR) == 'z')

and

#define CONST_OK_FOR_LETTER_P(VALUE, C)                 \
  ((C) == 'I' ? SMALL_INTVAL (VALUE)                    \
   : (C) == 'J' ? SMALL_INTVAL (-(VALUE))               \
   : (C) == 'K' ? (unsigned)(VALUE) < 32                \
   : (C) == 'L' ? ((VALUE) & 0xffff) == 0               \
   : (C) == 'M' ? integer_ok_for_set (VALUE)            \
   : (C) == 'N' ? (VALUE) < 0                           \
   : (C) == 'O' ? (VALUE) == 0                          \
   : (C) == 'P' ? (VALUE) >= 0                          \
   : 0)

and

#define LEGITIMIZE_ADDRESS(X,OLDX,MODE,WIN)                     \
{                                                               \
  if (GET_CODE (X) == PLUS && CONSTANT_ADDRESS_P (XEXP (X, 1))) \
    (X) = gen_rtx (PLUS, SImode, XEXP (X, 0),                   \
                   copy_to_mode_reg (SImode, XEXP (X, 1)));     \
  if (GET_CODE (X) == PLUS && CONSTANT_ADDRESS_P (XEXP (X, 0))) \
    (X) = gen_rtx (PLUS, SImode, XEXP (X, 1),                   \
                   copy_to_mode_reg (SImode, XEXP (X, 0)));     \
  if (GET_CODE (X) == PLUS && GET_CODE (XEXP (X, 0)) == MULT)   \
    (X) = gen_rtx (PLUS, SImode, XEXP (X, 1),                   \
                   force_operand (XEXP (X, 0), 0));             \
  if (GET_CODE (X) == PLUS && GET_CODE (XEXP (X, 1)) == MULT)   \
    (X) = gen_rtx (PLUS, SImode, XEXP (X, 0),                   \
                   force_operand (XEXP (X, 1), 0));             \
  if (GET_CODE (X) == PLUS && GET_CODE (XEXP (X, 0)) == PLUS)   \
    (X) = gen_rtx (PLUS, Pmode, force_operand (XEXP (X, 0), NULL_RTX),\
                   XEXP (X, 1));                                \
  if (GET_CODE (X) == PLUS && GET_CODE (XEXP (X, 1)) == PLUS)   \
    (X) = gen_rtx (PLUS, Pmode, XEXP (X, 0),                    \
                   force_operand (XEXP (X, 1), NULL_RTX));      \
  if (GET_CODE (X) == SYMBOL_REF || GET_CODE (X) == CONST       \
           || GET_CODE (X) == LABEL_REF)                        \
    (X) = legitimize_address (flag_pic, X, 0, 0);               \
  if (memory_address_p (MODE, X))                               \
    goto WIN; }

and others.  These macros get used multiple times of course.  These are such
lovely candidates for macros, aren't they?  :)

This code also nicely handles LLVM constructs that look like this:

  if (isa<CastInst>(I))
   ...
  else if (isa<BranchInst>(I))
   ...
  else if (isa<SetCondInst>(I))
   ...
  else if (isa<UnwindInst>(I))
   ...
  else if (isa<VAArgInst>(I))
   ...

where the isa can obviously be a dyn_cast as well.  Switch instructions are a
good thing.

llvm-svn: 11870

21e941fb
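The folding described in this message can be sketched on a toy model. The code below is not the cfgsimplify implementation: `EqTest`, `SwitchInst`, and `foldChain` are hypothetical names, and the chain is assumed to have been recognised already as a sequence of "compare with constant, branch on equal" links.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// One link of an if/else-if chain: block `Block` tests `Var == Const` and
// jumps to `Target` on success, otherwise falls to the next link.
struct EqTest {
  std::string Block;
  std::string Var;
  int Const;
  std::string Target;
};

struct SwitchInst {
  std::string Var;
  std::vector<std::pair<int, std::string>> Cases;
  std::string Default;
  unsigned Folded = 0; // number of chain links absorbed into the switch
};

// Absorb the longest prefix of the chain that tests the same variable.
// If a link tests something else, it stays a branch and its block becomes
// the default destination; otherwise the final else block is the default.
SwitchInst foldChain(const std::vector<EqTest> &Chain,
                     const std::string &FinalElse) {
  assert(!Chain.empty() && "need at least one comparison");
  SwitchInst SI;
  SI.Var = Chain.front().Var;
  SI.Default = FinalElse;
  for (const EqTest &T : Chain) {
    if (T.Var != SI.Var) {
      SI.Default = T.Block;
      break;
    }
    // The first test of a constant wins; a later duplicate is unreachable.
    bool Seen = false;
    for (const auto &C : SI.Cases)
      if (C.first == T.Const)
        Seen = true;
    if (!Seen)
      SI.Cases.emplace_back(T.Const, T.Target);
    ++SI.Folded;
  }
  return SI;
}
```

Feeding it the four comparisons of `switchtest` yields the four cases and the `cond_false.3` default shown in the message. An `||` of two comparisons is modelled as two links with the same target.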

No need to clear the map here, it will always be empty · 28a08859
Chris Lattner authored Feb 26, 2004
```
llvm-svn: 11868
```
28a08859
Fix typo · 36ab728f
Chris Lattner authored Feb 26, 2004
```
llvm-svn: 11864
```
36ab728f
The node doesn't have to have _no_ node flags, it just has to be complete and · 128e8419
Chris Lattner authored Feb 26, 2004
```
not have any globals.

llvm-svn: 11863
```
128e8419
Add _more_ functions · c8167b0e
Chris Lattner authored Feb 26, 2004
```
llvm-svn: 11862
```
c8167b0e
Fix some warnings, some of which were spurious, and some of which were real · 9192bbda
Chris Lattner authored Feb 26, 2004
```
bugs.  Thanks Brian!

llvm-svn: 11859
```
9192bbda
Instructions to call and return from functions. · 1743c409
Misha Brukman authored Feb 26, 2004
```
llvm-svn: 11858
```
1743c409

Two changes: · 71626b8f

Chris Lattner authored Feb 25, 2004

 1. Functions do not make things incomplete, only variables
 2. Constant global variables no longer need to be marked incomplete, because
    we are guaranteed that the initializer for the global will be in the
    graph we are hacking on now.  This makes resolution of indirect calls happen
    a lot more in the bottom-up (bu) pass, and supports things like vtables and their C counterparts
    (giant constant arrays of function pointers), etc...

Testcase here: test/Regression/Analysis/DSGraph/constant_globals.ll

llvm-svn: 11852

71626b8f

When building local graphs, clone the initializer for constant globals into each · fab2872b
Chris Lattner authored Feb 25, 2004
```
local graph that uses the global.

llvm-svn: 11850
```
fab2872b
Fix bugs found with recent addition of assertions in · e62ddd40
Alkis Evlogimenos authored Feb 25, 2004
```
MRegisterInfo::is{Physical,Virtual}Register.

llvm-svn: 11849
```
e62ddd40

Simplify the dead node elimination stuff · 6ce59b4a

Chris Lattner authored Feb 25, 2004

Make the incompleteness marker faster by looping directly over the globals
instead of over the scalars to find the globals

Fix a bug where we didn't mark a global incomplete if it didn't have any
outgoing edges.  This wouldn't break any current clients but is still wrong.

llvm-svn: 11848

6ce59b4a

Add a bunch more functions · 5e5e0606
Chris Lattner authored Feb 25, 2004
```
llvm-svn: 11847
```
5e5e0606
Try harder to get symbol info · 17bce881
Chris Lattner authored Feb 25, 2004
```
llvm-svn: 11846
```
17bce881

Represent va_list in interpreter as a (ec-stack-depth . var-arg-index) · 7b4be13f

Brian Gaeke authored Feb 25, 2004

pair, and look up varargs in the execution stack every time, instead of
just pushing iterators (which can be invalidated during callFunction())
around.  (union GenericValue now has a "pair of uints" member, to support
this mechanism.) Fixes Bug 234.

llvm-svn: 11845

7b4be13f
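The representation described above can be illustrated with a toy interpreter. The sketch assumes a vector-backed execution stack; `Frame`, `Interp`, and `VAListHandle` are hypothetical names rather than the interpreter's own types. The point is that a (depth, index) pair stays meaningful after the stack's storage moves, whereas a pointer or iterator into the stack does not.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// One activation record; only the variadic arguments matter for this sketch.
struct Frame {
  std::vector<int> VarArgs;
};

// (execution-stack depth, var-arg index), as in the commit message.
using VAListHandle = std::pair<unsigned, unsigned>;

struct Interp {
  std::vector<Frame> ECStack;

  // Pushing a frame may reallocate ECStack, which invalidates any pointer
  // or iterator into it that was handed out earlier.
  void callFunction(std::vector<int> VarArgs) {
    ECStack.push_back(Frame{std::move(VarArgs)});
  }

  // va_start: record where the arguments live, not their address.
  VAListHandle vaStart() const {
    assert(!ECStack.empty() && "no active call");
    return {static_cast<unsigned>(ECStack.size() - 1), 0u};
  }

  // va_arg: look the argument up again on every access, then advance.
  int vaArg(VAListHandle &VL) const {
    assert(VL.first < ECStack.size() && "frame no longer on the stack");
    const Frame &F = ECStack[VL.first];
    assert(VL.second < F.VarArgs.size() && "read past the last var-arg");
    return F.VarArgs[VL.second++];
  }
};
```

The extra lookup per `va_arg` is the price for handles that survive nested calls made between two reads.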

Feb 25, 2004

Great sparc renaming fallout IV: Sparc --> SparcV9. · 84b76c9b
Brian Gaeke authored Feb 25, 2004
```
llvm-svn: 11844
```
84b76c9b
Remove assert since it is breaking cases that it shouldn't. · a9f03fba
Alkis Evlogimenos authored Feb 25, 2004
```
llvm-svn: 11841
```
a9f03fba
Add DenseMap template and actually use it for mapping virtual regs · d8bace7f
Alkis Evlogimenos authored Feb 25, 2004
```
to objects.

llvm-svn: 11840
```
d8bace7f
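A dense map suits virtual registers because they are numbered consecutively from a fixed starting point, so the register number can index a vector directly. The following is a hypothetical sketch of that idea and not the template added by this commit; `VirtRegMap`, `kFirstVirtualRegister`, and its value are assumptions.

```cpp
#include <cassert>
#include <vector>

constexpr unsigned kFirstVirtualRegister = 1024; // assumed boundary value

// Map keyed by virtual register number and backed by contiguous storage.
template <typename T>
class VirtRegMap {
  std::vector<T> Storage;
  T Null; // value reported for registers that were never assigned

public:
  explicit VirtRegMap(const T &NullValue = T()) : Null(NullValue) {}

  // Make room for every virtual register up to and including Reg.
  void grow(unsigned Reg) {
    assert(Reg >= kFirstVirtualRegister && "not a virtual register");
    unsigned Needed = Reg - kFirstVirtualRegister + 1;
    if (Needed > Storage.size())
      Storage.resize(Needed, Null);
  }

  // Constant-time access: the key is converted to an index by subtraction.
  T &operator[](unsigned Reg) {
    grow(Reg);
    return Storage[Reg - kFirstVirtualRegister];
  }

  unsigned size() const { return static_cast<unsigned>(Storage.size()); }
};
```

Compared with a tree-based map, lookups need no comparisons or node allocations, at the cost of one slot per register in the covered range.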

My faith in programmers has been found to be totally misplaced. One would · 8d1da1ab

Chris Lattner authored Feb 25, 2004

assume that if they don't intend to write to a global variable, that they
would mark it as constant.  However, there are people that don't understand
that the compiler can do nice things for them if they give it the information
it needs.

This pass looks for blatantly obvious globals that are only ever read from.
Though it uses a trivially simple "alias analysis" of sorts, it is still able
to do amazing things to important benchmarks.  253.perlbmk, for example,
contains several ***GIANT*** function pointer tables that are not marked
constant and should be.  Marking them constant allows the optimizer to turn
a whole bunch of indirect calls into direct calls.  Note that only a link-time
optimizer can do this transformation, but perlbmk does have several strings
and other minor globals that can be marked constant by this pass when run
from GCCAS.

176.gcc has a ton of strings and large tables that this pass marks constant, both
at compile time (38 of them) and at link time (48 more).  Other benchmarks
give similar results, though it seems like big ones have disproportionally
more than small ones.

This pass is extremely quick and does good things.  I'm going to enable it
in gccas & gccld.  Not bad for 50 SLOC.

llvm-svn: 11836

8d1da1ab
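The analysis described above can be sketched on a toy IR. This is not the pass itself: `Global`, `UseKind`, and `constifyGlobals` are hypothetical, uses are assumed to be classified already, and the restriction to internal globals is an assumption (code outside the module could write to the others), in line with the message's remark that some tables can only be handled at link time.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class UseKind { Load, Store, AddressTaken };

struct Global {
  std::string Name;
  bool IsInternal;  // not visible outside this module (assumed requirement)
  bool IsConstant;
  std::vector<UseKind> Uses;
};

// The trivially simple "alias analysis": any use other than a direct load
// disqualifies the global, including merely taking its address.
bool isOnlyRead(const Global &G) {
  for (UseKind U : G.Uses)
    if (U != UseKind::Load)
      return false;
  return true;
}

// Mark every qualifying global constant; returns how many were changed.
unsigned constifyGlobals(std::vector<Global> &Module) {
  unsigned NumMarked = 0;
  for (Global &G : Module) {
    if (G.IsConstant || !G.IsInternal)
      continue;
    if (isOnlyRead(G)) {
      G.IsConstant = true;
      ++NumMarked;
    }
  }
  return NumMarked;
}
```

Once a function pointer table is known to be constant, a load from it with a constant index can be replaced by the initializer's value, which is what turns indirect calls into direct ones.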

SparcV8 regs are really 32-bit, not 64! Thanks, Chris. · 564654d6
Misha Brukman authored Feb 25, 2004
```
llvm-svn: 11835
```
564654d6
Clean up the tablegen descriptions for SparcV8. · f8dcdcc8
Misha Brukman authored Feb 25, 2004
```
llvm-svn: 11834
```
f8dcdcc8
Fix the SparcV8 register definitions that were imported from PPC template. · 2122b969
Misha Brukman authored Feb 25, 2004
```
llvm-svn: 11833
```
2122b969
SparcV8 has different types of instructions, but F1 is only used for CALL. · 0e3a7ca5
Misha Brukman authored Feb 25, 2004
```
llvm-svn: 11832
```
0e3a7ca5
Add an assertion · f5a393a1
Chris Lattner authored Feb 25, 2004
```
llvm-svn: 11830
```
f5a393a1
Fix failures in 099.go due to the cfgsimplify pass creating switch instructions · 64c9b223
Chris Lattner authored Feb 25, 2004
```
where there did not used to be any before

llvm-svn: 11829
```
64c9b223
SparcV8 skeleton · 9a5bd7fc
Brian Gaeke authored Feb 25, 2004
```
llvm-svn: 11828
```
9a5bd7fc
Great renaming part II: Sparc --> SparcV9 (also includes command-line options and Makefiles) · 068b4596
Brian Gaeke authored Feb 25, 2004
```
llvm-svn: 11827
```
068b4596
Great renaming: Sparc --> SparcV9 · 94e95d2b
Brian Gaeke authored Feb 25, 2004
```
llvm-svn: 11826
```
94e95d2b
Add a bunch more functions used by perlbmk · 864c9014
Chris Lattner authored Feb 25, 2004
```
llvm-svn: 11824
```
864c9014
Fix incorrect debug code · 9c6833c5
Chris Lattner authored Feb 25, 2004
```
llvm-svn: 11821
```
9c6833c5

Teach the instruction selector how to transform 'array' GEP computations into X86 · 309327a4

Chris Lattner authored Feb 25, 2004

scaled indexes.  This allows us to compile GEP's like this:

int* %test([10 x { int, { int } }]* %X, int %Idx) {
        %Idx = cast int %Idx to long
        %X = getelementptr [10 x { int, { int } }]* %X, long 0, long %Idx, ubyte 1, ubyte 0
        ret int* %X
}

Into a single address computation:

test:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
        lea %EAX, DWORD PTR [%EAX + 8*%ECX + 4]
        ret

Before it generated:
test:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
        shl %ECX, 3
        add %EAX, %ECX
        lea %EAX, DWORD PTR [%EAX + 4]
        ret

This is useful for things like int/float/double arrays, as the indexing can be folded into
the loads and stores, reducing register pressure and decreasing the pressure on the decode unit.
With these changes, I expect our performance on 256.bzip2 and gzip to improve a lot.  On
bzip2 for example, we go from this:

10665 asm-printer           - Number of machine instrs printed
   40 ra-local              - Number of loads/stores folded into instructions
 1708 ra-local              - Number of loads added
 1532 ra-local              - Number of stores added
 1354 twoaddressinstruction - Number of instructions added
 1354 twoaddressinstruction - Number of two-address instructions
 2794 x86-peephole          - Number of peephole optimization performed

to this:
9873 asm-printer           - Number of machine instrs printed
  41 ra-local              - Number of loads/stores folded into instructions
1710 ra-local              - Number of loads added
1521 ra-local              - Number of stores added
 789 twoaddressinstruction - Number of instructions added
 789 twoaddressinstruction - Number of two-address instructions
2142 x86-peephole          - Number of peephole optimization performed

... and these types of instructions are often in tight loops.

Linear scan is also helped, but not as much.  It goes from:

8787 asm-printer           - Number of machine instrs printed
2389 liveintervals         - Number of identity moves eliminated after coalescing
2288 liveintervals         - Number of interval joins performed
3522 liveintervals         - Number of intervals after coalescing
5810 liveintervals         - Number of original intervals
 700 spiller               - Number of loads added
 487 spiller               - Number of stores added
 303 spiller               - Number of register spills
1354 twoaddressinstruction - Number of instructions added
1354 twoaddressinstruction - Number of two-address instructions
 363 x86-peephole          - Number of peephole optimization performed

to:

7982 asm-printer           - Number of machine instrs printed
1759 liveintervals         - Number of identity moves eliminated after coalescing
1658 liveintervals         - Number of interval joins performed
3282 liveintervals         - Number of intervals after coalescing
4940 liveintervals         - Number of original intervals
 635 spiller               - Number of loads added
 452 spiller               - Number of stores added
 288 spiller               - Number of register spills
 789 twoaddressinstruction - Number of instructions added
 789 twoaddressinstruction - Number of two-address instructions
 258 x86-peephole          - Number of peephole optimization performed

Though I'm not complaining about the drop in the number of intervals.  :)

llvm-svn: 11820

309327a4
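The arithmetic behind the `[%EAX + 8*%ECX + 4]` operand can be checked with a small sketch. It mirrors the IR type `[10 x { int, { int } }]` in C++ assuming a 4-byte `int`; `AddrMode`, `foldArrayGEP`, and the other names are hypothetical, and the index is taken as non-negative for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// C++ mirror of { int, { int } }.
struct Inner { std::int32_t V; };
struct Elem  { std::int32_t A; Inner B; };

// The pieces of an x86 memory operand: Base + Scale*Index + Disp.
struct AddrMode {
  std::uintptr_t Base;
  std::uintptr_t Scale;
  std::uintptr_t Index;
  std::uintptr_t Disp;
};

// x86 can only scale an index register by 1, 2, 4 or 8.
bool isLegalScale(std::uintptr_t S) {
  return S == 1 || S == 2 || S == 4 || S == 8;
}

// getelementptr X, 0, Idx, 1, 0: the array index scales by the element
// size, and the two constant field indices collapse into a displacement.
AddrMode foldArrayGEP(Elem (*X)[10], std::uintptr_t Idx) {
  AddrMode AM;
  AM.Base = reinterpret_cast<std::uintptr_t>(X);
  AM.Scale = sizeof(Elem);
  AM.Index = Idx;
  AM.Disp = offsetof(Elem, B) + offsetof(Inner, V);
  assert(isLegalScale(AM.Scale) && "otherwise an explicit multiply is needed");
  return AM;
}

std::uintptr_t effectiveAddress(const AddrMode &AM) {
  return AM.Base + AM.Scale * AM.Index + AM.Disp;
}
```

With an 8-byte element and the inner field at offset 4, the scale and displacement come out as 8 and 4, matching the `lea` in the message.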

* Make the previous patch more efficient by not allocating a temporary MachineInstr · d1ee55d4

Chris Lattner authored Feb 25, 2004

  to do analysis.

*** FOLD getelementptr instructions into loads and stores when possible,
    making use of some of the crazy X86 addressing modes.

For example, the following C++ program fragment:

struct complex {
    double re, im;
    complex(double r, double i) : re(r), im(i) {}
};
inline complex operator+(const complex& a, const complex& b) {
    return complex(a.re+b.re, a.im+b.im);
}
complex addone(const complex& arg) {
    return arg + complex(1,0);
}

Used to be compiled to:
_Z6addoneRK7complex:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
***     mov %EDX, %ECX
        fld QWORD PTR [%EDX]
        fld1
        faddp %ST(1)
***     add %ECX, 8
        fld QWORD PTR [%ECX]
        fldz
        faddp %ST(1)
***     mov %ECX, %EAX
        fxch %ST(1)
        fstp QWORD PTR [%ECX]
***     add %EAX, 8
        fstp QWORD PTR [%EAX]
        ret

Now it is compiled to:
_Z6addoneRK7complex:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
        fld QWORD PTR [%ECX]
        fld1
        faddp %ST(1)
        fld QWORD PTR [%ECX + 8]
        fldz
        faddp %ST(1)
        fxch %ST(1)
        fstp QWORD PTR [%EAX]
        fstp QWORD PTR [%EAX + 8]
        ret

Other programs should see similar improvements, across the board.  Note that
in addition to reducing instruction count, this also reduces register pressure
a lot, always a good thing on X86.  :)

llvm-svn: 11819

d1ee55d4

Add a helper to create an addressing mode given all of the pieces. · 4b3514c1
Chris Lattner authored Feb 25, 2004
```
llvm-svn: 11818
```
4b3514c1

add an inefficient way of folding structure and constant array indexes together · d825d30f

Chris Lattner authored Feb 25, 2004

into a single LEA instruction.  This should improve the code generated for
things like X->A.B.C[12].D.

The bigger benefit is still coming though.  Note that this uses an LEA instruction
instead of an add, giving the register allocator more freedom.  We should probably
never generate ADDri32's.

llvm-svn: 11817

d825d30f
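For an access such as `X->A.B.C[12].D`, every index is a compile-time constant, so the whole path collapses into one displacement that a single LEA can add to the base pointer. A minimal sketch with a made-up struct layout follows; the type names and field sizes are hypothetical and only the shape of the access path comes from the message.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout giving the access path X->A.B.C[12].D.
struct Leaf  { int Pad; int D; };
struct Mid   { char Tag; Leaf C[16]; };
struct Upper { double Head; Mid B; };
struct Outer { int First; Upper A; };

// Each structure index contributes a field offset and the constant array
// index contributes 12 * sizeof(element); the sum is a single constant.
std::size_t foldedDisplacement() {
  return offsetof(Outer, A) + offsetof(Upper, B) + offsetof(Mid, C) +
         12 * sizeof(Leaf) + offsetof(Leaf, D);
}

// Equivalent of "lea result, [X + disp]": one add of a constant to the base.
int *addressOfD(Outer *X) {
  char *Base = reinterpret_cast<char *>(X);
  return reinterpret_cast<int *>(Base + foldedDisplacement());
}
```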

Implement special case for storing an immediate into memory so that we don't need · f85e33cd
Chris Lattner authored Feb 25, 2004
```
an intermediate register.

llvm-svn: 11816
```
f85e33cd

Feb 24, 2004
Add support for 'rename' · 9ccb1af0
Chris Lattner authored Feb 24, 2004
```
llvm-svn: 11813
```
9ccb1af0