Commits · cb98644e9ba8f44bbd07beb8ba84ab300fcdf277 · Roger Ferrer / llvm-epi-0.8

Mar 04, 2004
- make -print-machineinstrs work for both SparcV9 and X86 · 8351d8c1
  Brian Gaeke authored Mar 04, 2004
```
llvm-svn: 12122
```
- Add assertion for scale verification. · b9501c1f
  Alkis Evlogimenos authored Mar 04, 2004
```
llvm-svn: 12120
```
Mar 02, 2004
- Doxygenify some comments. · a6025e64
  Misha Brukman authored Mar 01, 2004
```
llvm-svn: 12064
```
Mar 01, 2004

TargetCacheInfo has been removed; its only uses were to propagate a constant · 427cec13

Brian Gaeke authored Mar 01, 2004

(16) into certain areas of the SPARC V9 back-end. I'm fairly sure the US IIIi's
dcache has 32-byte lines, so I'm not sure where the 16 came from. However, in
the interest of not breaking things any more than they already are, I'm going
to leave the constant alone.

llvm-svn: 12043

Handle passing constant integers to functions much more efficiently. Instead · 1f4642c4

Chris Lattner authored Mar 01, 2004

of generating this code:

        mov %EAX, 4
        mov DWORD PTR [%ESP], %EAX
        mov %AX, 123
        movsx %EAX, %AX
        mov DWORD PTR [%ESP + 4], %EAX
        call Y

we now generate:
        mov DWORD PTR [%ESP], 4
        mov DWORD PTR [%ESP + 4], 123
        call Y

Which hurts the eyes less.  :)

Considering that register pressure around call sites is already high (with all
of the callee-clobbered registers and such), this may help a lot.
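A minimal sketch of the selection rule, assuming a hypothetical `CallArg` record and `emitArgStore` helper that return Intel-syntax text (this is not the actual X86 instruction selector code): a constant argument becomes one `mov DWORD PTR [...], imm` into its stack slot, while any other argument is stored from the register that holds it.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical description of an outgoing call argument: either a known
// integer constant or a value already living in a 32-bit register.
struct CallArg {
  std::optional<int32_t> constant;  // set when the argument is a constant
  std::string reg;                  // register name otherwise, e.g. "%EAX"
};

// Emit the store of one argument into its stack slot at [%ESP + offset].
// Constants (8/16-bit ones already sign-extended to 32 bits) become a single
// "mov DWORD PTR [...], imm", so no scratch register is needed.
std::string emitArgStore(const CallArg &arg, unsigned offset) {
  std::string addr =
      offset == 0 ? "[%ESP]" : "[%ESP + " + std::to_string(offset) + "]";
  if (arg.constant)
    return "mov DWORD PTR " + addr + ", " + std::to_string(*arg.constant);
  return "mov DWORD PTR " + addr + ", " + arg.reg;
}
```

With this rule, `emitArgStore(CallArg{4, ""}, 0)` yields `mov DWORD PTR [%ESP], 4`, and only non-constant arguments ever need a register.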

llvm-svn: 12028

Fix a minor code-quality issue. When passing 8 and 16-bit integer constants · 5c7d3cda

Chris Lattner authored Mar 01, 2004

to function calls, we would emit dead code, like this:

int Y(int, short, double);
int X() {
  Y(4, 123, 4);
}

--- Old
X:
        sub %ESP, 20
        mov %EAX, 4
        mov DWORD PTR [%ESP], %EAX
***     mov %AX, 123
        mov %AX, 123
        movsx %EAX, %AX
        mov DWORD PTR [%ESP + 4], %EAX
        fld QWORD PTR [.CPIX_0]
        fstp QWORD PTR [%ESP + 8]
        call Y
        mov %EAX, 0
        # IMPLICIT_USE %EAX %ESP
        add %ESP, 20
        ret

Now we emit:
X:
        sub %ESP, 20
        mov %EAX, 4
        mov DWORD PTR [%ESP], %EAX
        mov %AX, 123
        movsx %EAX, %AX
        mov DWORD PTR [%ESP + 4], %EAX
        fld QWORD PTR [.CPIX_0]
        fstp QWORD PTR [%ESP + 8]
        call Y
        mov %EAX, 0
        # IMPLICIT_USE %EAX %ESP
        add %ESP, 20
        ret

Next up, eliminate the mov AX and movsx entirely!

llvm-svn: 12026

Feb 29, 2004

Add instruction name description. · 9a4653ed
Alkis Evlogimenos authored Feb 29, 2004
```
llvm-svn: 11998
```
Use correct template for SHLD and SHRD instructions so that the memory · 0824ffc6
Alkis Evlogimenos authored Feb 29, 2004
```
operand size is correctly specified.

llvm-svn: 11997
```

Improve allocation order: · c7fd0770

Alkis Evlogimenos authored Feb 29, 2004

1) For 8-bit registers try to use first the ones that are parts of the
   same register (AL then AH). This way we only alias 2 16/32-bit
   registers after allocating 4 8-bit variables.

2) Move EBX to be the last register allocated. This will cause fewer
   spills to happen, since we will have 8-bit registers available up to
   register exhaustion (assuming we use the allocation order). It
   would be nice if we could push all of the 8-bit aliased registers
   towards the end, but we much prefer to keep the callee-saved registers
   at the end to avoid saving them on entry to and exit from the function.

For example, this gives a slight reduction of spills with linear scan
on 164.gzip.

Before:

11221 asm-printer           - Number of machine instrs printed
  975 spiller               - Number of loads added
  675 spiller               - Number of stores added
  398 spiller               - Number of register spills

After:

11182 asm-printer           - Number of machine instrs printed
  952 spiller               - Number of loads added
  652 spiller               - Number of stores added
  386 spiller               - Number of register spills
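The aliasing argument in point 1 can be checked with a small count. This sketch assumes a simplified register model (each 8-bit register lives inside exactly one of EAX, ECX, EDX, EBX) and hypothetical orders; it is not the real X86 allocation-order table.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Map an 8-bit x86 register to the 32-bit register that contains it,
// e.g. "AL" and "AH" both live inside "EAX".
std::string parent32(const std::string &reg8) {
  return "E" + std::string(1, reg8[0]) + "X";
}

// Allocate the first `n` registers of `order` and report how many distinct
// 32-bit registers become (partially) occupied, i.e. aliased.
std::size_t aliased32After(const std::vector<std::string> &order, std::size_t n) {
  std::set<std::string> parents;
  for (std::size_t i = 0; i < n && i < order.size(); ++i)
    parents.insert(parent32(order[i]));
  return parents.size();
}
```

With an interleaved order (AL, CL, DL, BL, AH, ...) the first four 8-bit allocations touch EAX, ECX, EDX and EBX; with the paired order (AL, AH, CL, CH, ...) they touch only EAX and ECX, and putting BL/BH last keeps EBX free longest.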

llvm-svn: 11996

A big X86 instruction rename. The instructions are renamed to make · ea81b79a

Alkis Evlogimenos authored Feb 29, 2004

their names more descriptive. A name consists of the base name, a
default operand size, and a character per operand, each with an
optional special size. For example:

ADD8rr -> add, 8-bit register, 8-bit register

IMUL16rmi -> imul, 16-bit register, 16-bit memory, 16-bit immediate

IMUL16rmi8 -> imul, 16-bit register, 16-bit memory, 8-bit immediate

MOVSX32rm16 -> movsx, 32-bit register, 16-bit memory
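A small decoder makes the naming scheme concrete. It is a sketch that assumes the base name is the leading run of upper-case letters and that an operand without its own size falls back to the default size; `decodeInstrName` is a hypothetical helper, not part of LLVM.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// One decoded operand: its kind ('r' register, 'm' memory, 'i' immediate)
// and its size in bits.
struct Operand {
  char kind;
  unsigned bits;
};

struct DecodedName {
  std::string base;          // e.g. "IMUL"
  unsigned defaultBits = 0;  // e.g. 16
  std::vector<Operand> operands;
};

// Read a run of decimal digits starting at `pos`; returns 0 if there are none.
static unsigned readNumber(const std::string &s, std::size_t &pos) {
  unsigned value = 0;
  while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos])))
    value = value * 10 + static_cast<unsigned>(s[pos++] - '0');
  return value;
}

// Decode BASE, default size, then one letter per operand, each optionally
// followed by a size that overrides the default.
DecodedName decodeInstrName(const std::string &name) {
  DecodedName d;
  std::size_t pos = 0;
  while (pos < name.size() && std::isupper(static_cast<unsigned char>(name[pos])))
    d.base += name[pos++];
  d.defaultBits = readNumber(name, pos);
  while (pos < name.size()) {
    char kind = name[pos++];
    unsigned bits = readNumber(name, pos);
    d.operands.push_back({kind, bits ? bits : d.defaultBits});
  }
  return d;
}
```

For `IMUL16rmi8` this gives base `IMUL`, default size 16, and operands r16, m16, i8, matching the description above.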

llvm-svn: 11995

Eliminate the X86-specific BMI functions, using BuildMI instead. · 1e36fb03
Chris Lattner authored Feb 29, 2004
```
Replace uses of addZImm with addImm.

llvm-svn: 11992
```
Fix a miscompilation of 197.parser that occurs when you have single basic · 9a975732
Chris Lattner authored Feb 29, 2004
```
block loops.

llvm-svn: 11990
```
Adjust to change in TII ctor arguments · ed01da8f
Chris Lattner authored Feb 29, 2004
```
llvm-svn: 11987
```
These two virtual methods are never called. · ca89812d
Chris Lattner authored Feb 29, 2004
```
llvm-svn: 11984
```
Use correct template for ADC instruction with memory operands. · 876f6f96
Alkis Evlogimenos authored Feb 29, 2004
```
llvm-svn: 11974
```
SHLD and SHRD take 32-bit operands but an 8-bit immediate. Rename them · fa635805
Alkis Evlogimenos authored Feb 28, 2004
```
to denote this fact.

llvm-svn: 11972
```
Floating point loads/stores act on memory operands. Rename them to · 4953ae08
Alkis Evlogimenos authored Feb 28, 2004
```
denote this fact.

llvm-svn: 11971
```

Rename instruction templates to be easier for the human eye to · c6948fa7

Alkis Evlogimenos authored Feb 28, 2004

parse. The name is now I (operand size)*. For example:

Im32 -> instruction with 32-bit memory operands.

Im16i8 -> instruction with 16-bit memory operands and 8-bit immediate
          operands.
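A sketch of splitting template names under this scheme, assuming every template name is `I` followed by (operand kind letter, size) pairs; `decodeTemplateName` is a hypothetical helper.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Split a template name such as "Im16i8" into (operand kind, size in bits)
// pairs: {('m', 16), ('i', 8)}. The leading 'I' only marks an instruction
// template. Returns an empty vector if the name does not start with 'I'.
std::vector<std::pair<char, unsigned>> decodeTemplateName(const std::string &name) {
  std::vector<std::pair<char, unsigned>> operands;
  if (name.empty() || name[0] != 'I')
    return operands;
  std::size_t pos = 1;
  while (pos < name.size()) {
    char kind = name[pos++];
    unsigned bits = 0;
    while (pos < name.size() && std::isdigit(static_cast<unsigned char>(name[pos])))
      bits = bits * 10 + static_cast<unsigned>(name[pos++] - '0');
    operands.emplace_back(kind, bits);
  }
  return operands;
}
```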

llvm-svn: 11970

Feb 28, 2004

Uncomment instructions that take both an immediate and a memory · 5b5dee4a
Alkis Evlogimenos authored Feb 28, 2004
```
operand whose sizes differ.

llvm-svn: 11969
```

Each instruction now has both an ImmType and a MemType. This describes · 19493908

Alkis Evlogimenos authored Feb 28, 2004

the size of the immediate and the memory operand on instructions that
use them. This resolves problems with instructions that take both a
memory and an immediate operand whose sizes differ (e.g. ADDmi32b).
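A sketch of why carrying two sizes matters, using a hypothetical `OpSize` enum and `InstrSizes` record (the real X86 instruction descriptions encode these differently): the Intel-syntax printer must choose the `... PTR` keyword for a memory operand from MemType, never from ImmType.

```cpp
#include <string>

// Hypothetical operand-size enum.
enum class OpSize { None, Bits8, Bits16, Bits32, Bits64 };

// Each instruction carries both sizes, so "add DWORD PTR [mem], imm8"
// is described as memType = Bits32, immType = Bits8.
struct InstrSizes {
  OpSize memType;
  OpSize immType;
};

// Size keyword the Intel-syntax printer puts before a memory operand.
std::string ptrKeyword(OpSize size) {
  switch (size) {
  case OpSize::Bits8:  return "BYTE PTR";
  case OpSize::Bits16: return "WORD PTR";
  case OpSize::Bits32: return "DWORD PTR";
  case OpSize::Bits64: return "QWORD PTR";
  case OpSize::None:   break;
  }
  return "";
}

// Print the memory operand using memType; deriving it from immType would
// print a 32-bit memory reference as an 8-bit one.
std::string memOperandPrefix(const InstrSizes &sizes) {
  return ptrKeyword(sizes.memType);
}
```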

llvm-svn: 11967

Do not generate instructions with mismatched memory/immediate sized · 2debead5
Alkis Evlogimenos authored Feb 28, 2004
```
operands. The X86 backend doesn't handle them properly right now.

llvm-svn: 11944
```
Further comment updates. · 24b3d0bd
Alkis Evlogimenos authored Feb 28, 2004
```
llvm-svn: 11933
```
Update comments. · f87966b8
Alkis Evlogimenos authored Feb 28, 2004
```
llvm-svn: 11932
```

My previous commit broke the JIT. The shift instructions always take · 2dbc79df

Alkis Evlogimenos authored Feb 28, 2004

an 8-bit immediate. So mark the shifts that take immediates as taking
an 8-bit argument. The rest, which implicitly use CL, are marked
appropriately.

A bug still exists:

def SHLDmri32  : I2A8 <"shld", 0xA4, MRMDestMem>, TB;           // [mem32] <<= [mem32],R32 imm8

The immediate in the above instruction is 8-bit but the memory
reference is 32-bit. The printer prints this as an 8-bit reference
which confuses the assembler. Same with SHRDmri32.
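To illustrate why the shift count is always 8 bits, here is a sketch that encodes the register forms of SHL, SHR and SAR with an immediate count (`C1 /digit ib` in the Intel manual's notation); SHLD's immediate form (`0F A4 /r ib`) likewise ends in a single immediate byte. `encodeShiftRI` is a hypothetical helper, not the LLVM X86 code emitter, and it covers only 32-bit register destinations.

```cpp
#include <cstdint>
#include <vector>

// Register numbers as used in the ModRM byte.
enum Reg32 : uint8_t { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI };

// Opcode extension (the /digit in "C1 /digit ib") selecting the shift kind.
enum ShiftKind : uint8_t { SHL = 4, SHR = 5, SAR = 7 };

// Encode "shift r32, imm8" as C1 /digit ib. Whatever the operand size, the
// shift count is one trailing byte.
std::vector<uint8_t> encodeShiftRI(ShiftKind kind, Reg32 dst, uint8_t count) {
  // mod = 11 (register operand), reg = opcode extension, rm = destination.
  uint8_t modrm = static_cast<uint8_t>(0xC0 | (kind << 3) | dst);
  return {0xC1, modrm, count};
}
```

For example, `encodeShiftRI(SHL, EAX, 3)` produces `C1 E0 03`, i.e. `shl eax, 3`.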

llvm-svn: 11931

Feb 27, 2004
- Fix argument size for SHL, SHR, SAR, SHLD and SHRD families of · b10b04c5
  Alkis Evlogimenos authored Feb 27, 2004
```
instructions.

llvm-svn: 11923
```
- Fix encoding of ADD and SUB family of instructions. Also rearrange · 75ed0f67
  Alkis Evlogimenos authored Feb 27, 2004
```
them so that they are consistent with AND, XOR, etc...

llvm-svn: 11922
```
- Rename MRMS[0-7]{r,m} to MRM[0-7]{r,m}. · 58270fcf
  Alkis Evlogimenos authored Feb 27, 2004
```
llvm-svn: 11921
```
- Add memory operand folding support for the SETcc family of · 9476b7cb
  Alkis Evlogimenos authored Feb 27, 2004
```
instructions.

llvm-svn: 11907
```
- Add memory operand folding support for SHLD and SHRD instructions. · 8d99063b
  Alkis Evlogimenos authored Feb 27, 2004
```
llvm-svn: 11905
```
- Add memory operand folding support for SHL, SHR, SAR and SHLD instructions. · 35374042
  Alkis Evlogimenos authored Feb 27, 2004
```
llvm-svn: 11903
```
- Rename SHL, SHR, SAR, SHLD and SHRD instructions to make them · f020dfb4
  Alkis Evlogimenos authored Feb 27, 2004
```
consistent with the rest and also prepare for the addition of their
memory operand variants.

llvm-svn: 11902
```
Feb 26, 2004
- Uncomment assertions that register# != 0 on calls to · 61719d48
  Alkis Evlogimenos authored Feb 26, 2004
```
MRegisterInfo::is{Physical,Virtual}Register. Apply appropriate fixes
to relevant files.

llvm-svn: 11882
```
- Fix some warnings, some of which were spurious, and some of which were real · 9192bbda
  Chris Lattner authored Feb 26, 2004
```
bugs.  Thanks Brian!

llvm-svn: 11859
```
Feb 25, 2004

Fix failures in 099.go due to the cfgsimplify pass creating switch instructions · 64c9b223
Chris Lattner authored Feb 25, 2004
```
where there were none before.

llvm-svn: 11829
```

Teach the instruction selector how to transform 'array' GEP computations into X86 · 309327a4

Chris Lattner authored Feb 25, 2004

scaled indexes.  This allows us to compile GEPs like this:

int* %test([10 x { int, { int } }]* %X, int %Idx) {
        %Idx = cast int %Idx to long
        %X = getelementptr [10 x { int, { int } }]* %X, long 0, long %Idx, ubyte 1, ubyte 0
        ret int* %X
}

Into a single address computation:

test:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
        lea %EAX, DWORD PTR [%EAX + 8*%ECX + 4]
        ret

Before it generated:
test:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
        shl %ECX, 3
        add %EAX, %ECX
        lea %EAX, DWORD PTR [%EAX + 4]
        ret
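The transformation boils down to a check on the element size. In the example, each `{ int, { int } }` element is 8 bytes and the selected field sits at byte offset 4, giving `[%EAX + 8*%ECX + 4]`. The sketch below assumes a hypothetical `foldArrayGEP` helper; x86 can only encode scales of 1, 2, 4 and 8, so other element sizes still need a separate shift or multiply.

```cpp
#include <optional>
#include <string>

// x86 memory operand of the form [Base + Scale*Index + Disp].
struct ScaledAddress {
  std::string base;
  std::string index;
  unsigned scale;
  int disp;
};

// x86 addressing modes can only encode these scale factors.
bool isValidScale(unsigned scale) {
  return scale == 1 || scale == 2 || scale == 4 || scale == 8;
}

// Fold "&Base[Index].field" into one address. `elemSize` is the array element
// size in bytes and `fieldOffset` the constant byte offset of the field inside
// an element. Returns std::nullopt when the element size is not a legal
// scale, in which case Index must be multiplied separately.
std::optional<ScaledAddress> foldArrayGEP(const std::string &base,
                                          const std::string &index,
                                          unsigned elemSize, int fieldOffset) {
  if (!isValidScale(elemSize))
    return std::nullopt;
  return ScaledAddress{base, index, elemSize, fieldOffset};
}
```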

This is useful for things like int/float/double arrays, as the indexing can be folded into
the loads and stores, reducing register pressure and decreasing the pressure on the decode unit.
With these changes, I expect our performance on 256.bzip2 and gzip to improve a lot.  On
bzip2, for example, we go from this:

10665 asm-printer           - Number of machine instrs printed
   40 ra-local              - Number of loads/stores folded into instructions
 1708 ra-local              - Number of loads added
 1532 ra-local              - Number of stores added
 1354 twoaddressinstruction - Number of instructions added
 1354 twoaddressinstruction - Number of two-address instructions
 2794 x86-peephole          - Number of peephole optimization performed

to this:

9873 asm-printer           - Number of machine instrs printed
  41 ra-local              - Number of loads/stores folded into instructions
1710 ra-local              - Number of loads added
1521 ra-local              - Number of stores added
 789 twoaddressinstruction - Number of instructions added
 789 twoaddressinstruction - Number of two-address instructions
2142 x86-peephole          - Number of peephole optimization performed

... and these types of instructions are often in tight loops.

Linear scan is also helped, but not as much.  It goes from:

8787 asm-printer           - Number of machine instrs printed
2389 liveintervals         - Number of identity moves eliminated after coalescing
2288 liveintervals         - Number of interval joins performed
3522 liveintervals         - Number of intervals after coalescing
5810 liveintervals         - Number of original intervals
 700 spiller               - Number of loads added
 487 spiller               - Number of stores added
 303 spiller               - Number of register spills
1354 twoaddressinstruction - Number of instructions added
1354 twoaddressinstruction - Number of two-address instructions
 363 x86-peephole          - Number of peephole optimization performed

to:

7982 asm-printer           - Number of machine instrs printed
1759 liveintervals         - Number of identity moves eliminated after coalescing
1658 liveintervals         - Number of interval joins performed
3282 liveintervals         - Number of intervals after coalescing
4940 liveintervals         - Number of original intervals
 635 spiller               - Number of loads added
 452 spiller               - Number of stores added
 288 spiller               - Number of register spills
 789 twoaddressinstruction - Number of instructions added
 789 twoaddressinstruction - Number of two-address instructions
 258 x86-peephole          - Number of peephole optimization performed

Though I'm not complaining about the drop in the number of intervals.  :)

llvm-svn: 11820

* Make the previous patch more efficient by not allocating a temporary MachineInstr · d1ee55d4

Chris Lattner authored Feb 25, 2004

  to do analysis.

*** FOLD getelementptr instructions into loads and stores when possible,
    making use of some of the crazy X86 addressing modes.

For example, the following C++ program fragment:

struct complex {
    double re, im;
    complex(double r, double i) : re(r), im(i) {}
};
inline complex operator+(const complex& a, const complex& b) {
    return complex(a.re+b.re, a.im+b.im);
}
complex addone(const complex& arg) {
    return arg + complex(1,0);
}

Used to be compiled to:
_Z6addoneRK7complex:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
***     mov %EDX, %ECX
        fld QWORD PTR [%EDX]
        fld1
        faddp %ST(1)
***     add %ECX, 8
        fld QWORD PTR [%ECX]
        fldz
        faddp %ST(1)
***     mov %ECX, %EAX
        fxch %ST(1)
        fstp QWORD PTR [%ECX]
***     add %EAX, 8
        fstp QWORD PTR [%EAX]
        ret

Now it is compiled to:
_Z6addoneRK7complex:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
        fld QWORD PTR [%ECX]
        fld1
        faddp %ST(1)
        fld QWORD PTR [%ECX + 8]
        fldz
        faddp %ST(1)
        fxch %ST(1)
        fstp QWORD PTR [%EAX]
        fstp QWORD PTR [%EAX + 8]
        ret

Other programs should see similar improvements, across the board.  Note that
in addition to reducing instruction count, this also reduces register pressure
a lot, always a good thing on X86.  :)
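A sketch of the folding step, assuming a simplified, hypothetical `X86AddressMode` record (base, scale, index, displacement) rather than LLVM's actual data structures: the offset of `im` (8 bytes) is added to the displacement of the memory operand instead of being added to a copy of the pointer register.

```cpp
#include <string>

// Simplified x86 memory operand: [Base + Scale*Index + Disp]. An empty index
// means the Scale*Index term is absent.
struct X86AddressMode {
  std::string base;
  unsigned scale = 1;
  std::string index;
  int disp = 0;
};

// Fold a constant byte offset (e.g. a struct field offset from a
// getelementptr) into the displacement instead of emitting an "add".
X86AddressMode foldConstantOffset(X86AddressMode am, int offset) {
  am.disp += offset;
  return am;
}

// Render the operand in the Intel-style syntax used above.
std::string printMem(const std::string &sizeKeyword, const X86AddressMode &am) {
  std::string s = sizeKeyword + " [" + am.base;
  if (!am.index.empty())
    s += " + " + std::to_string(am.scale) + "*" + am.index;
  if (am.disp > 0)
    s += " + " + std::to_string(am.disp);
  else if (am.disp < 0)
    s += " - " + std::to_string(-am.disp);
  return s + "]";
}
```

`printMem("QWORD PTR", foldConstantOffset(X86AddressMode{"%ECX"}, 8))` gives `QWORD PTR [%ECX + 8]`, matching the second `fld` in the new code.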

llvm-svn: 11819

Add a helper to create an addressing mode given all of the pieces. · 4b3514c1
Chris Lattner authored Feb 25, 2004
```
llvm-svn: 11818
```

add an inefficient way of folding structure and constant array indexes together · d825d30f

Chris Lattner authored Feb 25, 2004

into a single LEA instruction.  This should improve the code generated for
things like X->A.B.C[12].D.

The bigger benefit is still coming though.  Note that this uses an LEA instruction
instead of an add, giving the register allocator more freedom.  We should probably
never generate ADDri32s.
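Folding a path like `X->A.B.C[12].D` amounts to summing constant offsets. This sketch uses a hypothetical `GEPStep` list and made-up layout values (A at offset 4, B at 8, C at 0 with 16-byte elements, D at 12); real offsets come from the target's data layout.

```cpp
#include <vector>

// One step of a constant address computation: either a struct field
// (contributing its byte offset) or an array element with a constant index.
struct GEPStep {
  bool isArray;
  long offsetOrIndex;  // field byte offset, or constant array index
  long elemSize;       // element size in bytes (array steps only)
};

// Sum all constant steps into one displacement, so the whole path can be
// materialized by a single "lea reg, [base + disp]".
long foldConstantPath(const std::vector<GEPStep> &steps) {
  long disp = 0;
  for (const GEPStep &s : steps)
    disp += s.isArray ? s.offsetOrIndex * s.elemSize : s.offsetOrIndex;
  return disp;
}
```

For those values the path folds to 4 + 8 + 0 + 12*16 + 12 = 216, so one LEA computes the address.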

llvm-svn: 11817

Implement a special case for storing an immediate into memory so that we don't need · f85e33cd
Chris Lattner authored Feb 25, 2004
```
an intermediate register.

llvm-svn: 11816
```

Feb 23, 2004

Refactor rewinding code for finding the first terminator of a basic · af2de484

Alkis Evlogimenos authored Feb 23, 2004

block into MachineBasicBlock::getFirstTerminator().

This also fixes a bug in the implementation of the above in both
RegAllocLocal and InstrSched, where instructions were added after the
terminator if the basic block's only instruction was a terminator (it
shouldn't matter for RegAllocLocal since this case never occurs in
practice).
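An index-based sketch of the rewinding logic, using a hypothetical `Instr` record; the real `MachineBasicBlock::getFirstTerminator()` works on iterators. Checking `i > 0` before inspecting the previous instruction is one way to handle a block whose only instruction is a terminator: the result is index 0, so new code lands before the terminator.

```cpp
#include <cstddef>
#include <vector>

// Minimal stand-in for a machine instruction: all we need is whether it
// terminates the block (branch, return, ...).
struct Instr {
  bool isTerminator;
};

// Walk backwards from the end of the block over the trailing terminators and
// return the index of the first one (block.size() if there are none).
// Instructions that must precede the terminators are inserted at this index.
std::size_t getFirstTerminator(const std::vector<Instr> &block) {
  std::size_t i = block.size();
  // Test i > 0 before reading block[i - 1] so we never step past the start.
  while (i > 0 && block[i - 1].isTerminator)
    --i;
  return i;
}
```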

llvm-svn: 11748
