Commits · 535e8db0dafd8fde898be08e889aee63536ddefe · Lorenzo Albano / LLVM bpEVL

May 23, 2004
- Add some notes to myself, no functional changes · 3ef067ff
  Chris Lattner authored May 23, 2004
```
llvm-svn: 13695
```
  3ef067ff
May 14, 2004
- Generate branch machine instructions with MachineBasicBlock operands instead of · 35e73e1c
  Brian Gaeke authored May 14, 2004
```
LLVM BasicBlock operands.

llvm-svn: 13566
```
  35e73e1c
May 13, 2004

Two more improvements for null pointer handling: storing a null pointer · 8e7aea02

Chris Lattner authored May 13, 2004

and passing a null pointer into a function.

For this testcase:

void %test(int** %X) {
  store int* null, int** %X
  call void %test(int** null)
  ret void
}

we now generate this:

test:
        sub %ESP, 12
        mov %EAX, DWORD PTR [%ESP + 16]
        mov DWORD PTR [%EAX], 0
        mov DWORD PTR [%ESP], 0
        call test
        add %ESP, 12
        ret

instead of this:

test:
        sub %ESP, 12
        mov %EAX, DWORD PTR [%ESP + 16]
        mov %ECX, 0
        mov DWORD PTR [%EAX], %ECX
        mov %EAX, 0
        mov DWORD PTR [%ESP], %EAX
        call test
        add %ESP, 12
        ret

llvm-svn: 13558

8e7aea02

Second half of my fixed-sized-alloca patch. This folds the LEA to compute · 593d22d6

Chris Lattner authored May 13, 2004

the alloca address into common operations like loads/stores.

In a simple testcase like this (which is just designed to exercise the
alloca A, nothing more):

int %test(int %X, bool %C) {
        %A = alloca int
        store int %X, int* %A
        store int* %A, int** %G
        br bool %C, label %T, label %F
T:
        call int %test(int 1, bool false)
        %V = load int* %A
        ret int %V
F:
        call int %test(int 123, bool true)
        %V2 = load int* %A
        ret int %V2
}

We now generate:

test:
        sub %ESP, 12
        mov %EAX, DWORD PTR [%ESP + 16]
        mov %CL, BYTE PTR [%ESP + 20]
***     mov DWORD PTR [%ESP + 8], %EAX
        mov %EAX, OFFSET G
        lea %EDX, DWORD PTR [%ESP + 8]
        mov DWORD PTR [%EAX], %EDX
        test %CL, %CL
        je .LBB2 # PC rel: F
.LBB1:  # T
        mov DWORD PTR [%ESP], 1
        mov DWORD PTR [%ESP + 4], 0
        call test
***     mov %EAX, DWORD PTR [%ESP + 8]
        add %ESP, 12
        ret
.LBB2:  # F
        mov DWORD PTR [%ESP], 123
        mov DWORD PTR [%ESP + 4], 1
        call test
***     mov %EAX, DWORD PTR [%ESP + 8]
        add %ESP, 12
        ret

Instead of:

test:
        sub %ESP, 20
        mov %EAX, DWORD PTR [%ESP + 24]
        mov %CL, BYTE PTR [%ESP + 28]
***     lea %EDX, DWORD PTR [%ESP + 16]
***     mov DWORD PTR [%EDX], %EAX
        mov %EAX, OFFSET G
        mov DWORD PTR [%EAX], %EDX
        test %CL, %CL
***     mov DWORD PTR [%ESP + 12], %EDX
        je .LBB2 # PC rel: F
.LBB1:  # T
        mov DWORD PTR [%ESP], 1
        mov %EAX, 0
        mov DWORD PTR [%ESP + 4], %EAX
        call test
***     mov %EAX, DWORD PTR [%ESP + 12]
***     mov %EAX, DWORD PTR [%EAX]
        add %ESP, 20
        ret
.LBB2:  # F
        mov DWORD PTR [%ESP], 123
        mov %EAX, 1
        mov DWORD PTR [%ESP + 4], %EAX
        call test
***     mov %EAX, DWORD PTR [%ESP + 12]
***     mov %EAX, DWORD PTR [%EAX]
        add %ESP, 20
        ret

llvm-svn: 13557

593d22d6

Substantially improve code generation for address exposed locals (aka fixed · 2bb33259

Chris Lattner authored May 13, 2004

sized allocas in the entry block).  Instead of generating code like this:

entry:
  reg1024 = ESP+1234
... (much later)
  *reg1024 = 17


Generate code that looks like this:
entry:
  (no code generated)
... (much later)
  t = ESP+1234
  *t = 17

The advantage being that we DRAMATICALLY reduce the register pressure for these
silly temporaries (they were all being spilled to the stack, resulting in very
silly code).  This is actually a manual implementation of rematerialization :)
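
A toy model (not LLVM code) can illustrate why recomputing the address at each use lowers register pressure. It counts how many values are live at once in a straight-line program. The tuple encoding and the `max_live`, `early`, and `late` helpers are hypothetical names made up for this sketch.

```python
def max_live(program):
    """Peak number of simultaneously live values in a straight-line program.

    program is a list of (defined_name_or_None, [used_names]) tuples.
    """
    last_use = {}
    for index, (_, uses) in enumerate(program):
        for name in uses:
            last_use[name] = index

    live, peak = set(), 0
    for index, (dest, uses) in enumerate(program):
        # A value dies at its last use.
        live -= {name for name in uses if last_use[name] == index}
        # A value becomes live when defined, if anything uses it later.
        if dest is not None and dest in last_use:
            live.add(dest)
        peak = max(peak, len(live))
    return peak


def early(n):
    """Compute all n addresses in the entry block, use them much later."""
    defs = [(f"addr{i}", []) for i in range(n)]
    stores = [(None, [f"addr{i}"]) for i in range(n)]
    return defs + stores


def late(n):
    """Recompute each address right before its use (rematerialization)."""
    program = []
    for i in range(n):
        program.append((f"addr{i}", []))
        program.append((None, [f"addr{i}"]))
    return program


print(max_live(early(8)), max_live(late(8)))
```

In this model the early form keeps all eight addresses live at once, while the late form never needs more than one.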

I have a patch to fold the alloca address computation into loads & stores, which
will make this much better still, but just getting this right took way too much time
and I'm sleepy.

llvm-svn: 13554

2bb33259

May 12, 2004

Pass boolean constants into function calls more efficiently, generating: · e2d382e1

Chris Lattner authored May 12, 2004

        mov DWORD PTR [%ESP + 4], 1

instead of:

        mov %EAX, 1
        mov DWORD PTR [%ESP + 4], %EAX

llvm-svn: 13494

e2d382e1

May 10, 2004
- Fix a fairly serious pessimization that was preventing us from efficiently · 72fb3256
  Chris Lattner authored May 10, 2004
```
compiling things like 'add long %X, 1'.  The problem is that we were switching
the order of the operands for longs even though we can't fold them yet.

llvm-svn: 13451
```
  72fb3256
- Fix some comments, avoid sign extending booleans when zero extend works fine · a367dd74
  Chris Lattner authored May 09, 2004
```
llvm-svn: 13440
```
  a367dd74
- Generate more efficient code for casting booleans to integers (no sign extension required) · 1542a98e
  Chris Lattner authored May 09, 2004
```
llvm-svn: 13439
```
  1542a98e
May 07, 2004

Codegen floating point stores of constants into integer instructions. This · a2dc6bf6

Chris Lattner authored May 07, 2004

allows us to compile:

store float 10.0, float* %P

into:
        mov DWORD PTR [%EAX], 1092616192

instead of:

.CPItest_0:                                     # float 0x4024000000000000
.long   1092616192      # float 10
...
        fld DWORD PTR [.CPItest_0]
        fstp DWORD PTR [%EAX]
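
The immediate 1092616192 is just the IEEE-754 single-precision bit pattern of 10.0, and the `0x4024000000000000` in the constant-pool comment is the double-precision pattern of the same value. A small sketch to check both (the helper names `float_bits` and `double_bits` are illustrative, not from the patch):

```python
import struct


def float_bits(value):
    """IEEE-754 single-precision bit pattern of value, as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def double_bits(value):
    """IEEE-754 double-precision bit pattern of value, as an unsigned int."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


print(float_bits(10.0), hex(double_bits(10.0)))
```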

llvm-svn: 13409

a2dc6bf6

Make comparisons against the null pointer as efficient as integer comparisons · cecf3f94

Chris Lattner authored May 07, 2004

against zero.  In particular, don't emit:

        mov %ESI, 0
        cmp %ECX, %ESI

instead, emit:

       test %ECX, %ECX
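
The replacement is safe because `test r, r` and `cmp r, 0` leave the same condition flags. A minimal sketch that models only ZF, SF, CF and OF for 32-bit operands (the function names are made up for illustration):

```python
MASK32 = 0xFFFFFFFF


def flags_cmp(a, b):
    """Flags after `cmp a, b`, i.e. a 32-bit subtraction whose result is discarded."""
    a &= MASK32
    b &= MASK32
    r = (a - b) & MASK32
    sa, sb, sr = a >> 31, b >> 31, r >> 31
    return {"ZF": r == 0, "SF": sr == 1, "CF": a < b, "OF": sa != sb and sr != sa}


def flags_test(a, b):
    """Flags after `test a, b`, i.e. a 32-bit AND whose result is discarded."""
    r = a & b & MASK32
    return {"ZF": r == 0, "SF": (r >> 31) == 1, "CF": False, "OF": False}


samples = (0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF)
print(all(flags_cmp(x, 0) == flags_test(x, x) for x in samples))
```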

llvm-svn: 13407

cecf3f94

May 04, 2004

Remove unneeded check · c6f60131
Chris Lattner authored May 04, 2004
```
llvm-svn: 13355
```
c6f60131

Improve signed division by power of 2 *dramatically* from this: · 22df9a59

Chris Lattner authored May 04, 2004

div:
        mov %EDX, DWORD PTR [%ESP + 4]
        mov %ECX, 64
        mov %EAX, %EDX
        sar %EDX, 31
        idiv %ECX
        ret

to this:

div:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, %EAX
        sar %ECX, 5
        shr %ECX, 26
        mov %EDX, %EAX
        add %EDX, %ECX
        sar %EAX, 6
        ret

Note that the Intel compiler is currently making this:

div:
        movl      4(%esp), %edx                                 #3.5
        movl      %edx, %eax                                    #4.14
        sarl      $5, %eax                                      #4.14
        shrl      $26, %eax                                     #4.14
        addl      %edx, %eax                                    #4.14
        sarl      $6, %eax                                      #4.14
        ret                                                     #4.14

Which has one fewer register->register copy.  (hint hint alkis :)
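
The shift sequence works by adding a bias of 2^k - 1 to negative inputs before the final arithmetic shift, so the result rounds toward zero like `idiv` does. Here is a sketch that emulates the 32-bit operations; the helper names are illustrative and `c_div` is only a reference model of truncating division.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(v):
    """Interpret the low 32 bits of v as a two's-complement integer."""
    v &= MASK32
    return v - (1 << 32) if v & 0x80000000 else v


def sar32(v, n):
    """Arithmetic shift right (x86 `sar`) on a 32-bit value."""
    return to_signed32(v) >> n


def shr32(v, n):
    """Logical shift right (x86 `shr`) on a 32-bit value."""
    return (v & MASK32) >> n


def sdiv_pow2(x, log2):
    """Signed x / 2**log2, rounding toward zero, for 1 <= log2 <= 31."""
    # sar by log2-1 then shr by 32-log2 yields 2**log2 - 1 if x < 0, else 0.
    bias = shr32(sar32(x, log2 - 1), 32 - log2)
    return sar32(to_signed32(x) + bias, log2)


def c_div(x, d):
    """Reference truncating division, as C and idiv perform it."""
    q = abs(x) // abs(d)
    return -q if (x < 0) != (d < 0) else q


print(sdiv_pow2(-65, 6), sdiv_pow2(-1, 6), sdiv_pow2(130, 6))
```

With `log2 = 6` this is the `sar 5` / `shr 26` / `add` / `sar 6` pattern shown above.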

llvm-svn: 13354

22df9a59

Improve code generated for integer multiplications by 2,3,5,9 · 8c22ece2
Chris Lattner authored May 04, 2004
```
llvm-svn: 13342
```
8c22ece2
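
The commit message above does not say which instructions are used. One common x86 technique for these factors is a single LEA-style `base + index*scale` computation, so the sketch below models that as an assumption; `lea` and `mul_small_const` are hypothetical helper names.

```python
MASK32 = 0xFFFFFFFF


def lea(base, index, scale):
    """Value of `lea reg, [base + index*scale]`, wrapped to 32 bits."""
    assert scale in (1, 2, 4, 8)  # the only scales x86 addressing allows
    return (base + index * scale) & MASK32


def mul_small_const(x, factor):
    """Multiply x by 2, 3, 5 or 9 with one base + index*scale computation."""
    scale = {2: 1, 3: 2, 5: 4, 9: 8}[factor]
    return lea(x, x, scale)


print([mul_small_const(7, f) for f in (2, 3, 5, 9)])
```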

May 01, 2004
- Remove unused #include · 7b0a2046
  Chris Lattner authored May 01, 2004
```
llvm-svn: 13304
```
  7b0a2046
Apr 28, 2004
- Make RequiresFPRegKill() take a MachineBasicBlock arg. · 4390e4a7
  Brian Gaeke authored Apr 28, 2004
```
In InsertFPRegKills(), just check the MachineBasicBlock for successors
instead of its corresponding BasicBlock.

llvm-svn: 13213
```
  4390e4a7
- In InsertFPRegKills(), use the machine-CFG itself rather than the · 33ff1184
  Brian Gaeke authored Apr 28, 2004
```
LLVM CFG when trying to find the successors of BB.

llvm-svn: 13212
```
  33ff1184
- Update the machine-CFG edges whenever we see a branch. · 24ec8568
  Brian Gaeke authored Apr 28, 2004
```
llvm-svn: 13211
```
  24ec8568
Apr 14, 2004

Remove code to adjust the iterator for llvm.readio and llvm.writeio. · e3e2c919

John Criswell authored Apr 14, 2004

The iterator is pointing at the next instruction which should not disappear
when doing the load/store replacement.

llvm-svn: 12954

e3e2c919

Added support for the llvm.readio and llvm.writeio intrinsics. · beded72a

John Criswell authored Apr 13, 2004

On x86, memory operations occur in-order, so these are just lowered into
volatile loads and stores.

llvm-svn: 12936

beded72a

Apr 13, 2004

Implement a small optimization, which papers over the problem in · 9042e381
Chris Lattner authored Apr 13, 2004
```
X86/2004-04-13-FPCMOV-Crash.llx

A more robust fix is to follow.

llvm-svn: 12935
```
9042e381

Emit the immediate form of in/out when possible. · c71b0966

Chris Lattner authored Apr 13, 2004

Fix several bugs in the intrinsics:
  1. Make sure to copy the input registers before the instructions that use them
  2. Make sure to copy the value returned by 'in' out of EAX into the register
     it is supposed to be in.

This fixes assertions when using in/out and linear scan.
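
For context on "when possible": the x86 `in`/`out` encodings take the port either as an 8-bit immediate or in DX, so the immediate form only applies to constant ports in the range 0 to 255. A sketch of that choice follows; the function and its return strings are illustrative, not the selector's actual code.

```python
def io_operand_form(port, port_is_constant):
    """Pick how the port operand of an x86 in/out instruction is supplied."""
    if port_is_constant and 0 <= port <= 0xFF:
        return "imm8"  # port encoded directly in the instruction
    return "dx"        # port must first be moved into DX


print(io_operand_form(0x80, True), io_operand_form(0x3F8, True))
```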

llvm-svn: 12896

c71b0966

Apr 12, 2004

Fix issues that the local allocator has dealing with instructions that implicitly use ST(0) · a24f9863
Chris Lattner authored Apr 12, 2004
```
llvm-svn: 12855
```
a24f9863

Use the fucomi[p] instructions to perform floating point comparisons instead · e407dbe9

Chris Lattner authored Apr 12, 2004

of the fucom[p][p] instructions.  This allows us to code generate this function

bool %test(double %X, double %Y) {
        %C = setlt double %Y, %X
        ret bool %C
}

... into:

test:
        fld QWORD PTR [%ESP + 4]
        fld QWORD PTR [%ESP + 12]
        fucomip %ST(1)
        fstp %ST(0)
        setb %AL
        movsx %EAX, %AL
        ret

where before we generated:

test:
        fld QWORD PTR [%ESP + 4]
        fld QWORD PTR [%ESP + 12]
        fucompp
**      fnstsw
**      sahf
        setb %AL
        movsx %EAX, %AL
        ret

The two marked instructions (which are the ones eliminated) are very bad,
because they serialize execution of the processor.  The fucomi[p] instructions
are only available on the PPRO and later, but since we already use cmov's we
aren't losing any portability.

I retained the old code for the day when we decide we want to support back
to the 386.
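
The reason `setb` can follow `fucomip` directly is that `fucomi` writes its result into ZF, PF and CF of EFLAGS, without the `fnstsw`/`sahf` round trip. The sketch below models those flags as documented in the Intel manuals; the function names are hypothetical.

```python
import math


def fucomi_flags(st0, sti):
    """ZF/PF/CF after `fucomi %ST(i)`, comparing ST(0) with ST(i)."""
    if math.isnan(st0) or math.isnan(sti):
        return {"ZF": 1, "PF": 1, "CF": 1}  # unordered
    if st0 > sti:
        return {"ZF": 0, "PF": 0, "CF": 0}
    if st0 < sti:
        return {"ZF": 0, "PF": 0, "CF": 1}
    return {"ZF": 1, "PF": 0, "CF": 0}      # equal


def setlt_y_x(x, y):
    """Model of the generated code for `setlt double %Y, %X`."""
    # fld X; fld Y leaves ST(0) = Y and ST(1) = X.
    flags = fucomi_flags(y, x)
    return flags["CF"]  # setb %AL copies the carry flag


print(setlt_y_x(2.0, 1.0), setlt_y_x(1.0, 2.0), setlt_y_x(1.0, 1.0))
```

Note that an unordered comparison also sets CF, so in this model `setb` alone yields 1 when either input is NaN. The commit message does not discuss that case.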

llvm-svn: 12852

e407dbe9

Fix a bug in my load/cast folding patch. · 0fe57da8
Chris Lattner authored Apr 12, 2004
```
llvm-svn: 12849
```
0fe57da8
Adjust some comments, fix a bug in my previous patch · dc010546
Chris Lattner authored Apr 12, 2004
```
llvm-svn: 12848
```
dc010546

On X86, casting an integer to floating point requires going through memory. · 07c1c115

Chris Lattner authored Apr 11, 2004

If the source of the cast is a load, we can just use the source memory location,
without having to create a temporary stack slot entry.

Before we code generated this:

double %int(int* %P) {
        %V = load int* %P
        %V2 = cast int %V to double
        ret double %V2
}

into:

int:
        sub %ESP, 4
        mov %EAX, DWORD PTR [%ESP + 8]
        mov %EAX, DWORD PTR [%EAX]
        mov DWORD PTR [%ESP], %EAX
        fild DWORD PTR [%ESP]
        add %ESP, 4
        ret

Now we produce this:

int:
        mov %EAX, DWORD PTR [%ESP + 4]
        fild DWORD PTR [%EAX]
        ret

... which is nicer.

llvm-svn: 12846

07c1c115

Implement folding of loads into floating point operations. This implements: · d4af820a
Chris Lattner authored Apr 11, 2004
```
test/Regression/CodeGen/X86/fp_load_fold.llx

llvm-svn: 12844
```
d4af820a

Apr 11, 2004

Unify all of the code for floating point +,-,*,/ into one function · dcb750f0
Chris Lattner authored Apr 11, 2004
```
llvm-svn: 12842
```
dcb750f0

This implements folding of constant operands into floating point operations · 80ba4016

Chris Lattner authored Apr 11, 2004

for mul and div.

Instead of generating this:

test_divr:
        fld QWORD PTR [%ESP + 4]
        fld QWORD PTR [.CPItest_divr_0]
        fdivrp %ST(1)
        ret

We now generate this:

test_divr:
        fld QWORD PTR [%ESP + 4]
        fdivr QWORD PTR [.CPItest_divr_0]
        ret

This code desperately needs refactoring, which will come in the next
patch.
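
Both sequences leave the same value on top of the FP stack. Here is a tiny stack model, written from the Intel-documented semantics of `fdivr` and `fdivrp`; the `X87Stack` class and the two driver functions are made up for this sketch.

```python
class X87Stack:
    """Minimal model of the x87 register stack; st[-1] is ST(0)."""

    def __init__(self):
        self.st = []

    def fld(self, value):
        self.st.append(value)

    def fdivr_mem(self, mem):
        # fdivr m64: ST(0) = m64 / ST(0)
        self.st[-1] = mem / self.st[-1]

    def fdivrp(self, i):
        # fdivrp ST(i): ST(i) = ST(0) / ST(i), then pop
        top = self.st[-1]
        self.st[-1 - i] = top / self.st[-1 - i]
        self.st.pop()


def unfolded(x, c):
    s = X87Stack()
    s.fld(x)       # fld QWORD PTR [%ESP + 4]
    s.fld(c)       # fld QWORD PTR [.CPItest_divr_0]
    s.fdivrp(1)    # fdivrp %ST(1)
    return s.st[-1]


def folded(x, c):
    s = X87Stack()
    s.fld(x)         # fld QWORD PTR [%ESP + 4]
    s.fdivr_mem(c)   # fdivr QWORD PTR [.CPItest_divr_0]
    return s.st[-1]


print(unfolded(4.0, 1.0), folded(4.0, 1.0))
```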

llvm-svn: 12841

80ba4016

Restructure the mul/div/rem handling code to follow the pattern the other · e1efbc7c

Chris Lattner authored Apr 11, 2004

instructions use.  This doesn't change any functionality except that long
constant expressions of these operations will now magically start working.

llvm-svn: 12840

e1efbc7c

Codegen FP adds and subtracts with a constant more efficiently, generating: · f7ed7df5

Chris Lattner authored Apr 11, 2004

        fld QWORD PTR [%ESP + 4]
        fadd QWORD PTR [.CPItest_add_0]

instead of:

        fld QWORD PTR [%ESP + 4]
        fld QWORD PTR [.CPItest_add_0]
        faddp %ST(1)

I also intend to do this for mul & div, but it appears that I have to
refactor a bit of code before I can do so.

This is tested by: test/Regression/CodeGen/X86/fp_constant_op.llx

llvm-svn: 12839

f7ed7df5

Two changes: · 3f912a6f

Chris Lattner authored Apr 11, 2004

  1. If an incoming argument is dead, don't load it from the stack
  2. Do not code gen noop copies at all (i.e., cast int -> uint), not even to
     a move.  This should reduce register pressure for allocators that are
     unable to coalesce away these copies in some cases.

llvm-svn: 12835

3f912a6f

Apr 10, 2004
- Silence a spurious warning · d450df05
  Chris Lattner authored Apr 10, 2004
```
llvm-svn: 12815
```
  d450df05
Apr 09, 2004

Reversed the order of the llvm.writeport() operands so that the value · 2b4c96e7
John Criswell authored Apr 09, 2004
```
is listed first and the address is listed second.

llvm-svn: 12795
```
2b4c96e7
Changed assertions to error messages. · 2fc99838
John Criswell authored Apr 09, 2004
```
llvm-svn: 12787
```
2fc99838

Changes recommended by Chris: · c28c3b62

John Criswell authored Apr 08, 2004

InstSelectSimple.cpp:
  Change the checks for proper I/O port address size into an exit() instead
  of an assertion.  Assertions aren't used in Release builds, and handling
  this error should be graceful (not that this counts as graceful, but it's
  more graceful).

  Modified the generation of the IN/OUT instructions to have 0 arguments.
X86InstrInfo.td:
  Added the OpSize attribute to the 16 bit IN and OUT instructions.

llvm-svn: 12786

c28c3b62

Apr 08, 2004

Added the llvm.readport and llvm.writeport intrinsics for x86. These do · 10db062d

John Criswell authored Apr 08, 2004

I/O port instructions on x86.  The specific code sequence is tailored to
the parameters and return value of the intrinsic call.
Added the ability for implicit definitions to be printed in the Instruction
Printer.
Added the ability for RawFrm instructions to print implicit uses and
definitions with correct comma output.  This required adjustment to some
methods so that a leading comma would or would not be printed.

llvm-svn: 12782

10db062d

Apr 06, 2004

Fix PR313: [x86] JIT miscompiles unsigned short to floating point · 4b936125
Chris Lattner authored Apr 06, 2004
```
llvm-svn: 12711
```
4b936125

Fix a minor bug in previous checking · 19c8b13e

Chris Lattner authored Apr 06, 2004

Enable folding of long seteq/setne comparisons into branches and select instructions
Implement unfolded long relational comparisons against constants a bit more efficiently

Folding comparisons changes code that looks like this:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %EDX, DWORD PTR [%ESP + 8]
        mov %ECX, %EAX
        or %ECX, %EDX
        sete %CL
        test %CL, %CL
        je .LBB2 # PC rel: F

into code that looks like this:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %EDX, DWORD PTR [%ESP + 8]
        mov %ECX, %EAX
        or %ECX, %EDX
        jne .LBB2 # PC rel: F

This speeds up 186.crafty by 6% with llc-ls.
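
The fold relies on two facts: a 64-bit value is zero exactly when the OR of its two 32-bit halves is zero, and branching on ZF directly is equivalent to materializing it with `sete` and testing the byte. A small sketch of both sequences (function names and the "fallthrough" label are illustrative):

```python
MASK32 = 0xFFFFFFFF


def split64(value):
    """Split a 64-bit value into (low, high) 32-bit halves."""
    return value & MASK32, (value >> 32) & MASK32


def branch_unfolded(lo, hi):
    zf = (lo | hi) == 0            # or %ECX, %EDX
    cl = 1 if zf else 0            # sete %CL
    return "F" if cl == 0 else "fallthrough"   # test %CL, %CL ; je F


def branch_folded(lo, hi):
    zf = (lo | hi) == 0            # or %ECX, %EDX
    return "F" if not zf else "fallthrough"    # jne F


values = (0, 1, 1 << 32, (1 << 64) - 1)
print([branch_folded(*split64(v)) for v in values])
```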

llvm-svn: 12702

19c8b13e