Commits · 80ba4016023e35bb1569065fbd47088df8fd0528 · Lorenzo Albano / LLVM bpEVL

Apr 11, 2004

This implements folding of constant operands into floating point operations · 80ba4016

Chris Lattner authored Apr 11, 2004

for mul and div.

Instead of generating this:

test_divr:
        fld QWORD PTR [%ESP + 4]
        fld QWORD PTR [.CPItest_divr_0]
        fdivrp %ST(1)
        ret

We now generate this:

test_divr:
        fld QWORD PTR [%ESP + 4]
        fdivr QWORD PTR [.CPItest_divr_0]
        ret

This code desperately needs refactoring, which will come in the next
patch.

llvm-svn: 12841

Restructure the mul/div/rem handling code to follow the pattern the other · e1efbc7c

Chris Lattner authored Apr 11, 2004

instructions use.  This doesn't change any functionality except that long
constant expressions of these operations will now magically start working.

llvm-svn: 12840

Codegen FP adds and subtracts with a constant more efficiently, generating: · f7ed7df5

Chris Lattner authored Apr 11, 2004

        fld QWORD PTR [%ESP + 4]
        fadd QWORD PTR [.CPItest_add_0]

instead of:

        fld QWORD PTR [%ESP + 4]
        fld QWORD PTR [.CPItest_add_0]
        faddp %ST(1)

I also intend to do this for mul & div, but it appears that I have to
refactor a bit of code before I can do so.

This is tested by: test/Regression/CodeGen/X86/fp_constant_op.llx

llvm-svn: 12839

Two changes: · 3f912a6f

Chris Lattner authored Apr 11, 2004

  1. If an incoming argument is dead, don't load it from the stack
  2. Do not code gen noop copies at all (i.e., cast int -> uint), not even to
     a move.  This should reduce register pressure for allocators that are
     unable to coalesce away these copies in some cases.

llvm-svn: 12835

Apr 10, 2004
- Silence a spurious warning · d450df05
  Chris Lattner authored Apr 10, 2004
```
llvm-svn: 12815
```
Apr 09, 2004

Reversed the order of the llvm.writeport() operands so that the value · 2b4c96e7
John Criswell authored Apr 09, 2004
```
is listed first and the address is listed second.

llvm-svn: 12795
```
Changed assertions to error messages. · 2fc99838
John Criswell authored Apr 09, 2004
```
llvm-svn: 12787
```
Changes recommended by Chris: · c28c3b62

John Criswell authored Apr 08, 2004

InstSelectSimple.cpp:
  Change the checks for proper I/O port address size into an exit() instead
  of an assertion.  Assertions aren't used in Release builds, and handling
  this error should be graceful (not that this counts as graceful, but it's
  more graceful).

  Modified the generation of the IN/OUT instructions to have 0 arguments.
X86InstrInfo.td:
  Added the OpSize attribute to the 16 bit IN and OUT instructions.

llvm-svn: 12786

Apr 08, 2004

Added the llvm.readport and llvm.writeport intrinsics for x86. These do · 10db062d

John Criswell authored Apr 08, 2004

I/O port instructions on x86.  The specific code sequence is tailored to
the parameters and return value of the intrinsic call.
Added the ability for implicit definitions to be printed in the Instruction
Printer.
Added the ability for RawFrm instructions to print implicit uses and
definitions with correct comma output.  This required adjustment to some
methods so that a leading comma would or would not be printed.

llvm-svn: 12782

Apr 06, 2004

Fix PR313: [x86] JIT miscompiles unsigned short to floating point · 4b936125
Chris Lattner authored Apr 06, 2004
```
llvm-svn: 12711
```
Fix a minor bug in previous checking · 19c8b13e

Chris Lattner authored Apr 06, 2004

Enable folding of long seteq/setne comparisons into branches and select instructions
Implement unfolded long relational comparisons against constants a bit more efficiently

Folding comparisons changes code that looks like this:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %EDX, DWORD PTR [%ESP + 8]
        mov %ECX, %EAX
        or %ECX, %EDX
        sete %CL
        test %CL, %CL
        je .LBB2 # PC rel: F

into code that looks like this:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %EDX, DWORD PTR [%ESP + 8]
        mov %ECX, %EAX
        or %ECX, %EDX
        jne .LBB2 # PC rel: F

This speeds up 186.crafty by 6% with llc-ls.

llvm-svn: 12702

Improve codegen of long == and != comparisons against constants. Before, · f2ee88eb

Chris Lattner authored Apr 06, 2004

comparing a long against zero got us this:

        sub %ESP, 8
        mov DWORD PTR [%ESP + 4], %ESI
        mov DWORD PTR [%ESP], %EDI
        mov %EAX, DWORD PTR [%ESP + 12]
        mov %EDX, DWORD PTR [%ESP + 16]
        mov %ECX, 0
        mov %ESI, 0
        mov %EDI, %EAX
        xor %EDI, %ECX
        mov %ECX, %EDX
        xor %ECX, %ESI
        or %EDI, %ECX
        sete %CL
        test %CL, %CL
        je .LBB2 # PC rel: F

Now it gets us this:

        mov %EAX, DWORD PTR [%ESP + 4]
        mov %EDX, DWORD PTR [%ESP + 8]
        mov %ECX, %EAX
        or %ECX, %EDX
        sete %CL
        test %CL, %CL
        je .LBB2 # PC rel: F
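
The identity behind both listings, sketched in C: a 64-bit value held as two 32-bit words equals a constant exactly when XORing each word with the matching word of the constant and ORing the two results gives zero. Against a zero constant the XORs are identities, which leaves the single `or %ECX, %EDX` of the new code. This is a minimal sketch assuming a low/high word split like the %EAX/%EDX pair above; `long_pair`, `split64`, `eq_const` and `eq_zero` are hypothetical names, not functions from the LLVM selector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: a 64-bit value as two 32-bit registers (low word, high word). */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} long_pair;

static long_pair split64(uint64_t x) {
    long_pair p = { (uint32_t)x, (uint32_t)(x >> 32) };
    return p;
}

/* General case: XOR each half with the constant's half, OR the results,
   and test the OR against zero (the old code's xor/xor/or sequence). */
static bool eq_const(long_pair x, uint64_t c) {
    long_pair k = split64(c);
    return ((x.lo ^ k.lo) | (x.hi ^ k.hi)) == 0;
}

/* Constant 0: XOR with 0 is the identity, so only the OR remains. */
static bool eq_zero(long_pair x) {
    return (x.lo | x.hi) == 0;
}
```

For `!=`, the same OR is tested for a nonzero result, which is what lets the branch use `jne` directly.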

llvm-svn: 12696

Handle various other important cases of multiplying by a long constant immediate. For · 6c3bf13f

Chris Lattner authored Apr 06, 2004

example, multiplying X*(1 + (1LL << 32)) now produces:

test:
        mov %ECX, DWORD PTR [%ESP + 4]
        mov %EDX, DWORD PTR [%ESP + 8]
        mov %EAX, %ECX
        add %EDX, %ECX
        ret

[[[Note to Alkis: why isn't linear scan generating this code??  This might be a
 problem with your intervals being too conservative:

test:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %EDX, DWORD PTR [%ESP + 8]
        add %EDX, %EAX
        ret

end note]]]

Whereas GCC produces this:

T:
        sub     %esp, 12
        mov     %edx, DWORD PTR [%esp+16]
        mov     DWORD PTR [%esp+8], %edi
        mov     %ecx, DWORD PTR [%esp+20]
        xor     %edi, %edi
        mov     DWORD PTR [%esp], %ebx
        mov     %ebx, %edi
        mov     %eax, %edx
        mov     DWORD PTR [%esp+4], %esi
        add     %ebx, %edx
        mov     %edi, DWORD PTR [%esp+8]
        lea     %edx, [%ecx+%ebx]
        mov     %esi, DWORD PTR [%esp+4]
        mov     %ebx, DWORD PTR [%esp]
        add     %esp, 12
        ret

I'm not sure exactly what GCC is smoking here, but it looks like it has just
confused itself with a bunch of stack slots or something.  The intel compiler
is better, but still not good:

T:
        movl      4(%esp), %edx                                 #2.11
        movl      8(%esp), %eax                                 #2.11
        lea       (%eax,%edx), %ecx                             #3.12
        movl      $1, %eax                                      #3.12
        mull      %edx                                          #3.12
        addl      %ecx, %edx                                    #3.12
        ret                                                     #3.12
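
Why two register moves and one add suffice: X*(1 + 2^32) equals X + (X << 32) modulo 2^64, and the shifted term only touches the high word, so the low word is unchanged and the high word gains X's low word. A minimal C sketch of that word-level arithmetic; `mul_1_plus_2_32` is a hypothetical name, and unsigned types stand in for LLVM's signed `long` because the low 64 bits of the product are the same either way.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: X * (1 + 2^32) mod 2^64 using only 32-bit word operations. */
static uint64_t mul_1_plus_2_32(uint64_t x) {
    uint32_t lo = (uint32_t)x;          /* low word of the result is Xlo */
    uint32_t hi = (uint32_t)(x >> 32);  /* start from Xhi */
    hi += lo;                           /* add %EDX, %ECX: Xhi + Xlo, wraps mod 2^32 */
    return ((uint64_t)hi << 32) | lo;
}
```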

llvm-svn: 12693

Efficiently handle a long multiplication by a constant. For this testcase: · 1f6024cb

Chris Lattner authored Apr 06, 2004

long %test(long %X) {
        %Y = mul long %X, 123
        ret long %Y
}

we used to generate:

test:
        sub %ESP, 12
        mov DWORD PTR [%ESP + 8], %ESI
        mov DWORD PTR [%ESP + 4], %EDI
        mov DWORD PTR [%ESP], %EBX
        mov %ECX, DWORD PTR [%ESP + 16]
        mov %ESI, DWORD PTR [%ESP + 20]
        mov %EDI, 123
        mov %EBX, 0
        mov %EAX, %ECX
        mul %EDI
        imul %ESI, %EDI
        add %ESI, %EDX
        imul %ECX, %EBX
        add %ESI, %ECX
        mov %EDX, %ESI
        mov %EBX, DWORD PTR [%ESP]
        mov %EDI, DWORD PTR [%ESP + 4]
        mov %ESI, DWORD PTR [%ESP + 8]
        add %ESP, 12
        ret

Now we emit:
test:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
        mov %EDX, 123
        mul %EDX
        imul %ECX, %ECX, 123
        add %ECX, %EDX
        mov %EDX, %ECX
        ret

Which, incidentally, is substantially nicer than what GCC manages:
T:
        sub     %esp, 8
        mov     %eax, 123
        mov     DWORD PTR [%esp], %ebx
        mov     %ebx, DWORD PTR [%esp+16]
        mov     DWORD PTR [%esp+4], %esi
        mov     %esi, DWORD PTR [%esp+12]
        imul    %ecx, %ebx, 123
        mov     %ebx, DWORD PTR [%esp]
        mul     %esi
        mov     %esi, DWORD PTR [%esp+4]
        add     %esp, 8
        lea     %edx, [%ecx+%edx]
        ret
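
The arithmetic behind the new sequence, sketched in C under the assumption that the constant fits in 32 bits: one widening `mul` produces the full 64-bit product of the low word and the constant, and the high word of the result only needs the low 32 bits of Xhi*C, which a plain `imul` provides. Because the constant's high word is zero, the Xlo*Chi cross term that the old code computed with `imul %ECX, %EBX` (with %EBX = 0) vanishes. `mul_by_const32` is a hypothetical name; unsigned types are used because the low 64 bits of the product are the same for signed and unsigned operands.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: 64-bit X times a 32-bit constant C, modulo 2^64. */
static uint64_t mul_by_const32(uint64_t x, uint32_t c) {
    uint32_t xlo = (uint32_t)x;
    uint32_t xhi = (uint32_t)(x >> 32);
    uint64_t wide = (uint64_t)xlo * c;     /* mul: EDX:EAX = Xlo * C */
    uint32_t lo = (uint32_t)wide;
    uint32_t hi = (uint32_t)(wide >> 32);
    hi += xhi * c;                         /* imul + add: low 32 bits of Xhi * C */
    return ((uint64_t)hi << 32) | lo;
}
```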

llvm-svn: 12692

Improve code generation of long shifts by 32. · 2448baea

Chris Lattner authored Apr 06, 2004

On this testcase:

long %test(long %X) {
        %Y = shr long %X, ubyte 32
        ret long %Y
}

instead of:
t:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %EAX, DWORD PTR [%ESP + 8]
        sar %EAX, 0
        mov %EDX, 0
        ret


we now emit:
test:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %EAX, DWORD PTR [%ESP + 8]
        mov %EDX, 0
        ret

llvm-svn: 12688

Bugfixes: inc/dec don't set the carry flag! · 7332d4c5
Chris Lattner authored Apr 06, 2004
```
llvm-svn: 12687
```
Improve code for passing constant longs as arguments to function calls. · decce5bc

Chris Lattner authored Apr 06, 2004

For example, on this instruction:

        call void %test(long 1234)

Instead of this:
        mov %EAX, 1234
        mov %ECX, 0
        mov DWORD PTR [%ESP], %EAX
        mov DWORD PTR [%ESP + 4], %ECX
        call test

We now emit this:
        mov DWORD PTR [%ESP], 1234
        mov DWORD PTR [%ESP + 4], 0
        call test

llvm-svn: 12686

Emit more efficient 64-bit operations when the RHS is a constant, and one · 5fc6f77b

Chris Lattner authored Apr 06, 2004

of the words of the constant is zero.  For example:
  Y = and long X, 1234

now generates:
  Yl = and Xl, 1234
  Yh = 0

instead of:
  Yl = and Xl, 1234
  Yh = and Xh, 0
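
A C sketch of the per-word rule, assuming the 64-bit value and mask are split into 32-bit halves (`and_imm64` is a hypothetical name). In C the conditionals do not change the result, since anything ANDed with 0 is already 0; they stand in for the selector's compile-time decision to emit no instruction for a half whose mask word is zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: 64-bit AND with an immediate, one 32-bit half at a time.
   A zero mask word yields a known-zero result word (Yh = 0 above). */
static uint64_t and_imm64(uint64_t x, uint64_t mask) {
    uint32_t mlo = (uint32_t)mask;
    uint32_t mhi = (uint32_t)(mask >> 32);
    uint32_t lo = (mlo == 0) ? 0 : ((uint32_t)x & mlo);          /* Yl */
    uint32_t hi = (mhi == 0) ? 0 : ((uint32_t)(x >> 32) & mhi);  /* Yh */
    return ((uint64_t)hi << 32) | lo;
}
```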

llvm-svn: 12685

Fix typo · b49608af
Chris Lattner authored Apr 06, 2004
```
llvm-svn: 12684
```
Add support for simple immediate handling to long instruction selection. · 996e667a
Chris Lattner authored Apr 06, 2004
```
This allows us to handle code like 'add long %X, 123456789012' more efficiently.

llvm-svn: 12683
```
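
A minimal C sketch of how a 64-bit immediate add decomposes into 32-bit operations, assuming the add/adc pairing that x86 provides: the constant is split at compile time (123456789012 has a high word of 28 and a low word of 0xBE991A14), the low words are added, and the carry out of that add is folded into the high-word add. `add_imm64` is a hypothetical name, not the selector's API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: X + IMM on 32-bit halves, with the carry modeled explicitly. */
static uint64_t add_imm64(uint64_t x, uint64_t imm) {
    uint32_t xlo = (uint32_t)x, xhi = (uint32_t)(x >> 32);
    uint32_t klo = (uint32_t)imm, khi = (uint32_t)(imm >> 32);  /* split at compile time */
    uint32_t lo = xlo + klo;         /* add lo, klo */
    uint32_t carry = lo < xlo;       /* CF: the unsigned add wrapped */
    uint32_t hi = xhi + khi + carry; /* adc hi, khi */
    return ((uint64_t)hi << 32) | lo;
}
```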
Implement negation of longs efficiently. For this testcase: · 37ba31f7

Chris Lattner authored Apr 06, 2004

long %test(long %X) {
        %Y = sub long 0, %X
        ret long %Y
}

We used to generate:

test:
        sub %ESP, 4
        mov DWORD PTR [%ESP], %ESI
        mov %ECX, DWORD PTR [%ESP + 8]
        mov %ESI, DWORD PTR [%ESP + 12]
        mov %EAX, 0
        mov %EDX, 0
        sub %EAX, %ECX
        sbb %EDX, %ESI
        mov %ESI, DWORD PTR [%ESP]
        add %ESP, 4
        ret

Now we generate:

test:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %EDX, DWORD PTR [%ESP + 8]
        neg %EAX
        adc %EDX, 0
        neg %EDX
        ret
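
Why neg/adc/neg works, sketched in C on 32-bit halves: x86 `neg` sets the carry flag exactly when its operand is nonzero, so `adc %EDX, 0` adds one to the high word whenever negating the low word borrows, and negating the adjusted high word completes the two's-complement negation. `neg64` is a hypothetical name; the carry is modeled as an explicit variable because C has no flags.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: 64-bit negation as neg lo; adc hi, 0; neg hi. */
static uint64_t neg64(uint64_t x) {
    uint32_t lo = (uint32_t)x, hi = (uint32_t)(x >> 32);
    uint32_t cf = (lo != 0);  /* neg %EAX sets CF iff %EAX was nonzero */
    lo = 0u - lo;             /* neg %EAX */
    hi = hi + cf;             /* adc %EDX, 0 */
    hi = 0u - hi;             /* neg %EDX */
    return ((uint64_t)hi << 32) | lo;
}
```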

llvm-svn: 12681

Minor tweak to avoid an extra reg-reg copy that the register allocator has to eliminate · bfe74f58
Chris Lattner authored Apr 06, 2004
```
llvm-svn: 12680
```
Two changes: · 464e2ea5

Chris Lattner authored Apr 06, 2004

  * In promote32, if we can just promote a constant value, do so instead of
    promoting a constant dynamically.
  * In visitReturnInst, actually USE the promote32 argument that takes a
    Value*

The end result of this is that we now generate this:

test:
        mov %EAX, 0
        ret

instead of...

test:
        mov %AX, 0
        movzx %EAX, %AX
        ret

for:

ushort %test() {
        ret ushort 0
}

llvm-svn: 12679

Apr 05, 2004
- Support getelementptr instructions which use uint's to index into structure · 69193f93
  Chris Lattner authored Apr 05, 2004
```
types and can have arbitrary 32- and 64-bit integer types indexing into
sequential types.

llvm-svn: 12653
```
Apr 02, 2004
- Clean up code a bit. · d64e904e
  Alkis Evlogimenos authored Apr 02, 2004
```
llvm-svn: 12615
```
- Fix type in instruction builder instantiation · 5fc4772d
  Alkis Evlogimenos authored Apr 02, 2004
```
llvm-svn: 12610
```
Apr 01, 2004
- Generate slightly smaller code, "test R, R" instead of "cmp R, 0" · d55509c2
  Chris Lattner authored Mar 31, 2004
```
llvm-svn: 12579
```
- Codegen FP select instructions into X86 conditional moves. Annoyingly enough · 37a7f09d
  Chris Lattner authored Mar 31, 2004
```
the X86 does not support a full set of fp cmove instructions, so we can't always
fold the condition into the select.  :(  Yuck.

llvm-svn: 12577
```
Mar 31, 2004
- Fold comparisons into select instructions, making much better code and · 32817f59
  Chris Lattner authored Mar 30, 2004
```
using our broad selection of movcc instructions.  :)

llvm-svn: 12560
```
Mar 30, 2004

Add direct support for integer select instructions, though we still don't support · 53b58cb8
Chris Lattner authored Mar 30, 2004
```
folding compares into the select yet.

llvm-svn: 12553
```
Fix a fairly major performance problem. If a PHI node had a constant as · 0048e574

Chris Lattner authored Mar 30, 2004

an incoming value from a block, the selector would evaluate the constant
at the TOP of the block instead of at the end of the block.  This made the
live range for the constant span the entire block, increasing register
pressure needlessly.

llvm-svn: 12542

Mar 18, 2004
- Malloc doesn't kill a load. This patch need not go into 1.2 though. · 6ca9b89a
  Chris Lattner authored Mar 18, 2004
```
llvm-svn: 12500
```
- Fix a really nasty bug that was breaking ijpeg in LLC mode. We were incorrectly · dc47e271
  Chris Lattner authored Mar 18, 2004
```
folding load instructions into other instructions across free instruction
boundaries.  Perhaps this will also fix the other strange failures?

llvm-svn: 12494
```
Mar 13, 2004
- It helps if I save the file. :) · 699aa70f
  Chris Lattner authored Mar 13, 2004
```
llvm-svn: 12357
```
- Rename the intrinsic enum values for llvm.va_* from Intrinsic::va_* to · 071a5e56
  Chris Lattner authored Mar 13, 2004
```
Intrinsic::va*.  This avoids conflicting with macros in the stdlib.h file.

llvm-svn: 12356
```
Mar 08, 2004

Implement folding explicit load instructions into binary operations. For a · 653e662a

Chris Lattner authored Mar 08, 2004

testcase like this:

int %test(int* %P, int %A) {
        %Pv = load int* %P
        %B = add int %A, %Pv
        ret int %B
}

We now generate:
test:
        mov %ECX, DWORD PTR [%ESP + 4]
        mov %EAX, DWORD PTR [%ESP + 8]
        add %EAX, DWORD PTR [%ECX]
        ret

Instead of:
test:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
        mov %EAX, DWORD PTR [%EAX]
        add %EAX, %ECX
        ret

... saving one instruction, and often a register.  Note that there are a lot
of other instructions that could use this, but they aren't handled.  I'm not
really interested in adding them, but mul/div and all of the FP instructions
could be supported as well if someone wanted to add them.

llvm-svn: 12204

Rearrange and refactor some code. No functionality changes. · 1dd6afe6
Chris Lattner authored Mar 08, 2004
```
llvm-svn: 12203
```
Mar 02, 2004
- Doxygenify some comments. · a6025e64
  Misha Brukman authored Mar 01, 2004
```
llvm-svn: 12064
```
Mar 01, 2004

Handle passing constant integers to functions much more efficiently. Instead · 1f4642c4

Chris Lattner authored Mar 01, 2004

of generating this code:

        mov %EAX, 4
        mov DWORD PTR [%ESP], %EAX
        mov %AX, 123
        movsx %EAX, %AX
        mov DWORD PTR [%ESP + 4], %EAX
        call Y

we now generate:
        mov DWORD PTR [%ESP], 4
        mov DWORD PTR [%ESP + 4], 123
        call Y

Which hurts the eyes less.  :)

Considering that register pressure around call sites is already high (with all
of the callee-clobbered registers and stuff), this may help a lot.

llvm-svn: 12028

Fix a minor code-quality issue. When passing 8- and 16-bit integer constants · 5c7d3cda

Chris Lattner authored Mar 01, 2004

to function calls, we would emit dead code, like this:

int Y(int, short, double);
int X() {
  Y(4, 123, 4);
}

--- Old
X:
        sub %ESP, 20
        mov %EAX, 4
        mov DWORD PTR [%ESP], %EAX
***     mov %AX, 123
        mov %AX, 123
        movsx %EAX, %AX
        mov DWORD PTR [%ESP + 4], %EAX
        fld QWORD PTR [.CPIX_0]
        fstp QWORD PTR [%ESP + 8]
        call Y
        mov %EAX, 0
        # IMPLICIT_USE %EAX %ESP
        add %ESP, 20
        ret

Now we emit:
X:
        sub %ESP, 20
        mov %EAX, 4
        mov DWORD PTR [%ESP], %EAX
        mov %AX, 123
        movsx %EAX, %AX
        mov DWORD PTR [%ESP + 4], %EAX
        fld QWORD PTR [.CPIX_0]
        fstp QWORD PTR [%ESP + 8]
        call Y
        mov %EAX, 0
        # IMPLICIT_USE %EAX %ESP
        add %ESP, 20
        ret

Next up, eliminate the mov AX and movsx entirely!

llvm-svn: 12026
