- Apr 12, 2004

- Chris Lattner authored
  llvm-svn: 12849

- Chris Lattner authored
  llvm-svn: 12848

- Chris Lattner authored
  If the source of the cast is a load, we can just use the source memory
  location, without having to create a temporary stack slot entry.

  Before we code generated this:

      double %int(int* %P) {
          %V = load int* %P
          %V2 = cast int %V to double
          ret double %V2
      }

  into:

      int:
          sub %ESP, 4
          mov %EAX, DWORD PTR [%ESP + 8]
          mov %EAX, DWORD PTR [%EAX]
          mov DWORD PTR [%ESP], %EAX
          fild DWORD PTR [%ESP]
          add %ESP, 4
          ret

  Now we produce this:

      int:
          mov %EAX, DWORD PTR [%ESP + 4]
          fild DWORD PTR [%EAX]
          ret

  ... which is nicer.

  llvm-svn: 12846

- Chris Lattner authored
  test/Regression/CodeGen/X86/fp_load_fold.llx

  llvm-svn: 12844

- Apr 11, 2004

- Chris Lattner authored
  llvm-svn: 12842

- Chris Lattner authored
  for mul and div. Instead of generating this:

      test_divr:
          fld QWORD PTR [%ESP + 4]
          fld QWORD PTR [.CPItest_divr_0]
          fdivrp %ST(1)
          ret

  We now generate this:

      test_divr:
          fld QWORD PTR [%ESP + 4]
          fdivr QWORD PTR [.CPItest_divr_0]
          ret

  This code desperately needs refactoring, which will come in the next
  patch.

  llvm-svn: 12841

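  As a rough sketch, source of the following shape would exercise the
  folded fdivr case above; this is written in the LLVM 1.x syntax of the
  period, and the function name (matching the test_divr label) and the
  constant value are assumptions, not the original testcase:

      double %test_divr(double %X) {
          ; reversed divide: a constant-pool double divided by the argument
          %Y = div double 1.5, %X        ; assumed constant value
          ret double %Y
      }
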
- Chris Lattner authored
  instructions use. This doesn't change any functionality except that long
  constant expressions of these operations will now magically start
  working.

  llvm-svn: 12840

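  As a hedged illustration, a long constant expression of one of these
  logical operations might be written roughly as follows in the LLVM 1.x
  syntax of the period (the exact constants are assumptions):

      long %test(long %X) {
          ; the second operand is a long constant expression, not a literal
          %Y = and long %X, xor (long 1234, long -1)
          ret long %Y
      }
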
- Chris Lattner authored
      fld QWORD PTR [%ESP + 4]
      fadd QWORD PTR [.CPItest_add_0]

  instead of:

      fld QWORD PTR [%ESP + 4]
      fld QWORD PTR [.CPItest_add_0]
      faddp %ST(1)

  I also intend to do this for mul & div, but it appears that I have to
  refactor a bit of code before I can do so.

  This is tested by: test/Regression/CodeGen/X86/fp_constant_op.llx

  llvm-svn: 12839

- Chris Lattner authored
  llvm-svn: 12838

- Chris Lattner authored
  llvm-svn: 12836

- Chris Lattner authored
  1. If an incoming argument is dead, don't load it from the stack.
  2. Do not code gen noop copies at all (i.e., cast int -> uint), not even
     to a move. This should reduce register pressure for allocators that
     are unable to coalesce away these copies in some cases.

  llvm-svn: 12835

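  A minimal sketch of the noop copy in point 2, in the LLVM 1.x syntax of
  the period (the function itself is hypothetical):

      uint %test(int %X) {
          ; int -> uint only reinterprets the bits, so no move is required
          %Y = cast int %X to uint
          ret uint %Y
      }
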
- Apr 10, 2004

- Chris Lattner authored
  llvm-svn: 12815

- Apr 09, 2004

- John Criswell authored
  is listed first and the address is listed second.

  llvm-svn: 12795

- John Criswell authored
  llvm-svn: 12787

- John Criswell authored
  InstSelectSimple.cpp: Change the checks for proper I/O port address size
  into an exit() instead of an assertion. Assertions aren't used in Release
  builds, and handling this error should be graceful (not that this counts
  as graceful, but it's more graceful). Modified the generation of the
  IN/OUT instructions to have 0 arguments.

  X86InstrInfo.td: Added the OpSize attribute to the 16 bit IN and OUT
  instructions.

  llvm-svn: 12786

- Apr 08, 2004

- John Criswell authored
  I/O port instructions on x86. The specific code sequence is tailored to
  the parameters and return value of the intrinsic call. Added the ability
  for implicit definitions to be printed in the Instruction Printer. Added
  the ability for RawFrm instructions to print implicit uses and
  definitions with correct comma output. This required adjustment to some
  methods so that a leading comma would or would not be printed.

  llvm-svn: 12782

- Apr 06, 2004

- Jakub Staszak authored
  file based off InstSelectSimple.cpp, slowly being replaced by generated
  code from the really simple X86 instruction selector tablegen backend.

  llvm-svn: 12715

- Jakub Staszak authored
  TableGen files for the really simple instruction selector.

  llvm-svn: 12714

- Chris Lattner authored
  llvm-svn: 12711

- Chris Lattner authored
  llvm-svn: 12710

- Chris Lattner authored
  Enable folding of long seteq/setne comparisons into branches and select
  instructions. Implement unfolded long relational comparisons against a
  constant a bit more efficiently.

  Folding comparisons changes code that looks like this:

      mov %EAX, DWORD PTR [%ESP + 4]
      mov %EDX, DWORD PTR [%ESP + 8]
      mov %ECX, %EAX
      or %ECX, %EDX
      sete %CL
      test %CL, %CL
      je .LBB2 # PC rel: F

  into code that looks like this:

      mov %EAX, DWORD PTR [%ESP + 4]
      mov %EDX, DWORD PTR [%ESP + 8]
      mov %ECX, %EAX
      or %ECX, %EDX
      jne .LBB2 # PC rel: F

  This speeds up 186.crafty by 6% with llc-ls.

  llvm-svn: 12702

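  A sketch of the kind of source that folds this way, in the LLVM 1.x
  syntax of the period (the names and block structure are assumptions):

      void %test(long %X, long %Y) {
          ; the seteq result feeds the branch directly, so no sete/test
          ; pair needs to be emitted
          %c = seteq long %X, %Y
          br bool %c, label %T, label %F
      T:
          ret void
      F:
          ret void
      }
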
- Chris Lattner authored
  comparing a long against zero got us this:

      sub %ESP, 8
      mov DWORD PTR [%ESP + 4], %ESI
      mov DWORD PTR [%ESP], %EDI
      mov %EAX, DWORD PTR [%ESP + 12]
      mov %EDX, DWORD PTR [%ESP + 16]
      mov %ECX, 0
      mov %ESI, 0
      mov %EDI, %EAX
      xor %EDI, %ECX
      mov %ECX, %EDX
      xor %ECX, %ESI
      or %EDI, %ECX
      sete %CL
      test %CL, %CL
      je .LBB2 # PC rel: F

  Now it gets us this:

      mov %EAX, DWORD PTR [%ESP + 4]
      mov %EDX, DWORD PTR [%ESP + 8]
      mov %ECX, %EAX
      or %ECX, %EDX
      sete %CL
      test %CL, %CL
      je .LBB2 # PC rel: F

  llvm-svn: 12696

- Chris Lattner authored
  example, multiplying X*(1 + (1LL << 32)) now produces:

      test:
          mov %ECX, DWORD PTR [%ESP + 4]
          mov %EDX, DWORD PTR [%ESP + 8]
          mov %EAX, %ECX
          add %EDX, %ECX
          ret

  [[[Note to Alkis: why isn't linear scan generating this code?? This might
  be a problem with your intervals being too conservative:

      test:
          mov %EAX, DWORD PTR [%ESP + 4]
          mov %EDX, DWORD PTR [%ESP + 8]
          add %EDX, %EAX
          ret

  end note]]]

  Whereas GCC produces this:

      T:
          sub %esp, 12
          mov %edx, DWORD PTR [%esp+16]
          mov DWORD PTR [%esp+8], %edi
          mov %ecx, DWORD PTR [%esp+20]
          xor %edi, %edi
          mov DWORD PTR [%esp], %ebx
          mov %ebx, %edi
          mov %eax, %edx
          mov DWORD PTR [%esp+4], %esi
          add %ebx, %edx
          mov %edi, DWORD PTR [%esp+8]
          lea %edx, [%ecx+%ebx]
          mov %esi, DWORD PTR [%esp+4]
          mov %ebx, DWORD PTR [%esp]
          add %esp, 12
          ret

  I'm not sure exactly what GCC is smoking here, but it looks like it has
  just confused itself with a bunch of stack slots or something. The Intel
  compiler is better, but still not good:

      T:
          movl 4(%esp), %edx    #2.11
          movl 8(%esp), %eax    #2.11
          lea (%eax,%edx), %ecx #3.12
          movl $1, %eax         #3.12
          mull %edx             #3.12
          addl %ecx, %edx       #3.12
          ret                   #3.12

  llvm-svn: 12693

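  A sketch of the corresponding source, in the LLVM 1.x syntax of the
  period; note that 1 + (1LL << 32) is 4294967297 (the function name is
  assumed):

      long %test(long %X) {
          ; both halves of the constant are 1, so the multiply reduces
          ; to a copy of the low word plus a single add for the high word
          %Y = mul long %X, 4294967297
          ret long %Y
      }
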
- Chris Lattner authored
      long %test(long %X) {
          %Y = mul long %X, 123
          ret long %Y
      }

  we used to generate:

      test:
          sub %ESP, 12
          mov DWORD PTR [%ESP + 8], %ESI
          mov DWORD PTR [%ESP + 4], %EDI
          mov DWORD PTR [%ESP], %EBX
          mov %ECX, DWORD PTR [%ESP + 16]
          mov %ESI, DWORD PTR [%ESP + 20]
          mov %EDI, 123
          mov %EBX, 0
          mov %EAX, %ECX
          mul %EDI
          imul %ESI, %EDI
          add %ESI, %EDX
          imul %ECX, %EBX
          add %ESI, %ECX
          mov %EDX, %ESI
          mov %EBX, DWORD PTR [%ESP]
          mov %EDI, DWORD PTR [%ESP + 4]
          mov %ESI, DWORD PTR [%ESP + 8]
          add %ESP, 12
          ret

  Now we emit:

      test:
          mov %EAX, DWORD PTR [%ESP + 4]
          mov %ECX, DWORD PTR [%ESP + 8]
          mov %EDX, 123
          mul %EDX
          imul %ECX, %ECX, 123
          add %ECX, %EDX
          mov %EDX, %ECX
          ret

  Which, incidentally, is substantially nicer than what GCC manages:

      T:
          sub %esp, 8
          mov %eax, 123
          mov DWORD PTR [%esp], %ebx
          mov %ebx, DWORD PTR [%esp+16]
          mov DWORD PTR [%esp+4], %esi
          mov %esi, DWORD PTR [%esp+12]
          imul %ecx, %ebx, 123
          mov %ebx, DWORD PTR [%esp]
          mul %esi
          mov %esi, DWORD PTR [%esp+4]
          add %esp, 8
          lea %edx, [%ecx+%edx]
          ret

  llvm-svn: 12692

- Chris Lattner authored
  On this testcase:

      long %test(long %X) {
          %Y = shr long %X, ubyte 32
          ret long %Y
      }

  instead of:

      t:
          mov %EAX, DWORD PTR [%ESP + 4]
          mov %EAX, DWORD PTR [%ESP + 8]
          sar %EAX, 0
          mov %EDX, 0
          ret

  we now emit:

      test:
          mov %EAX, DWORD PTR [%ESP + 4]
          mov %EAX, DWORD PTR [%ESP + 8]
          mov %EDX, 0
          ret

  llvm-svn: 12688

- Chris Lattner authored
  llvm-svn: 12687

- Chris Lattner authored
  For example, on this instruction:

      call void %test(long 1234)

  Instead of this:

      mov %EAX, 1234
      mov %ECX, 0
      mov DWORD PTR [%ESP], %EAX
      mov DWORD PTR [%ESP + 4], %ECX
      call test

  We now emit this:

      mov DWORD PTR [%ESP], 1234
      mov DWORD PTR [%ESP + 4], 0
      call test

  llvm-svn: 12686

- Chris Lattner authored
  of the words of the constant is zeros. For example:

      Y = and long X, 1234

  now generates:

      Yl = and Xl, 1234
      Yh = 0

  instead of:

      Yl = and Xl, 1234
      Yh = and Xh, 0

  llvm-svn: 12685

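  Written out as a complete testcase in the LLVM 1.x syntax of the period
  (the function name is assumed):

      long %test(long %X) {
          ; the high 32 bits of 1234 are zero, so the high word of the
          ; result is simply zeroed rather than and'ed
          %Y = and long %X, 1234
          ret long %Y
      }
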
- Chris Lattner authored
  llvm-svn: 12684

- Chris Lattner authored
  This allows us to handle code like 'add long %X, 123456789012' more
  efficiently.

  llvm-svn: 12683

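  A minimal sketch of such a testcase, in the LLVM 1.x syntax of the period
  (the function name is assumed):

      long %test(long %X) {
          ; both halves of the 64-bit constant are non-zero: the low words
          ; are added with add and the high words with adc
          %Y = add long %X, 123456789012
          ret long %Y
      }
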
- Chris Lattner authored
  llvm-svn: 12682

- Chris Lattner authored
      long %test(long %X) {
          %Y = sub long 0, %X
          ret long %Y
      }

  We used to generate:

      test:
          sub %ESP, 4
          mov DWORD PTR [%ESP], %ESI
          mov %ECX, DWORD PTR [%ESP + 8]
          mov %ESI, DWORD PTR [%ESP + 12]
          mov %EAX, 0
          mov %EDX, 0
          sub %EAX, %ECX
          sbb %EDX, %ESI
          mov %ESI, DWORD PTR [%ESP]
          add %ESP, 4
          ret

  Now we generate:

      test:
          mov %EAX, DWORD PTR [%ESP + 4]
          mov %EDX, DWORD PTR [%ESP + 8]
          neg %EAX
          adc %EDX, 0
          neg %EDX
          ret

  llvm-svn: 12681

- Chris Lattner authored
  llvm-svn: 12680

- Chris Lattner authored
  * In promote32, if we can just promote a constant value, do so instead of
    promoting a constant dynamically.
  * In visitReturnInst, actually USE the promote32 argument that takes a
    Value*.

  The end result of this is that we now generate this:

      test:
          mov %EAX, 0
          ret

  instead of...

      test:
          mov %AX, 0
          movzx %EAX, %AX
          ret

  for:

      ushort %test() {
          ret ushort 0
      }

  llvm-svn: 12679

- Apr 05, 2004

- Chris Lattner authored
  types and can have arbitrary 32- and 64-bit integer types indexing into
  sequential types.

  llvm-svn: 12653

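  A hedged sketch of what such indexing looks like, in the LLVM 1.x syntax
  of the period (the array type and the use of both index widths are
  assumptions for illustration):

      int %test([10 x int]* %A, int %i, long %j) {
          ; both a 32-bit int and a 64-bit long index into a sequential type
          %P = getelementptr [10 x int]* %A, long 0, int %i
          %Q = getelementptr [10 x int]* %A, long 0, long %j
          %V = load int* %P
          %W = load int* %Q
          %S = add int %V, %W
          ret int %S
      }
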
- Apr 02, 2004

- Alkis Evlogimenos authored
  llvm-svn: 12615

- Alkis Evlogimenos authored
  llvm-svn: 12611

- Alkis Evlogimenos authored
  llvm-svn: 12610

- Alkis Evlogimenos authored
  llvm-svn: 12607

- Apr 01, 2004

- Chris Lattner authored
  Implement a small optimization. In test/Regression/CodeGen/X86/select.ll,
  we now generate this for foldSel3:

      foldSel3:
          mov %AL, BYTE PTR [%ESP + 4]
          fld DWORD PTR [%ESP + 8]
          fld DWORD PTR [%ESP + 12]
          mov %EAX, DWORD PTR [%ESP + 16]
          mov %ECX, DWORD PTR [%ESP + 20]
          cmp %EAX, %ECX
          fxch %ST(1)
          fcmovae %ST(0), %ST(1)
  ***     fstp %ST(1)
          ret

  Instead of:

      foldSel3:
          mov %AL, BYTE PTR [%ESP + 4]
          fld DWORD PTR [%ESP + 8]
          fld DWORD PTR [%ESP + 12]
          mov %EAX, DWORD PTR [%ESP + 16]
          mov %ECX, DWORD PTR [%ESP + 20]
          cmp %EAX, %ECX
          fxch %ST(1)
          fcmovae %ST(0), %ST(1)
  ***     fxch %ST(1)
  ***     fstp %ST(0)
          ret

  In practice, this only affects code size: performance should be basically
  unaffected.

  llvm-svn: 12588