  8. Oct 17, 2004
    • Don't print stuff out from the code generator. This broke the JIT horribly last night. :)  bork! · 06855531
      Chris Lattner authored
      
      llvm-svn: 17093
    • Rewrite support for cast uint -> FP. In particular, we used to compile this: · 839abf57
      Chris Lattner authored
      double %test(uint %X) {
              %tmp.1 = cast uint %X to double         ; <double> [#uses=1]
              ret double %tmp.1
      }
      
      into:
      
      test:
              sub %ESP, 8
              mov %EAX, DWORD PTR [%ESP + 12]
              mov %ECX, 0
              mov DWORD PTR [%ESP], %EAX
              mov DWORD PTR [%ESP + 4], %ECX
              fild QWORD PTR [%ESP]
              add %ESP, 8
              ret
      
      ... which basically zero extends to 8 bytes, then does an fild for an
      8-byte signed int.
      
      Now we generate this:
      
      test:
              sub %ESP, 4
              mov %EAX, DWORD PTR [%ESP + 8]
              mov DWORD PTR [%ESP], %EAX
              fild DWORD PTR [%ESP]
              shr %EAX, 31
              fadd DWORD PTR [.CPItest_0 + 4*%EAX]
              add %ESP, 4
              ret
      
              .section .rodata
              .align  4
      .CPItest_0:
              .quad   5728578726015270912
      
      This does a 32-bit signed integer load, then adds in an offset if the sign
      bit of the integer was set.
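      
      In C terms, the new sequence amounts to the sketch below (a minimal illustration, not the compiler code itself; the function name is made up, and the two-entry float table mirrors .CPItest_0, whose quad is the pair {0.0f, 2^32} laid out little-endian):
      
      #include <stdint.h>
      
      /* {0.0f, 2^32}: the same pair of floats .CPItest_0 holds */
      static const float bias[2] = { 0.0f, 4294967296.0f };
      
      double uint_to_double(uint32_t x) {
          double d = (double)(int32_t)x;  /* fild of a 32-bit signed load */
          return d + bias[x >> 31];       /* fadd indexed by the sign bit */
      }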
      
      It turns out that this is substantially faster than the preceding sequence.
      Consider this testcase:
      
      unsigned a[2]={1,2};
      volatile double G;
      
      void main() {
          int i;
          for (i=0; i<100000000; ++i )
              G += a[i&1];
      }
      
      On zion (a P4 Xeon, 3GHz), this patch speeds up the testcase from 2.140s
      to 0.94s.
      
      On apoc, an Athlon MP 2100+, this patch speeds up the testcase from 1.72s
      to 1.34s.
      
      Note that the program takes 2.5s/1.97s on zion/apoc with GCC 3.3 -O3
      -fomit-frame-pointer.
      
      llvm-svn: 17083
    • Unify handling of constant pool indexes with the other code paths, allowing us to use index registers for CPIs · 112fd88a
      Chris Lattner authored
      
      llvm-svn: 17082
    • Give the asmprinter the ability to print memrefs with a constant pool index, index reg, and scale · af19d396
      Chris Lattner authored
      
      llvm-svn: 17081
    • fold: · 653d8663
      Chris Lattner authored
        %X = and Y, constantint
        %Z = setcc %X, 0
      
      instead of emitting:
      
              and %EAX, 3
              test %EAX, %EAX
              je .LBBfoo2_2   # UnifiedReturnBlock
      
      We now emit:
      
              test %EAX, 3
              je .LBBfoo2_2   # UnifiedReturnBlock
      
      This triggers 581 times on 176.gcc for example.
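      
      For reference, a C fragment that gives rise to this pattern (a hypothetical function, named after the .LBBfoo2_2 label above; it is not from the commit):
      
      int foo2(unsigned y) {
          /* %X = and Y, 3 ; %Z = setcc %X, 0. Now folded into a single
             test-against-immediate instead of and + test. */
          return (y & 3) == 0;
      }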
      
      llvm-svn: 17080
  15. Oct 06, 2004
    • Remove debugging code, fix encoding problem. This fixes the problems the JIT had last night. · 93867e51
      Chris Lattner authored
      
      llvm-svn: 16766
    • Codegen signed mod by 2 or -2 more efficiently. Instead of generating: · 6835dedb
      Chris Lattner authored
      t:
              mov %EDX, DWORD PTR [%ESP + 4]
              mov %ECX, 2
              mov %EAX, %EDX
              sar %EDX, 31
              idiv %ECX
              mov %EAX, %EDX
              ret
      
      Generate:
      t:
              mov %ECX, DWORD PTR [%ESP + 4]
      ***     mov %EAX, %ECX
              cdq
              and %ECX, 1
              xor %ECX, %EDX
              sub %ECX, %EDX
      ***     mov %EAX, %ECX
              ret
      
      Note that the two marked moves are redundant, and should be eliminated by the
      register allocator, but aren't.
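      
      In C, the new sequence is equivalent to the sketch below (a minimal illustration with a made-up name; it assumes arithmetic right shift on signed ints, which is what cdq implements):
      
      #include <stdint.h>
      
      int32_t srem2(int32_t x) {
          int32_t s = x >> 31;        /* cdq: 0, or -1 if x is negative */
          return ((x & 1) ^ s) - s;   /* and, xor, sub: low bit, sign restored */
      }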
      
      Compare this to GCC, which generates:
      
      t:
              mov     %eax, DWORD PTR [%esp+4]
              mov     %edx, %eax
              shr     %edx, 31
              lea     %ecx, [%edx+%eax]
              and     %ecx, -2
              sub     %eax, %ecx
              ret
      
      or ICC 8.0, which generates:
      
      t:
              movl      4(%esp), %ecx                                 #3.5
              movl      $-2147483647, %eax                            #3.25
              imull     %ecx                                          #3.25
              movl      %ecx, %eax                                    #3.25
              sarl      $31, %eax                                     #3.25
              addl      %ecx, %edx                                    #3.25
              subl      %edx, %eax                                    #3.25
              addl      %eax, %eax                                    #3.25
              negl      %eax                                          #3.25
              subl      %eax, %ecx                                    #3.25
              movl      %ecx, %eax                                    #3.25
              ret                                                     #3.25
      
      We would be in great shape if not for the moves.
      
      llvm-svn: 16763
    • Fix a scary bug with signed division by a power of two. We used to generate: · 7bd8f133
      Chris Lattner authored
      s:   ;; X / 4
              mov %EAX, DWORD PTR [%ESP + 4]
              mov %ECX, %EAX
              sar %ECX, 1
              shr %ECX, 30
              mov %EDX, %EAX
              add %EDX, %ECX
              sar %EAX, 2
              ret
      
      When we really meant:
      
      s:
              mov %EAX, DWORD PTR [%ESP + 4]
              mov %ECX, %EAX
              sar %ECX, 1
              shr %ECX, 30
              add %EAX, %ECX
              sar %EAX, 2
              ret
      
      Hey, this also reduces register pressure :)
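      
      The corrected sequence, in C terms (an illustrative sketch, assuming arithmetic right shift on signed ints): bias a negative dividend by 3 before the shift so the division by 4 rounds toward zero.
      
      #include <stdint.h>
      
      int32_t sdiv4(int32_t x) {
          /* 3 if x is negative, 0 otherwise (the sar/shr pair above) */
          int32_t bias = (int32_t)((uint32_t)(x >> 1) >> 30);
          return (x + bias) >> 2;     /* add, then sar 2 */
      }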
      
      llvm-svn: 16761
    • Codegen signed divides by 2 and -2 more efficiently. In particular, instead of: · 147edd2f
      Chris Lattner authored
      
      s:   ;; X / 2
              movl 4(%esp), %eax
              movl %eax, %ecx
              shrl $31, %ecx
              movl %eax, %edx
              addl %ecx, %edx
              sarl $1, %eax
              ret
      
      t:   ;; X / -2
              movl 4(%esp), %eax
              movl %eax, %ecx
              shrl $31, %ecx
              movl %eax, %edx
              addl %ecx, %edx
              sarl $1, %eax
              negl %eax
              ret
      
      Emit:
      
      s:
              movl 4(%esp), %eax
              cmpl $-2147483648, %eax
              sbbl $-1, %eax
              sarl $1, %eax
              ret
      
      t:
              movl 4(%esp), %eax
              cmpl $-2147483648, %eax
              sbbl $-1, %eax
              sarl $1, %eax
              negl %eax
              ret
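      
      The cmp/sbb pair adds 1 to the dividend exactly when it is negative, so the arithmetic shift rounds toward zero. A minimal C sketch (illustrative name; assumes arithmetic right shift on signed ints):
      
      #include <stdint.h>
      
      int32_t sdiv2(int32_t x) {
          return (x + (x < 0)) >> 1;  /* cmpl/sbbl fold in the +1 branch-free */
      }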
      
      llvm-svn: 16760
    • Add some new instructions. Fix the asm string for sbb32rr · e9bfa5a2
      Chris Lattner authored
      llvm-svn: 16759