  1. Mar 09, 2004
  2. Mar 08, 2004
    • Chris Lattner authored · 653e662a
      Implement folding explicit load instructions into binary operations. For a
      testcase like this:
      
      int %test(int* %P, int %A) {
              %Pv = load int* %P
              %B = add int %A, %Pv
              ret int %B
      }
      
      We now generate:
      test:
              mov %ECX, DWORD PTR [%ESP + 4]
              mov %EAX, DWORD PTR [%ESP + 8]
              add %EAX, DWORD PTR [%ECX]
              ret
      
      Instead of:
      test:
              mov %EAX, DWORD PTR [%ESP + 4]
              mov %ECX, DWORD PTR [%ESP + 8]
              mov %EAX, DWORD PTR [%EAX]
              add %EAX, %ECX
              ret
      
      ... saving one instruction, and often a register.  Note that there are a lot
      of other instructions that could use this, but they aren't handled.  I'm not
      really interested in adding them, but mul/div and all of the FP instructions
      could be supported as well if someone wanted to add them.
      
      llvm-svn: 12204
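
The folding described in this commit can be illustrated with a toy selector. This is only a sketch under stated assumptions, not the X86 instruction selector's code: the `(dest, op, args)` tuple IR, the `select` function, and the pseudo-assembly strings are all hypothetical. It assumes a load may be folded into an add when the add is the load's only use and no other instruction (such as a store) comes between them.

```python
def select(instrs):
    """Lower a toy SSA IR to pseudo-x86 lines, folding single-use loads into adds.

    instrs is a list of (dest, op, args) tuples. This format is hypothetical
    and exists only for illustration.
    """
    # Count how often each value is used as an operand.
    uses = {}
    for _, _, args in instrs:
        for a in args:
            uses[a] = uses.get(a, 0) + 1

    out = []
    pending = {}  # load result -> pointer, for loads that may still be folded

    def flush():
        # Emit every deferred load as an ordinary mov.
        for d, p in pending.items():
            out.append(f"mov {d}, [{p}]")
        pending.clear()

    for dest, op, args in instrs:
        if op == "load" and uses.get(dest, 0) == 1:
            pending[dest] = args[0]            # defer: maybe fold into its user
        elif op == "load":
            out.append(f"mov {dest}, [{args[0]}]")
        elif op == "add":
            lhs, rhs = args
            if lhs in pending and rhs not in pending:
                lhs, rhs = rhs, lhs            # add is commutative
            if lhs in pending:                 # both operands were deferred loads
                out.append(f"mov {lhs}, [{pending.pop(lhs)}]")
            src = f"[{pending.pop(rhs)}]" if rhs in pending else rhs
            out.append(f"mov {dest}, {lhs}")   # two-address form: copy, then add
            out.append(f"add {dest}, {src}")
        else:
            # Conservatively assume any other instruction may write memory.
            flush()
            out.append(f"{op} " + ", ".join(args))
    flush()
    return out
```

For the testcase above, written as `[("Pv", "load", ["P"]), ("B", "add", ["A", "Pv"]), (None, "ret", ["B"])]`, the sketch returns `mov B, A`, `add B, [P]`, `ret B`: the load never occupies a register of its own.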
    • Chris Lattner authored · 1dd6afe6
      Rearrange and refactor some code. No functionality changes.
      llvm-svn: 12203
  3. Mar 07, 2004
  4. Mar 04, 2004
  5. Mar 02, 2004
  6. Mar 01, 2004
    • Brian Gaeke authored · 427cec13
      TargetCacheInfo has been removed; its only uses were to propagate a constant
      (16) into certain areas of the SPARC V9 back-end. I'm fairly sure the US IIIi's
      dcache has 32-byte lines, so I'm not sure where the 16 came from. However, in
      the interest of not breaking things any more than they already are, I'm going
      to leave the constant alone.
      
      llvm-svn: 12043
    • Chris Lattner authored · 1f4642c4
      Handle passing constant integers to functions much more efficiently. Instead
      of generating this code:
      
              mov %EAX, 4
              mov DWORD PTR [%ESP], %EAX
              mov %AX, 123
              movsx %EAX, %AX
              mov DWORD PTR [%ESP + 4], %EAX
              call Y
      
      we now generate:
              mov DWORD PTR [%ESP], 4
              mov DWORD PTR [%ESP + 4], 123
              call Y
      
      Which hurts the eyes less.  :)
      
      Considering that register pressure around call sites is already high (with all
      of the callee-clobbered registers and so on), this may help a lot.
      
      llvm-svn: 12028
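
The difference between the two listings in this commit can be sketched as a small argument-lowering routine. This is a hedged illustration rather than the back-end's actual code: `lower_const_args`, its `fold` flag, and the `(type, value)` argument tuples are hypothetical, and every argument is assumed to occupy one 4-byte stack slot.

```python
def lower_const_args(args, fold=True):
    """Emit pseudo-x86 stores of constant integer call arguments.

    args is a list of (type, value) pairs where type is "int" or "short".
    With fold=True each constant is stored straight into its stack slot;
    with fold=False it is first materialized in %EAX, as in the older output.
    """
    out = []
    offset = 0
    for ty, value in args:
        if offset == 0:
            slot = "DWORD PTR [%ESP]"
        else:
            slot = f"DWORD PTR [%ESP + {offset}]"
        if fold:
            # Store the immediate directly; no register is needed.
            out.append(f"mov {slot}, {value}")
        else:
            if ty == "short":
                # Load the 16-bit constant, then sign-extend it to 32 bits.
                out.append(f"mov %AX, {value}")
                out.append("movsx %EAX, %AX")
            else:
                out.append(f"mov %EAX, {value}")
            out.append(f"mov {slot}, %EAX")
        offset += 4  # assumption: one 4-byte slot per argument
    return out
```

With `[("int", 4), ("short", 123)]` the folded form produces the two stores shown above, while `fold=False` reproduces the five-instruction sequence that goes through `%EAX`.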
    • Chris Lattner authored · 5c7d3cda
      Fix a minor code-quality issue. When passing 8- and 16-bit integer constants
      to function calls, we would emit dead code, like this:
      
      int Y(int, short, double);
      int X() {
        Y(4, 123, 4);
      }
      
      --- Old (*** marks the dead instruction)
      X:
              sub %ESP, 20
              mov %EAX, 4
              mov DWORD PTR [%ESP], %EAX
      ***     mov %AX, 123
              mov %AX, 123
              movsx %EAX, %AX
              mov DWORD PTR [%ESP + 4], %EAX
              fld QWORD PTR [.CPIX_0]
              fstp QWORD PTR [%ESP + 8]
              call Y
              mov %EAX, 0
              # IMPLICIT_USE %EAX %ESP
              add %ESP, 20
              ret
      
      Now we emit:
      X:
              sub %ESP, 20
              mov %EAX, 4
              mov DWORD PTR [%ESP], %EAX
              mov %AX, 123
              movsx %EAX, %AX
              mov DWORD PTR [%ESP + 4], %EAX
              fld QWORD PTR [.CPIX_0]
              fstp QWORD PTR [%ESP + 8]
              call Y
              mov %EAX, 0
              # IMPLICIT_USE %EAX %ESP
              add %ESP, 20
              ret
      
      Next up, eliminate the mov AX and movsx entirely!
      
      llvm-svn: 12026
  7. Feb 29, 2004
  8. Feb 28, 2004
  9. Feb 27, 2004
  10. Feb 26, 2004
  11. Feb 25, 2004
    • Chris Lattner authored · 64c9b223
      Fix failures in 099.go due to the cfgsimplify pass creating switch instructions
      where there were none before.
      
      llvm-svn: 11829
    • Chris Lattner authored · 309327a4
      Teach the instruction selector how to transform 'array' GEP computations into X86
      scaled indexes.  This allows us to compile GEPs like this:
      
      int* %test([10 x { int, { int } }]* %X, int %Idx) {
              %Idx = cast int %Idx to long
              %X = getelementptr [10 x { int, { int } }]* %X, long 0, long %Idx, ubyte 1, ubyte 0
              ret int* %X
      }
      
      Into a single address computation:
      
      test:
              mov %EAX, DWORD PTR [%ESP + 4]
              mov %ECX, DWORD PTR [%ESP + 8]
              lea %EAX, DWORD PTR [%EAX + 8*%ECX + 4]
              ret
      
      Before it generated:
      test:
              mov %EAX, DWORD PTR [%ESP + 4]
              mov %ECX, DWORD PTR [%ESP + 8]
              shl %ECX, 3
              add %EAX, %ECX
              lea %EAX, DWORD PTR [%EAX + 4]
              ret
      
      This is useful for things like int/float/double arrays, as the indexing can be folded into
      the loads and stores, reducing register pressure and decreasing the pressure on the decode unit.
      With these changes, I expect our performance on 256.bzip2 and gzip to improve a lot.  On
      bzip2 for example, we go from this:
      
      10665 asm-printer           - Number of machine instrs printed
         40 ra-local              - Number of loads/stores folded into instructions
       1708 ra-local              - Number of loads added
       1532 ra-local              - Number of stores added
       1354 twoaddressinstruction - Number of instructions added
       1354 twoaddressinstruction - Number of two-address instructions
       2794 x86-peephole          - Number of peephole optimization performed
      
      to this:
      9873 asm-printer           - Number of machine instrs printed
        41 ra-local              - Number of loads/stores folded into instructions
      1710 ra-local              - Number of loads added
      1521 ra-local              - Number of stores added
       789 twoaddressinstruction - Number of instructions added
       789 twoaddressinstruction - Number of two-address instructions
      2142 x86-peephole          - Number of peephole optimization performed
      
      ... and these types of instructions are often in tight loops.
      
      Linear scan is also helped, but not as much.  It goes from:
      
      8787 asm-printer           - Number of machine instrs printed
      2389 liveintervals         - Number of identity moves eliminated after coalescing
      2288 liveintervals         - Number of interval joins performed
      3522 liveintervals         - Number of intervals after coalescing
      5810 liveintervals         - Number of original intervals
       700 spiller               - Number of loads added
       487 spiller               - Number of stores added
       303 spiller               - Number of register spills
      1354 twoaddressinstruction - Number of instructions added
      1354 twoaddressinstruction - Number of two-address instructions
       363 x86-peephole          - Number of peephole optimization performed
      
      to:
      
      7982 asm-printer           - Number of machine instrs printed
      1759 liveintervals         - Number of identity moves eliminated after coalescing
      1658 liveintervals         - Number of interval joins performed
      3282 liveintervals         - Number of intervals after coalescing
      4940 liveintervals         - Number of original intervals
       635 spiller               - Number of loads added
       452 spiller               - Number of stores added
       288 spiller               - Number of register spills
       789 twoaddressinstruction - Number of instructions added
       789 twoaddressinstruction - Number of two-address instructions
       258 x86-peephole          - Number of peephole optimization performed
      
      Though I'm not complaining about the drop in the number of intervals.  :)
      
      llvm-svn: 11820
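
The `8*%ECX + 4` in the `lea` above follows from the element size and the field offset. The sketch below works through that arithmetic under stated assumptions: `int` is 4 bytes, structs have no padding, and a scale is usable only if it is 1, 2, 4 or 8, the scale factors that x86 addressing supports. The names `size_of`, `field_offset` and `scaled_index`, and the tuple encoding of struct types, are hypothetical.

```python
INT_SIZE = 4  # assumption: 32-bit int, as on the X86 target in this commit


def size_of(ty):
    """Size in bytes of "int" or of a struct given as a tuple of field types."""
    if ty == "int":
        return INT_SIZE
    return sum(size_of(field) for field in ty)  # no padding is modeled


def field_offset(ty, path):
    """Byte offset reached by following the field indices in path through ty."""
    offset = 0
    for idx in path:
        offset += sum(size_of(field) for field in ty[:idx])
        ty = ty[idx]
    return offset


def scaled_index(elem_ty, path):
    """Return (scale, disp) for base + scale*index + disp.

    Returns None when the element size is not an encodable x86 scale,
    in which case the index would have to be computed separately.
    """
    scale = size_of(elem_ty)
    if scale not in (1, 2, 4, 8):
        return None
    return scale, field_offset(elem_ty, path)
```

For the element type `{ int, { int } }`, written here as `("int", ("int",))`, with field path `(1, 0)`, `scaled_index` returns `(8, 4)`, matching the scale and displacement in the single `lea`. A 12-byte element such as `("int", "int", "int")` yields `None`.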