  Feb 25, 2004
    • SparcV8 regs are really 32-bit, not 64! Thanks, Chris. · 564654d6
      Misha Brukman authored
      llvm-svn: 11835
    • Clean up the tablegen descriptions for SparcV8. · f8dcdcc8
      Misha Brukman authored
      llvm-svn: 11834
    • 2122b969
    • 0e3a7ca5
    • Fix failures in 099.go due to the cfgsimplify pass creating switch instructions · 64c9b223
      Chris Lattner authored
      where there were none before.
      
      llvm-svn: 11829
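      For context, the kind of rewrite involved (a minimal C++ illustration with
      hypothetical functions, not taken from the commit): cfgsimplify can collapse
      a chain of constant equality tests into a single switch.

              void f(); void g(); void h();   // hypothetical helpers

              void dispatch(int c) {
                  // A chain of equality tests against constants...
                  if (c == 1)      f();
                  else if (c == 2) g();
                  else if (c == 3) h();
              }

              // ...which cfgsimplify can rewrite as the equivalent of:
              void dispatch_simplified(int c) {
                  switch (c) {
                  case 1: f(); break;
                  case 2: g(); break;
                  case 3: h(); break;
                  }
              }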
    • SparcV8 skeleton · 9a5bd7fc
      Brian Gaeke authored
      llvm-svn: 11828
    • Great renaming: Sparc --> SparcV9 · 94e95d2b
      Brian Gaeke authored
      llvm-svn: 11826
    • Teach the instruction selector how to transform 'array' GEP computations into X86 · 309327a4
      Chris Lattner authored
      scaled indexes.  This allows us to compile GEPs like this:
      
      int* %test([10 x { int, { int } }]* %X, int %Idx) {
              %Idx = cast int %Idx to long
              %X = getelementptr [10 x { int, { int } }]* %X, long 0, long %Idx, ubyte 1, ubyte 0
              ret int* %X
      }
      
      Into a single address computation:
      
      test:
              mov %EAX, DWORD PTR [%ESP + 4]
              mov %ECX, DWORD PTR [%ESP + 8]
              lea %EAX, DWORD PTR [%EAX + 8*%ECX + 4]
              ret
      
      Before it generated:
      test:
              mov %EAX, DWORD PTR [%ESP + 4]
              mov %ECX, DWORD PTR [%ESP + 8]
              shl %ECX, 3
              add %EAX, %ECX
              lea %EAX, DWORD PTR [%EAX + 4]
              ret
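
      As a sanity check on the math: each element of [10 x { int, { int } }] is
      8 bytes, with the inner int at byte offset 4, so the address being formed
      is X + 8*Idx + 4, exactly what the single LEA computes.  A minimal C++
      analogue (hypothetical names, not from the commit):

              struct Inner { int val; };
              struct Elem  { int first; Inner inner; };   // 8 bytes; inner.val at +4

              int *test(Elem (*X)[10], int Idx) {
                  // computes (char*)X + 8*Idx + 4: one scaled-index LEA on X86
                  return &(*X)[Idx].inner.val;
              }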
      
      This is useful for things like int/float/double arrays, as the indexing can be folded into
      the loads & stores, reducing register pressure and easing the load on the decode unit.
      With these changes, I expect our performance on 256.bzip2 and gzip to improve a lot.  On
      bzip2, for example, we go from this:
      
      10665 asm-printer           - Number of machine instrs printed
         40 ra-local              - Number of loads/stores folded into instructions
       1708 ra-local              - Number of loads added
       1532 ra-local              - Number of stores added
       1354 twoaddressinstruction - Number of instructions added
       1354 twoaddressinstruction - Number of two-address instructions
       2794 x86-peephole          - Number of peephole optimization performed
      
      to this:
      9873 asm-printer           - Number of machine instrs printed
        41 ra-local              - Number of loads/stores folded into instructions
      1710 ra-local              - Number of loads added
      1521 ra-local              - Number of stores added
       789 twoaddressinstruction - Number of instructions added
       789 twoaddressinstruction - Number of two-address instructions
      2142 x86-peephole          - Number of peephole optimization performed
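
      (That is roughly 7% fewer machine instrs printed, and the two-address pass
      adds about 42% fewer instructions.)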
      
      ... and these types of instructions are often in tight loops.
      
      Linear scan is also helped, but not as much.  It goes from:
      
      8787 asm-printer           - Number of machine instrs printed
      2389 liveintervals         - Number of identity moves eliminated after coalescing
      2288 liveintervals         - Number of interval joins performed
      3522 liveintervals         - Number of intervals after coalescing
      5810 liveintervals         - Number of original intervals
       700 spiller               - Number of loads added
       487 spiller               - Number of stores added
       303 spiller               - Number of register spills
      1354 twoaddressinstruction - Number of instructions added
      1354 twoaddressinstruction - Number of two-address instructions
       363 x86-peephole          - Number of peephole optimization performed
      
      to:
      
      7982 asm-printer           - Number of machine instrs printed
      1759 liveintervals         - Number of identity moves eliminated after coalescing
      1658 liveintervals         - Number of interval joins performed
      3282 liveintervals         - Number of intervals after coalescing
      4940 liveintervals         - Number of original intervals
       635 spiller               - Number of loads added
       452 spiller               - Number of stores added
       288 spiller               - Number of register spills
       789 twoaddressinstruction - Number of instructions added
       789 twoaddressinstruction - Number of two-address instructions
       258 x86-peephole          - Number of peephole optimization performed
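
      (Roughly a 9% drop in machine instrs printed, with 15% fewer original
      intervals to allocate.)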
      
      Though I'm not complaining about the drop in the number of intervals.  :)
      
      llvm-svn: 11820
    • * Make the previous patch more efficient by not allocating a temporary MachineInstr · d1ee55d4
      Chris Lattner authored
        to do analysis.
      
      *** FOLD getelementptr instructions into loads and stores when possible,
          making use of some of the crazy X86 addressing modes.
      
      For example, the following C++ program fragment:
      
      struct complex {
          double re, im;
          complex(double r, double i) : re(r), im(i) {}
      };
      inline complex operator+(const complex& a, const complex& b) {
          return complex(a.re+b.re, a.im+b.im);
      }
      complex addone(const complex& arg) {
          return arg + complex(1,0);
      }
      
      Used to be compiled to:
      _Z6addoneRK7complex:
              mov %EAX, DWORD PTR [%ESP + 4]
              mov %ECX, DWORD PTR [%ESP + 8]
      ***     mov %EDX, %ECX
              fld QWORD PTR [%EDX]
              fld1
              faddp %ST(1)
      ***     add %ECX, 8
              fld QWORD PTR [%ECX]
              fldz
              faddp %ST(1)
      ***     mov %ECX, %EAX
              fxch %ST(1)
              fstp QWORD PTR [%ECX]
      ***     add %EAX, 8
              fstp QWORD PTR [%EAX]
              ret
      
      Now it is compiled to:
      _Z6addoneRK7complex:
              mov %EAX, DWORD PTR [%ESP + 4]
              mov %ECX, DWORD PTR [%ESP + 8]
              fld QWORD PTR [%ECX]
              fld1
              faddp %ST(1)
              fld QWORD PTR [%ECX + 8]
              fldz
              faddp %ST(1)
              fxch %ST(1)
              fstp QWORD PTR [%EAX]
              fstp QWORD PTR [%EAX + 8]
              ret
      
      Other programs should see similar improvements across the board.  Note that
      in addition to reducing instruction count, this also reduces register pressure
      a lot, always a good thing on X86.  :)
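
      For reference, the addressing modes being folded all decompose into the
      same four pieces: Base + Scale*Index + Disp.  A hedged sketch of the idea
      (hypothetical struct, not the actual LLVM data structure):

              // All of this folds into a single X86 memory operand:
              struct AddressMode {
                  unsigned BaseReg;   // e.g. %EAX
                  unsigned Scale;     // 1, 2, 4, or 8
                  unsigned IndexReg;  // scaled index register, if any
                  int      Disp;      // constant displacement
              };
              // Effective address = Base + Scale*Index + Disp, so an access
              // like [%ECX + 8] needs no separate add instruction.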
      
      llvm-svn: 11819
    • Add a helper to create an addressing mode given all of the pieces. · 4b3514c1
      Chris Lattner authored
      llvm-svn: 11818