  1. Feb 25, 2004
    • Chris Lattner's avatar
      Add a new pass, run internalize first · b66a35ef
      Chris Lattner authored
      llvm-svn: 11839
      b66a35ef
    • Chris Lattner's avatar
      Add a new pass · 0f39359d
      Chris Lattner authored
      llvm-svn: 11838
      0f39359d
    • Chris Lattner's avatar
      Add prototype · 14da4ead
      Chris Lattner authored
      llvm-svn: 11837
      14da4ead
    • Chris Lattner's avatar
      My faith in programmers has been found to be totally misplaced. One would · 8d1da1ab
      Chris Lattner authored
assume that if they don't intend to write to a global variable, they would
mark it as constant.  However, there are people who don't understand
      that the compiler can do nice things for them if they give it the information
      it needs.
      
This pass looks for blatantly obvious globals that are only ever read from.
      Though it uses a trivially simple "alias analysis" of sorts, it is still able
      to do amazing things to important benchmarks.  253.perlbmk, for example,
      contains several ***GIANT*** function pointer tables that are not marked
      constant and should be.  Marking them constant allows the optimizer to turn
      a whole bunch of indirect calls into direct calls.  Note that only a link-time
      optimizer can do this transformation, but perlbmk does have several strings
      and other minor globals that can be marked constant by this pass when run
      from GCCAS.
      
      176.gcc has a ton of strings and large tables that are marked constant, both
      at compile time (38 of them) and at link time (48 more).  Other benchmarks
give similar results, though it seems that big ones have disproportionately
      more than small ones.
      
      This pass is extremely quick and does good things.  I'm going to enable it
      in gccas & gccld.  Not bad for 50 SLOC.
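To make the idea concrete, here is a rough C++ sketch (invented for illustration;
none of these names come from the commit or from perlbmk) of the kind of global
the pass targets, a function pointer table that is never written after its
initializer but was not declared const:

        // Hypothetical dispatch table: only ever read, yet not marked 'const',
        // so the front-end emits a writable global and calls stay indirect.
        typedef int (*OpFn)(int);
        static int opAdd(int x) { return x + 1; }
        static int opNeg(int x) { return -x; }
        static OpFn OpTable[] = { opAdd, opNeg };   // should have been 'const'

        int dispatch(int op, int x) {
          return OpTable[op](x);   // indirect call through the table
        }

If the pass can prove OpTable is never stored to, the global can be marked
constant; once it is constant, a constant index into it can be folded away
(at link time, when the whole table is visible), turning the indirect call
into a direct one.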
      
      llvm-svn: 11836
      8d1da1ab
    • Misha Brukman's avatar
      SparcV8 regs are really 32-bit, not 64! Thanks, Chris. · 564654d6
      Misha Brukman authored
      llvm-svn: 11835
      564654d6
    • Misha Brukman's avatar
      Clean up the tablegen descriptions for SparcV8. · f8dcdcc8
      Misha Brukman authored
      llvm-svn: 11834
      f8dcdcc8
    • Misha Brukman's avatar
      2122b969
    • Misha Brukman's avatar
      0e3a7ca5
    • Brian Gaeke's avatar
      Note that this test is currently expected to fail. · 232483ae
      Brian Gaeke authored
      llvm-svn: 11831
      232483ae
    • Chris Lattner's avatar
      Add an assertion · f5a393a1
      Chris Lattner authored
      llvm-svn: 11830
      f5a393a1
    • Chris Lattner's avatar
      Fix failures in 099.go due to the cfgsimplify pass creating switch instructions · 64c9b223
      Chris Lattner authored
where there were none before
      
      llvm-svn: 11829
      64c9b223
    • Brian Gaeke's avatar
      SparcV8 skeleton · 9a5bd7fc
      Brian Gaeke authored
      llvm-svn: 11828
      9a5bd7fc
    • Brian Gaeke's avatar
      Great renaming: Sparc --> SparcV9 · 94e95d2b
      Brian Gaeke authored
      llvm-svn: 11826
      94e95d2b
    • Chris Lattner's avatar
      Add a bunch more functions used by perlbmk · 864c9014
      Chris Lattner authored
      llvm-svn: 11824
      864c9014
    • John Criswell's avatar
      Updated to use llc to generate CBE code. · 9f547bce
      John Criswell authored
      llvm-svn: 11823
      9f547bce
    • Chris Lattner's avatar
      Substantial improvements and cleanups for the release notes. We were missing · 8ebf2538
      Chris Lattner authored
      a bunch of stuff!  :)
      
      llvm-svn: 11822
      8ebf2538
    • Chris Lattner's avatar
      Fix incorrect debug code · 9c6833c5
      Chris Lattner authored
      llvm-svn: 11821
      9c6833c5
    • Chris Lattner's avatar
      Teach the instruction selector how to transform 'array' GEP computations into X86 · 309327a4
      Chris Lattner authored
scaled indexes.  This allows us to compile GEPs like this:
      
      int* %test([10 x { int, { int } }]* %X, int %Idx) {
              %Idx = cast int %Idx to long
              %X = getelementptr [10 x { int, { int } }]* %X, long 0, long %Idx, ubyte 1, ubyte 0
              ret int* %X
      }
      
      Into a single address computation:
      
      test:
              mov %EAX, DWORD PTR [%ESP + 4]
              mov %ECX, DWORD PTR [%ESP + 8]
              lea %EAX, DWORD PTR [%EAX + 8*%ECX + 4]
              ret
      
      Before it generated:
      test:
              mov %EAX, DWORD PTR [%ESP + 4]
              mov %ECX, DWORD PTR [%ESP + 8]
              shl %ECX, 3
              add %EAX, %ECX
              lea %EAX, DWORD PTR [%EAX + 4]
              ret
      
      This is useful for things like int/float/double arrays, as the indexing can be folded into
the loads and stores, reducing register pressure and decreasing pressure on the decode unit.
      With these changes, I expect our performance on 256.bzip2 and gzip to improve a lot.  On
      bzip2 for example, we go from this:
      
      10665 asm-printer           - Number of machine instrs printed
         40 ra-local              - Number of loads/stores folded into instructions
       1708 ra-local              - Number of loads added
       1532 ra-local              - Number of stores added
       1354 twoaddressinstruction - Number of instructions added
       1354 twoaddressinstruction - Number of two-address instructions
       2794 x86-peephole          - Number of peephole optimization performed
      
      to this:
      9873 asm-printer           - Number of machine instrs printed
        41 ra-local              - Number of loads/stores folded into instructions
      1710 ra-local              - Number of loads added
      1521 ra-local              - Number of stores added
       789 twoaddressinstruction - Number of instructions added
       789 twoaddressinstruction - Number of two-address instructions
      2142 x86-peephole          - Number of peephole optimization performed
      
      ... and these types of instructions are often in tight loops.
      
      Linear scan is also helped, but not as much.  It goes from:
      
      8787 asm-printer           - Number of machine instrs printed
      2389 liveintervals         - Number of identity moves eliminated after coalescing
      2288 liveintervals         - Number of interval joins performed
      3522 liveintervals         - Number of intervals after coalescing
      5810 liveintervals         - Number of original intervals
       700 spiller               - Number of loads added
       487 spiller               - Number of stores added
       303 spiller               - Number of register spills
      1354 twoaddressinstruction - Number of instructions added
      1354 twoaddressinstruction - Number of two-address instructions
       363 x86-peephole          - Number of peephole optimization performed
      
      to:
      
      7982 asm-printer           - Number of machine instrs printed
      1759 liveintervals         - Number of identity moves eliminated after coalescing
      1658 liveintervals         - Number of interval joins performed
      3282 liveintervals         - Number of intervals after coalescing
      4940 liveintervals         - Number of original intervals
       635 spiller               - Number of loads added
       452 spiller               - Number of stores added
       288 spiller               - Number of register spills
       789 twoaddressinstruction - Number of instructions added
       789 twoaddressinstruction - Number of two-address instructions
       258 x86-peephole          - Number of peephole optimization performed
      
      Though I'm not complaining about the drop in the number of intervals.  :)
      
      llvm-svn: 11820
      309327a4
    • Chris Lattner's avatar
      * Make the previous patch more efficient by not allocating a temporary MachineInstr · d1ee55d4
      Chris Lattner authored
        to do analysis.
      
      *** FOLD getelementptr instructions into loads and stores when possible,
          making use of some of the crazy X86 addressing modes.
      
      For example, the following C++ program fragment:
      
      struct complex {
          double re, im;
          complex(double r, double i) : re(r), im(i) {}
      };
      inline complex operator+(const complex& a, const complex& b) {
          return complex(a.re+b.re, a.im+b.im);
      }
      complex addone(const complex& arg) {
          return arg + complex(1,0);
      }
      
      Used to be compiled to:
      _Z6addoneRK7complex:
              mov %EAX, DWORD PTR [%ESP + 4]
              mov %ECX, DWORD PTR [%ESP + 8]
      ***     mov %EDX, %ECX
              fld QWORD PTR [%EDX]
              fld1
              faddp %ST(1)
      ***     add %ECX, 8
              fld QWORD PTR [%ECX]
              fldz
              faddp %ST(1)
      ***     mov %ECX, %EAX
              fxch %ST(1)
              fstp QWORD PTR [%ECX]
      ***     add %EAX, 8
              fstp QWORD PTR [%EAX]
              ret
      
      Now it is compiled to:
      _Z6addoneRK7complex:
              mov %EAX, DWORD PTR [%ESP + 4]
              mov %ECX, DWORD PTR [%ESP + 8]
              fld QWORD PTR [%ECX]
              fld1
              faddp %ST(1)
              fld QWORD PTR [%ECX + 8]
              fldz
              faddp %ST(1)
              fxch %ST(1)
              fstp QWORD PTR [%EAX]
              fstp QWORD PTR [%EAX + 8]
              ret
      
      Other programs should see similar improvements, across the board.  Note that
      in addition to reducing instruction count, this also reduces register pressure
      a lot, always a good thing on X86.  :)
      
      llvm-svn: 11819
      d1ee55d4
    • Chris Lattner's avatar
      Add a helper to create an addressing mode given all of the pieces. · 4b3514c1
      Chris Lattner authored
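Roughly speaking (the struct and helper names below are illustrative guesses,
not necessarily what the tree uses), an X86 address is described by four
pieces, base + scale*index + displacement, and the helper simply bundles them:

        // Illustrative only: one plausible shape for such a helper.
        struct X86AddressMode {
          unsigned BaseReg;    // register holding the base address
          unsigned Scale;      // 1, 2, 4, or 8
          unsigned IndexReg;   // register holding the index, or 0 if unused
          int      Disp;       // constant displacement
        };

        // Build the mode from all of the pieces at once, e.g. for
        // [%EAX + 8*%ECX + 4]: makeAM(EAX, 8, ECX, 4).
        inline X86AddressMode makeAM(unsigned Base, unsigned Scale,
                                     unsigned Index, int Disp) {
          X86AddressMode AM;
          AM.BaseReg = Base;
          AM.Scale = Scale;
          AM.IndexReg = Index;
          AM.Disp = Disp;
          return AM;
        }
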
      llvm-svn: 11818
      4b3514c1
    • Chris Lattner's avatar
      add an inefficient way of folding structure and constant array indexes together · d825d30f
      Chris Lattner authored
      into a single LEA instruction.  This should improve the code generated for
      things like X->A.B.C[12].D.
      
The bigger benefit is still to come, though.  Note that this uses an LEA instruction
instead of an add, giving the register allocator more freedom.  We should probably
never generate ADDri32s.
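For illustration (the struct layout below is invented, not taken from any
benchmark), the kind of access this helps with looks like:

        // Hypothetical nested layout; every index is a compile-time constant,
        // so the whole offset can collapse into one displacement in a single LEA.
        struct D { int D0; int D1; };
        struct C { D Elts[20]; };
        struct B { int Pad; C C0; };
        struct A { B B0; };

        int *field(A *X) {
          return &X->B0.C0.Elts[12].D1;   // think X->A.B.C[12].D
        }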
      
      llvm-svn: 11817
      d825d30f
    • Chris Lattner's avatar
      Implement special case for storing an immediate into memory so that we don't need · f85e33cd
      Chris Lattner authored
      an intermediate register.
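A tiny made-up example of the sort of store this applies to:

        // Before this change the constant might first be materialized in a
        // register (roughly: mov %EAX, 42 / mov DWORD PTR [%ECX], %EAX); with
        // the special case it can be stored directly (mov DWORD PTR [%ECX], 42).
        void setAnswer(int *P) {
          *P = 42;
        }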
      
      llvm-svn: 11816
      f85e33cd
    • Brian Gaeke's avatar
      Cygwin defines log2 as a macro. Undef it here IFF it has already been defined, · 04cff21c
      Brian Gaeke authored
      so that we always get the inline function instead. Remember, kids, like it says
      in the GCC manual, "An Inline Function is As Fast As a Macro."
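The guard amounts to something like the following sketch (not the exact lines
from the tree):

        #include <cmath>

        // Cygwin's math headers may define log2 as a preprocessor macro, which
        // would shadow the inline log2 function; drop the macro only if present.
        #ifdef log2
        #undef log2
        #endif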
      
      llvm-svn: 11815
      04cff21c
  2. Feb 24, 2004