  1. Nov 05, 2005
    • Turn sdiv into udiv if both operands have a clear sign bit. · dd0c1740
      Chris Lattner authored
      This occurs a few times in crafty:
      
      OLD:    %tmp.36 = div int %tmp.35, 8            ; <int> [#uses=1]
      NEW:    %tmp.36 = div uint %tmp.35, 8           ; <uint> [#uses=0]
      OLD:    %tmp.19 = div int %tmp.18, 8            ; <int> [#uses=1]
      NEW:    %tmp.19 = div uint %tmp.18, 8           ; <uint> [#uses=0]
      OLD:    %tmp.117 = div int %tmp.116, 8          ; <int> [#uses=1]
      NEW:    %tmp.117 = div uint %tmp.116, 8         ; <uint> [#uses=0]
      OLD:    %tmp.92 = div int %tmp.91, 8            ; <int> [#uses=1]
      NEW:    %tmp.92 = div uint %tmp.91, 8           ; <uint> [#uses=0]
      
      Which all turn into shrs.
      
      llvm-svn: 24190
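The reasoning behind this transform can be checked with a small model. The following is a minimal sketch, not the code from the commit: it models 32-bit values as bit patterns in Python, and the helper names (`to_signed32`, `sdiv32`, `udiv32`) are assumptions made for illustration. It shows that when both operands have a clear sign bit, signed and unsigned division agree, and that an unsigned divide by 8 matches a logical right shift by 3.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000


def to_signed32(bits):
    """Interpret a 32-bit pattern as a two's-complement signed value."""
    bits &= MASK32
    return bits - (1 << 32) if bits & SIGN32 else bits


def sdiv32(a_bits, b_bits):
    """Signed division, truncating toward zero, on 32-bit patterns."""
    a, b = to_signed32(a_bits), to_signed32(b_bits)
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return q & MASK32


def udiv32(a_bits, b_bits):
    """Unsigned division on 32-bit patterns."""
    return (a_bits & MASK32) // (b_bits & MASK32)


def sign_bit_clear(bits):
    """True when the top bit of the 32-bit pattern is zero."""
    return (bits & SIGN32) == 0


if __name__ == "__main__":
    for a in (0, 7, 8, 1000, 0x7FFFFFFF):
        assert sign_bit_clear(a) and sign_bit_clear(8)
        assert sdiv32(a, 8) == udiv32(a, 8) == a >> 3
    # With the sign bit set, the two divisions disagree, so the
    # transform would not be valid for this operand.
    neg9 = -9 & MASK32
    print(hex(sdiv32(neg9, 8)), hex(udiv32(neg9, 8)))
```

The final lines show why the sign-bit condition matters: for the bit pattern of -9, the signed and unsigned quotients differ.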
    • Turn srem -> urem when neither input has its sign bit set. · e9ff0eaf
      Chris Lattner authored
      This triggers 8 times in vortex, allowing the srems to be turned into shrs:
      
      OLD:    %tmp.104 = rem int %tmp.5.i37, 16               ; <int> [#uses=1]
      NEW:    %tmp.104 = rem uint %tmp.5.i37, 16              ; <uint> [#uses=0]
      OLD:    %tmp.98 = rem int %tmp.5.i24, 16                ; <int> [#uses=1]
      NEW:    %tmp.98 = rem uint %tmp.5.i24, 16               ; <uint> [#uses=0]
      OLD:    %tmp.91 = rem int %tmp.5.i19, 8         ; <int> [#uses=1]
      NEW:    %tmp.91 = rem uint %tmp.5.i19, 8                ; <uint> [#uses=0]
      OLD:    %tmp.88 = rem int %tmp.5.i14, 8         ; <int> [#uses=1]
      NEW:    %tmp.88 = rem uint %tmp.5.i14, 8                ; <uint> [#uses=0]
      OLD:    %tmp.85 = rem int %tmp.5.i9, 1024               ; <int> [#uses=2]
      NEW:    %tmp.85 = rem uint %tmp.5.i9, 1024              ; <uint> [#uses=0]
      OLD:    %tmp.82 = rem int %tmp.5.i, 512         ; <int> [#uses=2]
      NEW:    %tmp.82 = rem uint %tmp.5.i1, 512               ; <uint> [#uses=0]
      OLD:    %tmp.48.i = rem int %tmp.5.i.i161, 4            ; <int> [#uses=1]
      NEW:    %tmp.48.i = rem uint %tmp.5.i.i161, 4           ; <uint> [#uses=0]
      OLD:    %tmp.20.i2 = rem int %tmp.5.i.i, 4              ; <int> [#uses=1]
      NEW:    %tmp.20.i2 = rem uint %tmp.5.i.i, 4             ; <uint> [#uses=0]
      
      It also occurs 9 times in gcc, but with odd constant divisors (1009 and 61),
      so the payoff isn't as great.
      
      llvm-svn: 24189
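The same argument applies to remainders, and a small model makes it concrete. This is a minimal sketch, not the commit's code: values are 32-bit patterns in Python, and the helper names (`srem32`, `urem32`) are assumptions for illustration. For a power-of-two divisor the unsigned remainder reduces to a single cheap bit operation; the sketch uses a mask with `divisor - 1`. For divisors such as 1009 or 61 no such shortcut exists, which is consistent with the smaller payoff noted above.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000


def to_signed32(bits):
    """Interpret a 32-bit pattern as a two's-complement signed value."""
    bits &= MASK32
    return bits - (1 << 32) if bits & SIGN32 else bits


def srem32(a_bits, b_bits):
    """Signed remainder whose sign follows the dividend."""
    a, b = to_signed32(a_bits), to_signed32(b_bits)
    r = abs(a) % abs(b)
    if a < 0:
        r = -r
    return r & MASK32


def urem32(a_bits, b_bits):
    """Unsigned remainder on 32-bit patterns."""
    return (a_bits & MASK32) % (b_bits & MASK32)


if __name__ == "__main__":
    for a in (0, 5, 17, 1023, 0x7FFFFFFF):
        for d in (4, 8, 16, 512, 1024):
            # Sign bits clear: signed and unsigned agree, and the
            # power-of-two case is just a mask.
            assert srem32(a, d) == urem32(a, d) == a & (d - 1)
        # Other divisors still agree, but have no mask form.
        assert srem32(a, 1009) == urem32(a, 1009)
    neg7 = -7 & MASK32
    print(hex(srem32(neg7, 4)), hex(urem32(neg7, 4)))
```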
    • Fix logic bug in finding retry slot in tally. · 904dbb4a
      Jim Laskey authored
      llvm-svn: 24188
  2. Nov 04, 2005
  3. Nov 03, 2005
  4. Nov 02, 2005
  5. Nov 01, 2005
  6. Oct 31, 2005
  7. Oct 30, 2005
    • Significantly simplify this code and make it more aggressive. · 6871b23d
      Chris Lattner authored
      Instead of having a special case hack for X86, make the hack more general:
      if an incoming argument register is not used in any block other than the
      entry block, don't copy it to a vreg.  This helps us compile code like this:
      
      %struct.foo = type { int, int, [0 x ubyte] }
      int %test(%struct.foo* %X) {
              %tmp1 = getelementptr %struct.foo* %X, int 0, uint 2, int 100
              %tmp = load ubyte* %tmp1                ; <ubyte> [#uses=1]
              %tmp2 = cast ubyte %tmp to int          ; <int> [#uses=1]
              ret int %tmp2
      }
      
      to:
      
      _test:
              lbz r3, 108(r3)
              blr
      
      instead of:
      
      _test:
              lbz r2, 108(r3)
              or r3, r2, r2
              blr
      
      The (dead) copy emitted to copy r3 into a vreg for extra-block uses was
      increasing the live range of r3 past the load, preventing the coalescing.
      
      This implements CodeGen/PowerPC/reg-coallesce-simple.ll
      
      llvm-svn: 24115
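The rule described in this commit message can be sketched as a simple filter. This is an illustrative toy, not the code generator's implementation: the function name `args_to_copy`, the block names, and the dictionary representation of per-block uses are all assumptions made for the example.

```python
def args_to_copy(incoming_args, entry_block, uses_by_block):
    """Return the incoming argument registers that still need a vreg copy.

    An argument needs a copy only if some block other than the entry
    block uses it; arguments used solely in the entry block are skipped.
    """
    used_elsewhere = set()
    for block, used in uses_by_block.items():
        if block != entry_block:
            used_elsewhere.update(used)
    return [arg for arg in incoming_args if arg in used_elsewhere]


if __name__ == "__main__":
    # Single-block function, like the %test example: no copy is needed.
    print(args_to_copy(["r3"], "entry", {"entry": ["r3"]}))
    # r3 is also used in a later block, so it still gets a copy.
    print(args_to_copy(["r3", "r4"], "entry",
                       {"entry": ["r3", "r4"], "loop": ["r3"]}))
```

In the single-block case nothing is copied, so the live range of the argument register ends at its last use in the entry block.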
    • Reduce the number of copies emitted as machine instructions by generating results in vregs that will need them. · dd5663df
      Chris Lattner authored
      In the case of something like this:  CopyToReg((add X, Y), reg1024), we no
      longer emit code like this:
      
         reg1025 = add X, Y
         reg1024 = reg1025
      
      Instead, we emit:
      
         reg1024 = add X, Y
      
      Whoa! :)
      
      llvm-svn: 24111
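The difference between the two emission strategies can be shown with a toy emitter. This is only a sketch under stated assumptions: `select_copy_to_reg`, its tuple representation of an expression, and the `direct` flag are hypothetical names for illustration and do not correspond to the instruction selector's real interface.

```python
def select_copy_to_reg(expr, dest_reg, direct=True, temp_reg="reg1025"):
    """Emit toy instructions for CopyToReg(expr, dest_reg).

    expr is a tuple (op, lhs, rhs). With direct=True the result is
    produced straight into dest_reg; otherwise it goes through a
    temporary vreg (temp_reg, a hypothetical fresh register) and is
    then copied, as in the older behaviour described above.
    """
    op, lhs, rhs = expr
    if direct:
        return [f"{dest_reg} = {op} {lhs}, {rhs}"]
    return [f"{temp_reg} = {op} {lhs}, {rhs}", f"{dest_reg} = {temp_reg}"]


if __name__ == "__main__":
    expr = ("add", "X", "Y")
    print(select_copy_to_reg(expr, "reg1024", direct=False))
    print(select_copy_to_reg(expr, "reg1024"))
```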
    • Fix some broken comparisons; this affected the Pattern isel too. · 57b7ee9d
      Duraid Madina authored
      llvm-svn: 24109
    • This is implemented · e507a151
      Chris Lattner authored
      llvm-svn: 24107
    • Codegen mul by negative power of two with a shift and negate. · a70878d4
      Chris Lattner authored
      This implements test/Regression/CodeGen/PowerPC/mul-neg-power-2.ll,
      producing:
      
      _foo:
              slwi r2, r3, 1
              subfic r3, r2, 63
              blr
      
      instead of:
      
      _foo:
              mulli r2, r3, -2
              addi r3, r2, 63
              blr
      
      llvm-svn: 24106
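The identity behind this lowering is that x * -(2^k) equals -(x << k) in wrapping 32-bit arithmetic. The following is a minimal sketch that checks this in Python, not the backend code itself: the function names are assumptions for illustration, and the two `foo_*` functions model the instruction sequences shown above on 32-bit patterns.

```python
MASK32 = 0xFFFFFFFF


def is_neg_power_of_two(c):
    """True when c == -(2**k) for some k >= 0."""
    return c < 0 and ((-c) & (-c - 1)) == 0


def mul_by_neg_pow2(x_bits, c):
    """Compute x * c for c == -(2**k) using a shift and a negate."""
    assert is_neg_power_of_two(c)
    shift = (-c).bit_length() - 1
    shifted = (x_bits << shift) & MASK32
    return (-shifted) & MASK32


def foo_mul(x_bits):
    """Models 'mulli r2, r3, -2; addi r3, r2, 63'."""
    return (x_bits * -2 + 63) & MASK32


def foo_shift(x_bits):
    """Models 'slwi r2, r3, 1; subfic r3, r2, 63', i.e. 63 - (x << 1)."""
    return (63 - ((x_bits << 1) & MASK32)) & MASK32


if __name__ == "__main__":
    for x in (0, 1, 5, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF):
        assert foo_mul(x) == foo_shift(x)
        assert mul_by_neg_pow2(x, -8) == (x * -8) & MASK32
    print(hex(foo_shift(5)))
```

Folding the negate into the following add, as `subfic` does here, is what keeps the sequence at two instructions.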
    • Fix a problem that Nate noticed with LSR: · f0b77f9a
      Chris Lattner authored
      When inserting code for an addrec expression with a non-unit stride, be
      more careful where we insert the multiply.  In particular, insert the multiply
      in the outermost loop we can, instead of the requested insertion point.
      
      This allows LSR to notice the mul in the right loop and reduce it when it
      gets to that loop, where before it missed the multiply.
      
      This happens quite a bit in the test suite, for example, eliminating 2
      multiplies in art, 3 in ammp, 4 in apsi, reducing from 1050 multiplies to
      910 muls in galgel (!), from 877 to 859 in applu, and 36 to 30 in bzip2.
      
      This speeds up galgel from 16.45s to 16.01s, applu from 14.21s to 13.94s and
      fourinarow from 66.67s to 63.48s.
      
      This implements Transforms/LoopStrengthReduce/nested-reduce.ll
      
      llvm-svn: 24102
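The general idea of strength-reducing a strided multiply in a loop nest can be illustrated outside the compiler. This is a sketch of the concept only, not the pass's algorithm: the function names and the `i * stride + j` index pattern are assumptions chosen for the example. The first version multiplies inside the inner loop; the second keeps a running value for `i * stride` in the outer loop and updates it by addition.

```python
def indices_with_multiply(n_outer, n_inner, stride):
    """Compute i * stride + j with the multiply inside the inner loop."""
    out = []
    for i in range(n_outer):
        for j in range(n_inner):
            out.append(i * stride + j)
    return out


def indices_strength_reduced(n_outer, n_inner, stride):
    """Same sequence, with the multiply replaced by additions.

    'base' tracks i * stride and lives in the outer loop, so the inner
    loop only increments a running index.
    """
    out = []
    base = 0
    for _ in range(n_outer):
        index = base
        for _ in range(n_inner):
            out.append(index)
            index += 1
        base += stride
    return out


if __name__ == "__main__":
    a = indices_with_multiply(3, 4, 10)
    b = indices_strength_reduced(3, 4, 10)
    assert a == b
    print(b)
```

The reduced version only works if the multiply is associated with the outer loop, which is why the placement of the inserted multiply matters.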
  8. Oct 29, 2005