Skip to content
  1. Aug 03, 2006
    • Chris Lattner's avatar
      · 3ff62017
      Chris Lattner authored
      Changes:
        1. Update an obsolete comment.
        2. Make the sorting by base an explicit (though still N^2) step, so
           that the code is more clear on what it is doing.
        3. Partition uses so that uses inside the loop are handled before uses
           outside the loop.
      
      Note that none of these changes currently changes the code inserted by LSR,
      but they are a stepping stone to getting there.
      
      This code is the result of some crazy pair programming with Nate. :)
      
      llvm-svn: 29493
      3ff62017
  2. Jul 18, 2006
  3. Jun 29, 2006
  4. Jun 09, 2006
  5. Apr 12, 2006
  6. Mar 24, 2006
  7. Mar 22, 2006
  8. Mar 18, 2006
  9. Mar 17, 2006
  10. Mar 16, 2006
    • Evan Cheng's avatar
      For each loop, keep track of all the IV expressions inserted indexed by · 3df447d3
      Evan Cheng authored
      stride. For a set of uses of the IV of a stride which is a multiple
      of another stride, do not insert a new IV expression. Rather, reuse the
      previous IV and rewrite the uses as uses of IV expression multiplied by
      the factor.
      
      e.g.
      x = 0 ...; x ++
      y = 0 ...; y += 4
      then use of y can be rewritten as use of 4*x for x86.
      
      llvm-svn: 26803
      3df447d3
  11. Mar 14, 2006
  12. Feb 04, 2006
  13. Jan 23, 2006
  14. Jan 11, 2006
  15. Dec 05, 2005
  16. Oct 21, 2005
  17. Oct 20, 2005
    • Chris Lattner's avatar
      Do NOT touch FP ops with LSR. This fixes a testcase Nate sent me from an · 0c0b38bb
      Chris Lattner authored
      inner loop like this:
      
      LBB_RateConvertMono8AltiVec_2:  ; no_exit
              lis r2, ha16(.CPI_RateConvertMono8AltiVec_0)
              lfs f3, lo16(.CPI_RateConvertMono8AltiVec_0)(r2)
              fmr f3, f3
              fadd f0, f2, f0
              fadd f3, f0, f3
              fcmpu cr0, f3, f1
              bge cr0, LBB_RateConvertMono8AltiVec_2  ; no_exit
      
      to an inner loop like this:
      
      LBB_RateConvertMono8AltiVec_1:  ; no_exit
              fsub f2, f2, f1
              fcmpu cr0, f2, f1
              fmr f0, f2
              bge cr0, LBB_RateConvertMono8AltiVec_1  ; no_exit
      
      Doh! good catch!
      
      llvm-svn: 23838
      0c0b38bb
  18. Oct 11, 2005
  19. Oct 09, 2005
  20. Oct 03, 2005
    • Chris Lattner's avatar
      Make IVUseShouldUsePostIncValue more aggressive when the use is a PHI. In · f07a587c
      Chris Lattner authored
      particular, it should realize that phi's use their values in the pred block
      not the phi block itself.  This change turns our em3d loop from this:
      
      _test:
              cmpwi cr0, r4, 0
              bgt cr0, LBB_test_2     ; entry.no_exit_crit_edge
      LBB_test_1:     ; entry.loopexit_crit_edge
              li r2, 0
              b LBB_test_6    ; loopexit
      LBB_test_2:     ; entry.no_exit_crit_edge
              li r6, 0
      LBB_test_3:     ; no_exit
              or r2, r6, r6
              lwz r6, 0(r3)
              cmpw cr0, r6, r5
              beq cr0, LBB_test_6     ; loopexit
      LBB_test_4:     ; endif
              addi r3, r3, 4
              addi r6, r2, 1
              cmpw cr0, r6, r4
              blt cr0, LBB_test_3     ; no_exit
      LBB_test_5:     ; endif.loopexit.loopexit_crit_edge
              addi r3, r2, 1
              blr
      LBB_test_6:     ; loopexit
              or r3, r2, r2
              blr
      
      into:
      
      _test:
              cmpwi cr0, r4, 0
              bgt cr0, LBB_test_2     ; entry.no_exit_crit_edge
      LBB_test_1:     ; entry.loopexit_crit_edge
              li r2, 0
              b LBB_test_5    ; loopexit
      LBB_test_2:     ; entry.no_exit_crit_edge
              li r6, 0
      LBB_test_3:     ; no_exit
              lwz r2, 0(r3)
              cmpw cr0, r2, r5
              or r2, r6, r6
              beq cr0, LBB_test_5     ; loopexit
      LBB_test_4:     ; endif
              addi r3, r3, 4
              addi r6, r6, 1
              cmpw cr0, r6, r4
              or r2, r6, r6
              blt cr0, LBB_test_3     ; no_exit
      LBB_test_5:     ; loopexit
              or r3, r2, r2
              blr
      
      
      Unfortunately, this is actually worse code, because the register coallescer
      is getting confused somehow.  If it were doing its job right, it could turn the
      code into this:
      
      _test:
              cmpwi cr0, r4, 0
              bgt cr0, LBB_test_2     ; entry.no_exit_crit_edge
      LBB_test_1:     ; entry.loopexit_crit_edge
              li r6, 0
              b LBB_test_5    ; loopexit
      LBB_test_2:     ; entry.no_exit_crit_edge
              li r6, 0
      LBB_test_3:     ; no_exit
              lwz r2, 0(r3)
              cmpw cr0, r2, r5
              beq cr0, LBB_test_5     ; loopexit
      LBB_test_4:     ; endif
              addi r3, r3, 4
              addi r6, r6, 1
              cmpw cr0, r6, r4
              blt cr0, LBB_test_3     ; no_exit
      LBB_test_5:     ; loopexit
              or r3, r6, r6
              blr
      
      ... which I'll work on next. :)
      
      llvm-svn: 23604
      f07a587c
    • Chris Lattner's avatar
      Refactor some code into a function · e4ed42a4
      Chris Lattner authored
      llvm-svn: 23603
      e4ed42a4
    • Chris Lattner's avatar
      This break is bogus and I have no idea why it was there. Basically it prevents · 360928db
      Chris Lattner authored
      memoizing code when IV's are used by phinodes outside of loops.  In a simple
      example, we were getting this code before (note that r6 and r7 are isomorphic
      IV's):
      
              li r6, 0
              or r7, r6, r6
      LBB_test_3:     ; no_exit
              lwz r2, 0(r3)
              cmpw cr0, r2, r5
              or r2, r7, r7
              beq cr0, LBB_test_5     ; loopexit
      LBB_test_4:     ; endif
              addi r2, r7, 1
              addi r7, r7, 1
              addi r3, r3, 4
              addi r6, r6, 1
              cmpw cr0, r6, r4
              blt cr0, LBB_test_3     ; no_exit
      
      Now we get:
      
              li r6, 0
      LBB_test_3:     ; no_exit
              or r2, r6, r6
              lwz r6, 0(r3)
              cmpw cr0, r6, r5
              beq cr0, LBB_test_6     ; loopexit
      LBB_test_4:     ; endif
              addi r3, r3, 4
              addi r6, r2, 1
              cmpw cr0, r6, r4
              blt cr0, LBB_test_3     ; no_exit
      
      this was noticed in em3d.
      
      llvm-svn: 23602
      360928db
    • Chris Lattner's avatar
      when checking if we should move a split edge block outside of a loop, · 8fcce170
      Chris Lattner authored
      check the presplit pred, not the post-split pred.  This was causing us
      to make the wrong decision in some cases, leaving the critical edge block
      in the loop.
      
      llvm-svn: 23601
      8fcce170
  21. Sep 27, 2005
  22. Sep 13, 2005
  23. Sep 12, 2005
    • Chris Lattner's avatar
      Fix a regression from last night, which caused this pass to create invalid · 8048b85e
      Chris Lattner authored
      code for IV uses outside of loops that are not dominated by the latch block.
      We should only convert these uses to use the post-inc value if they ARE
      dominated by the latch block.
      
      Also use a new LoopInfo method to simplify some code.
      
      This fixes Transforms/LoopStrengthReduce/2005-09-12-UsesOutOutsideOfLoop.ll
      
      llvm-svn: 23318
      8048b85e
    • Chris Lattner's avatar
      _test: · a6764839
      Chris Lattner authored
              li r2, 0
      LBB_test_1:     ; no_exit.2
              li r5, 0
              stw r5, 0(r3)
              addi r2, r2, 1
              addi r3, r3, 4
              cmpwi cr0, r2, 701
              blt cr0, LBB_test_1     ; no_exit.2
      LBB_test_2:     ; loopexit.2.loopexit
              addi r2, r2, 1
              stw r2, 0(r4)
              blr
      [zion ~/llvm]$ cat > ~/xx
      Uses of IV's outside of the loop should use hte post-incremented version
      of the IV, not the preincremented version.  This helps many loops (e.g. in sixtrack)
      which used to generate code like this (this is the code from the
      dont-hoist-simple-loop-constants.ll testcase):
      
      _test:
              li r2, 0                 **** IV starts at 0
      LBB_test_1:     ; no_exit.2
              or r5, r2, r2            **** Copy for loop exit
              li r2, 0
              stw r2, 0(r3)
              addi r3, r3, 4
              addi r2, r5, 1
              addi r6, r5, 2           **** IV+2
              cmpwi cr0, r6, 701
              blt cr0, LBB_test_1     ; no_exit.2
      LBB_test_2:     ; loopexit.2.loopexit
              addi r2, r5, 2       ****  IV+2
              stw r2, 0(r4)
              blr
      
      And now generated code like this:
      
      _test:
              li r2, 1               *** IV starts at 1
      LBB_test_1:     ; no_exit.2
              li r5, 0
              stw r5, 0(r3)
              addi r2, r2, 1
              addi r3, r3, 4
              cmpwi cr0, r2, 701     *** IV.postinc + 0
              blt cr0, LBB_test_1
      LBB_test_2:     ; loopexit.2.loopexit
              stw r2, 0(r4)          *** IV.postinc + 0
              blr
      
      llvm-svn: 23313
      a6764839
  24. Sep 10, 2005
    • Chris Lattner's avatar
      implement Transforms/LoopStrengthReduce/dont-hoist-simple-loop-constants.ll. · 530fe6ab
      Chris Lattner authored
      We used to emit this code for it:
      
      _test:
              li r2, 1     ;; Value tying up a register for the whole loop
              li r5, 0
      LBB_test_1:     ; no_exit.2
              or r6, r5, r5
              li r5, 0
              stw r5, 0(r3)
              addi r5, r6, 1
              addi r3, r3, 4
              add r7, r2, r5  ;; should be addi r7, r5, 1
              cmpwi cr0, r7, 701
              blt cr0, LBB_test_1     ; no_exit.2
      LBB_test_2:     ; loopexit.2.loopexit
              addi r2, r6, 2
              stw r2, 0(r4)
              blr
      
      now we emit this:
      
      _test:
              li r2, 0
      LBB_test_1:     ; no_exit.2
              or r5, r2, r2
              li r2, 0
              stw r2, 0(r3)
              addi r3, r3, 4
              addi r2, r5, 1
              addi r6, r5, 2   ;; whoa, fold those adds!
              cmpwi cr0, r6, 701
              blt cr0, LBB_test_1     ; no_exit.2
      LBB_test_2:     ; loopexit.2.loopexit
              addi r2, r5, 2
              stw r2, 0(r4)
              blr
      
      more improvement coming.
      
      llvm-svn: 23306
      530fe6ab
  25. Aug 17, 2005
  26. Aug 16, 2005
  27. Aug 13, 2005
    • Chris Lattner's avatar
      Ooops, don't forget to clear this. The real inner loop is now: · 47d3ec35
      Chris Lattner authored
      .LBB_foo_3:     ; no_exit.1
              lfd f2, 0(r9)
              lfd f3, 8(r9)
              fmul f4, f1, f2
              fmadd f4, f0, f3, f4
              stfd f4, 8(r9)
              fmul f3, f1, f3
              fmsub f2, f0, f2, f3
              stfd f2, 0(r9)
              addi r9, r9, 16
              addi r8, r8, 1
              cmpw cr0, r8, r4
              ble .LBB_foo_3  ; no_exit.1
      
      llvm-svn: 22782
      47d3ec35
    • Chris Lattner's avatar
      Recursively scan scev expressions for common subexpressions. This allows us · 5949d490
      Chris Lattner authored
      to handle nested loops much better, for example, by being able to tell that
      these two expressions:
      
      {( 8 + ( 16 * ( 1 +  %Tmp11 +  %Tmp12)) +  %c_),+,( 16 *  %Tmp 12)}<loopentry.1>
      
      {(( 16 * ( 1 +  %Tmp11 +  %Tmp12)) +  %c_),+,( 16 *  %Tmp12)}<loopentry.1>
      
      Have the following common part that can be shared:
      {(( 16 * ( 1 +  %Tmp11 +  %Tmp12)) +  %c_),+,( 16 *  %Tmp12)}<loopentry.1>
      
      This allows us to codegen an important inner loop in 168.wupwise as:
      
      .LBB_foo_4:     ; no_exit.1
              lfd f2, 16(r9)
              fmul f3, f0, f2
              fmul f2, f1, f2
              fadd f4, f3, f2
              stfd f4, 8(r9)
              fsub f2, f3, f2
              stfd f2, 16(r9)
              addi r8, r8, 1
              addi r9, r9, 16
              cmpw cr0, r8, r4
              ble .LBB_foo_4  ; no_exit.1
      
      instead of:
      
      .LBB_foo_3:     ; no_exit.1
              lfdx f2, r6, r9
              add r10, r6, r9
              lfd f3, 8(r10)
              fmul f4, f1, f2
              fmadd f4, f0, f3, f4
              stfd f4, 8(r10)
              fmul f3, f1, f3
              fmsub f2, f0, f2, f3
              stfdx f2, r6, r9
              addi r9, r9, 16
              addi r8, r8, 1
              cmpw cr0, r8, r4
              ble .LBB_foo_3  ; no_exit.1
      
      llvm-svn: 22781
      5949d490
    • Chris Lattner's avatar
      When splitting critical edges, make sure not to leave the new block in the · 8447b495
      Chris Lattner authored
      middle of the loop.  This turns a critical loop in gzip into this:
      
      .LBB_test_1:    ; loopentry
              or r27, r28, r28
              add r28, r3, r27
              lhz r28, 3(r28)
              add r26, r4, r27
              lhz r26, 3(r26)
              cmpw cr0, r28, r26
              bne .LBB_test_8 ; loopentry.loopexit_crit_edge
      .LBB_test_2:    ; shortcirc_next.0
              add r28, r3, r27
              lhz r28, 5(r28)
              add r26, r4, r27
              lhz r26, 5(r26)
              cmpw cr0, r28, r26
              bne .LBB_test_7 ; shortcirc_next.0.loopexit_crit_edge
      .LBB_test_3:    ; shortcirc_next.1
              add r28, r3, r27
              lhz r28, 7(r28)
              add r26, r4, r27
              lhz r26, 7(r26)
              cmpw cr0, r28, r26
              bne .LBB_test_6 ; shortcirc_next.1.loopexit_crit_edge
      .LBB_test_4:    ; shortcirc_next.2
              add r28, r3, r27
              lhz r26, 9(r28)
              add r28, r4, r27
              lhz r25, 9(r28)
              addi r28, r27, 8
              cmpw cr7, r26, r25
              mfcr r26, 1
              rlwinm r26, r26, 31, 31, 31
              add r25, r8, r27
              cmpw cr7, r25, r7
              mfcr r25, 1
              rlwinm r25, r25, 29, 31, 31
              and. r26, r26, r25
              bne .LBB_test_1 ; loopentry
      
      instead of this:
      
      .LBB_test_1:    ; loopentry
              or r27, r28, r28
              add r28, r3, r27
              lhz r28, 3(r28)
              add r26, r4, r27
              lhz r26, 3(r26)
              cmpw cr0, r28, r26
              beq .LBB_test_3 ; shortcirc_next.0
      .LBB_test_2:    ; loopentry.loopexit_crit_edge
              add r2, r30, r27
              add r8, r29, r27
              b .LBB_test_9   ; loopexit
      .LBB_test_3:    ; shortcirc_next.0
              add r28, r3, r27
              lhz r28, 5(r28)
              add r26, r4, r27
              lhz r26, 5(r26)
              cmpw cr0, r28, r26
              beq .LBB_test_5 ; shortcirc_next.1
      .LBB_test_4:    ; shortcirc_next.0.loopexit_crit_edge
              add r2, r11, r27
              add r8, r12, r27
              b .LBB_test_9   ; loopexit
      .LBB_test_5:    ; shortcirc_next.1
              add r28, r3, r27
              lhz r28, 7(r28)
              add r26, r4, r27
              lhz r26, 7(r26)
              cmpw cr0, r28, r26
              beq .LBB_test_7 ; shortcirc_next.2
      .LBB_test_6:    ; shortcirc_next.1.loopexit_crit_edge
              add r2, r9, r27
              add r8, r10, r27
              b .LBB_test_9   ; loopexit
      .LBB_test_7:    ; shortcirc_next.2
              add r28, r3, r27
              lhz r26, 9(r28)
              add r28, r4, r27
              lhz r25, 9(r28)
              addi r28, r27, 8
              cmpw cr7, r26, r25
              mfcr r26, 1
              rlwinm r26, r26, 31, 31, 31
              add r25, r8, r27
              cmpw cr7, r25, r7
              mfcr r25, 1
              rlwinm r25, r25, 29, 31, 31
              and. r26, r26, r25
              bne .LBB_test_1 ; loopentry
      
      Next up, improve the code for the loop.
      
      llvm-svn: 22769
      8447b495
    • Chris Lattner's avatar
      Fix a FIXME: if we are inserting code for a PHI argument, split the critical · 4fec86d3
      Chris Lattner authored
      edge so that the code is not always executed for both operands.  This
      prevents LSR from inserting code into loops whose exit blocks contain
      PHI uses of IV expressions (which are outside of loops).  On gzip, for
      example, we turn this ugly code:
      
      .LBB_test_1:    ; loopentry
              add r27, r3, r28
              lhz r27, 3(r27)
              add r26, r4, r28
              lhz r26, 3(r26)
              add r25, r30, r28    ;; Only live if exiting the loop
              add r24, r29, r28    ;; Only live if exiting the loop
              cmpw cr0, r27, r26
              bne .LBB_test_5 ; loopexit
      
      into this:
      
      .LBB_test_1:    ; loopentry
              or r27, r28, r28
              add r28, r3, r27
              lhz r28, 3(r28)
              add r26, r4, r27
              lhz r26, 3(r26)
              cmpw cr0, r28, r26
              beq .LBB_test_3 ; shortcirc_next.0
      .LBB_test_2:    ; loopentry.loopexit_crit_edge
              add r2, r30, r27
              add r8, r29, r27
              b .LBB_test_9   ; loopexit
      .LBB_test_2:    ; shortcirc_next.0
              ...
              blt .LBB_test_1
      
      
      into this:
      
      .LBB_test_1:    ; loopentry
              or r27, r28, r28
              add r28, r3, r27
              lhz r28, 3(r28)
              add r26, r4, r27
              lhz r26, 3(r26)
              cmpw cr0, r28, r26
              beq .LBB_test_3 ; shortcirc_next.0
      .LBB_test_2:    ; loopentry.loopexit_crit_edge
              add r2, r30, r27
              add r8, r29, r27
              b .LBB_t_3:    ; shortcirc_next.0
      .LBB_test_3:    ; shortcirc_next.0
              ...
              blt .LBB_test_1
      
      
      Next step: get the block out of the loop so that the loop is all
      fall-throughs again.
      
      llvm-svn: 22766
      4fec86d3
Loading