  1. Feb 04, 2006
  2. Jan 23, 2006
  3. Jan 11, 2006
  4. Dec 05, 2005
  5. Oct 21, 2005
  6. Oct 20, 2005
    • Do NOT touch FP ops with LSR. This fixes a testcase Nate sent me from an · 0c0b38bb
      Chris Lattner authored
      inner loop like this:
      
      LBB_RateConvertMono8AltiVec_2:  ; no_exit
              lis r2, ha16(.CPI_RateConvertMono8AltiVec_0)
              lfs f3, lo16(.CPI_RateConvertMono8AltiVec_0)(r2)
              fmr f3, f3
              fadd f0, f2, f0
              fadd f3, f0, f3
              fcmpu cr0, f3, f1
              bge cr0, LBB_RateConvertMono8AltiVec_2  ; no_exit
      
      to an inner loop like this:
      
      LBB_RateConvertMono8AltiVec_1:  ; no_exit
              fsub f2, f2, f1
              fcmpu cr0, f2, f1
              fmr f0, f2
              bge cr0, LBB_RateConvertMono8AltiVec_1  ; no_exit
      
      Doh! good catch!
      
      llvm-svn: 23838
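      A minimal C sketch of the kind of loop this protects (hypothetical names, reconstructed
      loosely from the assembly above): the integer induction variable is fair game for LSR,
      but the floating-point recurrence must not be strength reduced:
      
      /* Minimal hypothetical sketch: LSR may strength reduce the address
         arithmetic driven by the integer IV 'i', but must not touch the
         floating-point recurrence 'phase'. */
      void mix(float *out, int n, float step) {
          float phase = 0.0f;
          for (int i = 0; i < n; ++i) {
              out[i] = phase;   /* integer IV: address arithmetic, LSR-friendly */
              phase += step;    /* FP induction variable: leave it alone */
          }
      }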
  7. Oct 11, 2005
  8. Oct 09, 2005
  9. Oct 03, 2005
    • Make IVUseShouldUsePostIncValue more aggressive when the use is a PHI. In · f07a587c
      Chris Lattner authored
      particular, it should realize that PHIs use their values in the pred block,
      not in the phi block itself.  This change turns our em3d loop from this:
      
      _test:
              cmpwi cr0, r4, 0
              bgt cr0, LBB_test_2     ; entry.no_exit_crit_edge
      LBB_test_1:     ; entry.loopexit_crit_edge
              li r2, 0
              b LBB_test_6    ; loopexit
      LBB_test_2:     ; entry.no_exit_crit_edge
              li r6, 0
      LBB_test_3:     ; no_exit
              or r2, r6, r6
              lwz r6, 0(r3)
              cmpw cr0, r6, r5
              beq cr0, LBB_test_6     ; loopexit
      LBB_test_4:     ; endif
              addi r3, r3, 4
              addi r6, r2, 1
              cmpw cr0, r6, r4
              blt cr0, LBB_test_3     ; no_exit
      LBB_test_5:     ; endif.loopexit.loopexit_crit_edge
              addi r3, r2, 1
              blr
      LBB_test_6:     ; loopexit
              or r3, r2, r2
              blr
      
      into:
      
      _test:
              cmpwi cr0, r4, 0
              bgt cr0, LBB_test_2     ; entry.no_exit_crit_edge
      LBB_test_1:     ; entry.loopexit_crit_edge
              li r2, 0
              b LBB_test_5    ; loopexit
      LBB_test_2:     ; entry.no_exit_crit_edge
              li r6, 0
      LBB_test_3:     ; no_exit
              lwz r2, 0(r3)
              cmpw cr0, r2, r5
              or r2, r6, r6
              beq cr0, LBB_test_5     ; loopexit
      LBB_test_4:     ; endif
              addi r3, r3, 4
              addi r6, r6, 1
              cmpw cr0, r6, r4
              or r2, r6, r6
              blt cr0, LBB_test_3     ; no_exit
      LBB_test_5:     ; loopexit
              or r3, r2, r2
              blr
      
      
      Unfortunately, this is actually worse code, because the register coalescer
      is getting confused somehow.  If it were doing its job right, it could turn the
      code into this:
      
      _test:
              cmpwi cr0, r4, 0
              bgt cr0, LBB_test_2     ; entry.no_exit_crit_edge
      LBB_test_1:     ; entry.loopexit_crit_edge
              li r6, 0
              b LBB_test_5    ; loopexit
      LBB_test_2:     ; entry.no_exit_crit_edge
              li r6, 0
      LBB_test_3:     ; no_exit
              lwz r2, 0(r3)
              cmpw cr0, r2, r5
              beq cr0, LBB_test_5     ; loopexit
      LBB_test_4:     ; endif
              addi r3, r3, 4
              addi r6, r6, 1
              cmpw cr0, r6, r4
              blt cr0, LBB_test_3     ; no_exit
      LBB_test_5:     ; loopexit
              or r3, r6, r6
              blr
      
      ... which I'll work on next. :)
      
      llvm-svn: 23604
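      A minimal hypothetical C sketch of the em3d-style loop shape involved (reconstructed
      from the assembly above, not the actual em3d source): the value of i that reaches the
      return flows through a PHI in the exit block, and that PHI really uses its incoming
      values at the end of the predecessor blocks, which is what allows the post-incremented
      IV to be used on the fall-through exit:
      
      /* Hypothetical reconstruction of the loop shape. */
      int test(int *a, int n, int key) {
          int i;
          for (i = 0; i < n; ++i)
              if (a[i] == key)
                  break;        /* early exit: pre-increment value of i */
          return i;             /* PHI in the exit block merges both exit values */
      }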
    • Refactor some code into a function · e4ed42a4
      Chris Lattner authored
      llvm-svn: 23603
    • This break is bogus and I have no idea why it was there. Basically it prevents · 360928db
      Chris Lattner authored
      memoizing code when IVs are used by PHI nodes outside of loops.  In a simple
      example, we were getting this code before (note that r6 and r7 are isomorphic
      IV's):
      
              li r6, 0
              or r7, r6, r6
      LBB_test_3:     ; no_exit
              lwz r2, 0(r3)
              cmpw cr0, r2, r5
              or r2, r7, r7
              beq cr0, LBB_test_5     ; loopexit
      LBB_test_4:     ; endif
              addi r2, r7, 1
              addi r7, r7, 1
              addi r3, r3, 4
              addi r6, r6, 1
              cmpw cr0, r6, r4
              blt cr0, LBB_test_3     ; no_exit
      
      Now we get:
      
              li r6, 0
      LBB_test_3:     ; no_exit
              or r2, r6, r6
              lwz r6, 0(r3)
              cmpw cr0, r6, r5
              beq cr0, LBB_test_6     ; loopexit
      LBB_test_4:     ; endif
              addi r3, r3, 4
              addi r6, r2, 1
              cmpw cr0, r6, r4
              blt cr0, LBB_test_3     ; no_exit
      
      This was noticed in em3d.
      
      llvm-svn: 23602
    • when checking if we should move a split edge block outside of a loop, · 8fcce170
      Chris Lattner authored
      check the presplit pred, not the post-split pred.  This was causing us
      to make the wrong decision in some cases, leaving the critical edge block
      in the loop.
      
      llvm-svn: 23601
  10. Sep 27, 2005
  11. Sep 13, 2005
  12. Sep 12, 2005
    • Fix a regression from last night, which caused this pass to create invalid · 8048b85e
      Chris Lattner authored
      code for IV uses outside of loops that are not dominated by the latch block.
      We should only convert these uses to use the post-inc value if they ARE
      dominated by the latch block.
      
      Also use a new LoopInfo method to simplify some code.
      
      This fixes Transforms/LoopStrengthReduce/2005-09-12-UsesOutOutsideOfLoop.ll
      
      llvm-svn: 23318
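      A minimal hypothetical sketch of the failure mode being fixed: the store of i below sits
      on an early-exit path that the latch block does not dominate, so rewriting that use to
      the post-incremented IV (i + 1) would be wrong; only uses dominated by the latch may be
      converted:
      
      /* Hypothetical example; function and variable names are made up. */
      void find(int *a, int n, int key, int *out) {
          for (int i = 0; i < n; ++i) {
              if (a[i] == key) {
                  *out = i;     /* exit not dominated by the latch: must keep pre-inc i */
                  return;
              }
          }
      }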
    • Uses of IVs outside of the loop should use the post-incremented version · a6764839
      Chris Lattner authored
      of the IV, not the pre-incremented version.  This helps many loops (e.g. in sixtrack)
      which used to generate code like this (this is the code from the
      dont-hoist-simple-loop-constants.ll testcase):
      
      _test:
              li r2, 0                 **** IV starts at 0
      LBB_test_1:     ; no_exit.2
              or r5, r2, r2            **** Copy for loop exit
              li r2, 0
              stw r2, 0(r3)
              addi r3, r3, 4
              addi r2, r5, 1
              addi r6, r5, 2           **** IV+2
              cmpwi cr0, r6, 701
              blt cr0, LBB_test_1     ; no_exit.2
      LBB_test_2:     ; loopexit.2.loopexit
              addi r2, r5, 2       ****  IV+2
              stw r2, 0(r4)
              blr
      
      And now we generate code like this:
      
      _test:
              li r2, 1               *** IV starts at 1
      LBB_test_1:     ; no_exit.2
              li r5, 0
              stw r5, 0(r3)
              addi r2, r2, 1
              addi r3, r3, 4
              cmpwi cr0, r2, 701     *** IV.postinc + 0
              blt cr0, LBB_test_1
      LBB_test_2:     ; loopexit.2.loopexit
              stw r2, 0(r4)          *** IV.postinc + 0
              blr
      
      llvm-svn: 23313
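      A minimal hypothetical C sketch of the pattern this change targets: the only use of the
      IV after the loop is reached through the normal (latch-dominated) exit, so it can read
      the post-incremented value directly and no extra copy of the pre-incremented IV has to
      stay live across the loop:
      
      /* Hypothetical sketch in the spirit of dont-hoist-simple-loop-constants.ll. */
      void fill(int *a, int *count) {
          int i;
          for (i = 0; i < 701; ++i)
              a[i] = 0;
          *count = i;           /* equals the post-incremented IV; no copy needed */
      }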
  13. Sep 10, 2005
    • implement Transforms/LoopStrengthReduce/dont-hoist-simple-loop-constants.ll. · 530fe6ab
      Chris Lattner authored
      We used to emit this code for it:
      
      _test:
              li r2, 1     ;; Value tying up a register for the whole loop
              li r5, 0
      LBB_test_1:     ; no_exit.2
              or r6, r5, r5
              li r5, 0
              stw r5, 0(r3)
              addi r5, r6, 1
              addi r3, r3, 4
              add r7, r2, r5  ;; should be addi r7, r5, 1
              cmpwi cr0, r7, 701
              blt cr0, LBB_test_1     ; no_exit.2
      LBB_test_2:     ; loopexit.2.loopexit
              addi r2, r6, 2
              stw r2, 0(r4)
              blr
      
      now we emit this:
      
      _test:
              li r2, 0
      LBB_test_1:     ; no_exit.2
              or r5, r2, r2
              li r2, 0
              stw r2, 0(r3)
              addi r3, r3, 4
              addi r2, r5, 1
              addi r6, r5, 2   ;; whoa, fold those adds!
              cmpwi cr0, r6, 701
              blt cr0, LBB_test_1     ; no_exit.2
      LBB_test_2:     ; loopexit.2.loopexit
              addi r2, r5, 2
              stw r2, 0(r4)
              blr
      
      More improvement coming.
      
      llvm-svn: 23306
  14. Aug 17, 2005
  15. Aug 16, 2005
  16. Aug 13, 2005
    • Ooops, don't forget to clear this. The real inner loop is now: · 47d3ec35
      Chris Lattner authored
      .LBB_foo_3:     ; no_exit.1
              lfd f2, 0(r9)
              lfd f3, 8(r9)
              fmul f4, f1, f2
              fmadd f4, f0, f3, f4
              stfd f4, 8(r9)
              fmul f3, f1, f3
              fmsub f2, f0, f2, f3
              stfd f2, 0(r9)
              addi r9, r9, 16
              addi r8, r8, 1
              cmpw cr0, r8, r4
              ble .LBB_foo_3  ; no_exit.1
      
      llvm-svn: 22782
    • Recursively scan SCEV expressions for common subexpressions. This allows us · 5949d490
      Chris Lattner authored
      to handle nested loops much better, for example, by being able to tell that
      these two expressions:
      
      {( 8 + ( 16 * ( 1 +  %Tmp11 +  %Tmp12)) +  %c_),+,( 16 *  %Tmp12)}<loopentry.1>
      
      {(( 16 * ( 1 +  %Tmp11 +  %Tmp12)) +  %c_),+,( 16 *  %Tmp12)}<loopentry.1>
      
      have the following common part that can be shared:
      {(( 16 * ( 1 +  %Tmp11 +  %Tmp12)) +  %c_),+,( 16 *  %Tmp12)}<loopentry.1>
      
      This allows us to codegen an important inner loop in 168.wupwise as:
      
      .LBB_foo_4:     ; no_exit.1
              lfd f2, 16(r9)
              fmul f3, f0, f2
              fmul f2, f1, f2
              fadd f4, f3, f2
              stfd f4, 8(r9)
              fsub f2, f3, f2
              stfd f2, 16(r9)
              addi r8, r8, 1
              addi r9, r9, 16
              cmpw cr0, r8, r4
              ble .LBB_foo_4  ; no_exit.1
      
      instead of:
      
      .LBB_foo_3:     ; no_exit.1
              lfdx f2, r6, r9
              add r10, r6, r9
              lfd f3, 8(r10)
              fmul f4, f1, f2
              fmadd f4, f0, f3, f4
              stfd f4, 8(r10)
              fmul f3, f1, f3
              fmsub f2, f0, f2, f3
              stfdx f2, r6, r9
              addi r9, r9, 16
              addi r8, r8, 1
              cmpw cr0, r8, r4
              ble .LBB_foo_3  ; no_exit.1
      
      llvm-svn: 22781
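      A minimal hypothetical C sketch of the wupwise-style access pattern behind those
      expressions: both element addresses share the recurring subexpression, so once the
      common SCEV part is recognized, LSR can keep a single pointer and fold the remaining
      difference into a constant byte offset on the loads and stores, as in the first loop
      above:
      
      /* Hypothetical sketch; the real 168.wupwise loop is Fortran complex arithmetic. */
      void rotate(double *a, long start, long stride, long n, double c, double s) {
          for (long j = 0; j < n; ++j) {
              long idx = start + j * stride;        /* shared part of both addresses */
              double re = a[idx], im = a[idx + 1];  /* differ only by one element (8 bytes) */
              a[idx]     = c * re - s * im;
              a[idx + 1] = s * re + c * im;
          }
      }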
    • When splitting critical edges, make sure not to leave the new block in the · 8447b495
      Chris Lattner authored
      middle of the loop.  This turns a critical loop in gzip into this:
      
      .LBB_test_1:    ; loopentry
              or r27, r28, r28
              add r28, r3, r27
              lhz r28, 3(r28)
              add r26, r4, r27
              lhz r26, 3(r26)
              cmpw cr0, r28, r26
              bne .LBB_test_8 ; loopentry.loopexit_crit_edge
      .LBB_test_2:    ; shortcirc_next.0
              add r28, r3, r27
              lhz r28, 5(r28)
              add r26, r4, r27
              lhz r26, 5(r26)
              cmpw cr0, r28, r26
              bne .LBB_test_7 ; shortcirc_next.0.loopexit_crit_edge
      .LBB_test_3:    ; shortcirc_next.1
              add r28, r3, r27
              lhz r28, 7(r28)
              add r26, r4, r27
              lhz r26, 7(r26)
              cmpw cr0, r28, r26
              bne .LBB_test_6 ; shortcirc_next.1.loopexit_crit_edge
      .LBB_test_4:    ; shortcirc_next.2
              add r28, r3, r27
              lhz r26, 9(r28)
              add r28, r4, r27
              lhz r25, 9(r28)
              addi r28, r27, 8
              cmpw cr7, r26, r25
              mfcr r26, 1
              rlwinm r26, r26, 31, 31, 31
              add r25, r8, r27
              cmpw cr7, r25, r7
              mfcr r25, 1
              rlwinm r25, r25, 29, 31, 31
              and. r26, r26, r25
              bne .LBB_test_1 ; loopentry
      
      instead of this:
      
      .LBB_test_1:    ; loopentry
              or r27, r28, r28
              add r28, r3, r27
              lhz r28, 3(r28)
              add r26, r4, r27
              lhz r26, 3(r26)
              cmpw cr0, r28, r26
              beq .LBB_test_3 ; shortcirc_next.0
      .LBB_test_2:    ; loopentry.loopexit_crit_edge
              add r2, r30, r27
              add r8, r29, r27
              b .LBB_test_9   ; loopexit
      .LBB_test_3:    ; shortcirc_next.0
              add r28, r3, r27
              lhz r28, 5(r28)
              add r26, r4, r27
              lhz r26, 5(r26)
              cmpw cr0, r28, r26
              beq .LBB_test_5 ; shortcirc_next.1
      .LBB_test_4:    ; shortcirc_next.0.loopexit_crit_edge
              add r2, r11, r27
              add r8, r12, r27
              b .LBB_test_9   ; loopexit
      .LBB_test_5:    ; shortcirc_next.1
              add r28, r3, r27
              lhz r28, 7(r28)
              add r26, r4, r27
              lhz r26, 7(r26)
              cmpw cr0, r28, r26
              beq .LBB_test_7 ; shortcirc_next.2
      .LBB_test_6:    ; shortcirc_next.1.loopexit_crit_edge
              add r2, r9, r27
              add r8, r10, r27
              b .LBB_test_9   ; loopexit
      .LBB_test_7:    ; shortcirc_next.2
              add r28, r3, r27
              lhz r26, 9(r28)
              add r28, r4, r27
              lhz r25, 9(r28)
              addi r28, r27, 8
              cmpw cr7, r26, r25
              mfcr r26, 1
              rlwinm r26, r26, 31, 31, 31
              add r25, r8, r27
              cmpw cr7, r25, r7
              mfcr r25, 1
              rlwinm r25, r25, 29, 31, 31
              and. r26, r26, r25
              bne .LBB_test_1 ; loopentry
      
      Next up, improve the code for the loop.
      
      llvm-svn: 22769
    • Fix a FIXME: if we are inserting code for a PHI argument, split the critical · 4fec86d3
      Chris Lattner authored
      edge so that the code is not always executed for both operands.  This
      prevents LSR from inserting code into loops whose exit blocks contain
      PHI uses of IV expressions (which are outside of loops).  On gzip, for
      example, we turn this ugly code:
      
      .LBB_test_1:    ; loopentry
              add r27, r3, r28
              lhz r27, 3(r27)
              add r26, r4, r28
              lhz r26, 3(r26)
              add r25, r30, r28    ;; Only live if exiting the loop
              add r24, r29, r28    ;; Only live if exiting the loop
              cmpw cr0, r27, r26
              bne .LBB_test_5 ; loopexit
      
      into this:
      
      .LBB_test_1:    ; loopentry
              or r27, r28, r28
              add r28, r3, r27
              lhz r28, 3(r28)
              add r26, r4, r27
              lhz r26, 3(r26)
              cmpw cr0, r28, r26
              beq .LBB_test_3 ; shortcirc_next.0
      .LBB_test_2:    ; loopentry.loopexit_crit_edge
              add r2, r30, r27
              add r8, r29, r27
              b .LBB_test_9   ; loopexit
      .LBB_test_2:    ; shortcirc_next.0
              ...
              blt .LBB_test_1
      
      
      into this:
      
      .LBB_test_1:    ; loopentry
              or r27, r28, r28
              add r28, r3, r27
              lhz r28, 3(r28)
              add r26, r4, r27
              lhz r26, 3(r26)
              cmpw cr0, r28, r26
              beq .LBB_test_3 ; shortcirc_next.0
      .LBB_test_2:    ; loopentry.loopexit_crit_edge
              add r2, r30, r27
              add r8, r29, r27
              b .LBB_t_3:    ; shortcirc_next.0
      .LBB_test_3:    ; shortcirc_next.0
              ...
              blt .LBB_test_1
      
      
      Next step: get the block out of the loop so that the loop is all
      fall-throughs again.
      
      llvm-svn: 22766
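      A minimal hypothetical C sketch of the gzip-style situation: the IV expressions feeding
      the exit-block PHIs (base_a + i, base_b + i) are only needed when the loop actually
      exits, so the code LSR inserts for them belongs on the split critical edges, not in the
      loop body where it would run every iteration:
      
      /* Hypothetical sketch loosely modeled on gzip's matching loop. */
      extern int *match_a, *match_b;
      void scan(short *a, short *b, int *base_a, int *base_b, int n) {
          int i = 0;
          while (i < n && a[i] == b[i] && a[i + 1] == b[i + 1])
              i += 8;
          match_a = base_a + i;   /* PHI uses of IV expressions, outside the loop */
          match_b = base_b + i;
      }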
  17. Aug 10, 2005
    • Teach LSR to strength reduce IVs that have a loop-invariant but non-constant stride. · edff91a4
      Chris Lattner authored
      For code like this:
      
      void foo(float *a, float *b, int n, int stride_a, int stride_b) {
        int i;
        for (i=0; i<n; i++)
            a[i*stride_a] = b[i*stride_b];
      }
      
      we now emit:
      
      .LBB_foo2_2:    ; no_exit
              lfs f0, 0(r4)
              stfs f0, 0(r3)
              addi r7, r7, 1
              add r4, r2, r4
              add r3, r6, r3
              cmpw cr0, r7, r5
              blt .LBB_foo2_2 ; no_exit
      
      instead of:
      
      .LBB_foo_2:     ; no_exit
              mullw r8, r2, r7     ;; multiply!
              slwi r8, r8, 2
              lfsx f0, r4, r8
              mullw r8, r2, r6     ;; multiply!
              slwi r8, r8, 2
              stfsx f0, r3, r8
              addi r2, r2, 1
              cmpw cr0, r2, r5
              blt .LBB_foo_2  ; no_exit
      
      Loops with variable strides occur pretty often.  For example, in SPECFP2K
      there are 317 variable strides in 177.mesa, 3 in 179.art, 14 in 188.ammp,
      56 in 168.wupwise, 36 in 172.mgrid.
      
      Now we can allow indvars to turn functions written like this:
      
      void foo2(float *a, float *b, int n, int stride_a, int stride_b) {
        int i, ai = 0, bi = 0;
        for (i=0; i<n; i++)
          {
            a[ai] = b[bi];
            ai += stride_a;
            bi += stride_b;
          }
      }
      
      into code like the above for better analysis.  With this patch, they generate
      identical code.
      
      llvm-svn: 22740
    • Fix Regression/Transforms/LoopStrengthReduce/phi_node_update_multiple_preds.ll · dde7dc52
      Chris Lattner authored
      by being more careful about updating PHI nodes
      
      llvm-svn: 22739
    • Fix some 80 column violations. · c6c4d99a
      Chris Lattner authored
      Once we compute the evolution for a GEP, tell SE (ScalarEvolution) about it.  This
      allows users of the GEP to know it, even if the users are not direct.  This allows us
      to compile this testcase:
      
      void fbSolidFillmmx(int w, unsigned char *d) {
          while (w >= 64) {
              *(unsigned long long *) (d +  0) = 0;
              *(unsigned long long *) (d +  8) = 0;
              *(unsigned long long *) (d + 16) = 0;
              *(unsigned long long *) (d + 24) = 0;
              *(unsigned long long *) (d + 32) = 0;
              *(unsigned long long *) (d + 40) = 0;
              *(unsigned long long *) (d + 48) = 0;
              *(unsigned long long *) (d + 56) = 0;
              w -= 64;
              d += 64;
          }
      }
      
      into:
      
      .LBB_fbSolidFillmmx_2:  ; no_exit
              li r2, 0
              stw r2, 0(r4)
              stw r2, 4(r4)
              stw r2, 8(r4)
              stw r2, 12(r4)
              stw r2, 16(r4)
              stw r2, 20(r4)
              stw r2, 24(r4)
              stw r2, 28(r4)
              stw r2, 32(r4)
              stw r2, 36(r4)
              stw r2, 40(r4)
              stw r2, 44(r4)
              stw r2, 48(r4)
              stw r2, 52(r4)
              stw r2, 56(r4)
              stw r2, 60(r4)
              addi r4, r4, 64
              addi r3, r3, -64
              cmpwi cr0, r3, 63
              bgt .LBB_fbSolidFillmmx_2       ; no_exit
      
      instead of:
      
      .LBB_fbSolidFillmmx_2:  ; no_exit
              li r11, 0
              stw r11, 0(r4)
              stw r11, 4(r4)
              stwx r11, r10, r4
              add r12, r10, r4
              stw r11, 4(r12)
              stwx r11, r9, r4
              add r12, r9, r4
              stw r11, 4(r12)
              stwx r11, r8, r4
              add r12, r8, r4
              stw r11, 4(r12)
              stwx r11, r7, r4
              add r12, r7, r4
              stw r11, 4(r12)
              stwx r11, r6, r4
              add r12, r6, r4
              stw r11, 4(r12)
              stwx r11, r5, r4
              add r12, r5, r4
              stw r11, 4(r12)
              stwx r11, r2, r4
              add r12, r2, r4
              stw r11, 4(r12)
              addi r4, r4, 64
              addi r3, r3, -64
              cmpwi cr0, r3, 63
              bgt .LBB_fbSolidFillmmx_2       ; no_exit
      
      llvm-svn: 22737
  18. Aug 09, 2005
    • SCEVAddExpr::get() of an empty list is invalid. · 02742710
      Chris Lattner authored
      llvm-svn: 22724
    • Implement: LoopStrengthReduce/share_ivs.ll · a091ff17
      Chris Lattner authored
      Two changes:
        * Only insert one PHI node for each stride.  Other values are live-in
          values.  This cannot introduce higher register pressure than the
          previous approach, and can take advantage of reg+reg addressing modes.
        * Factor common base values out of uses before moving values from the
          base to the immediate fields.  This improves codegen by starting the
          stride-specific PHI node out at a common place for each IV use.
      
      As an example, we used to generate this for a loop in swim:
      
      .LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_2:        ; no_exit.7.i
              lfd f0, 0(r8)
              stfd f0, 0(r3)
              lfd f0, 0(r6)
              stfd f0, 0(r7)
              lfd f0, 0(r2)
              stfd f0, 0(r5)
              addi r9, r9, 1
              addi r2, r2, 8
              addi r5, r5, 8
              addi r6, r6, 8
              addi r7, r7, 8
              addi r8, r8, 8
              addi r3, r3, 8
              cmpw cr0, r9, r4
              bgt .LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_1
      
      now we emit:
      
      .LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_2:        ; no_exit.7.i
              lfdx f0, r8, r2
              stfdx f0, r9, r2
              lfdx f0, r5, r2
              stfdx f0, r7, r2
              lfdx f0, r3, r2
              stfdx f0, r6, r2
              addi r10, r10, 1
              addi r2, r2, 8
              cmpw cr0, r10, r4
              bgt .LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_1
      
      As another more dramatic example, we used to emit this:
      
      .LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_2:       ; no_exit.1.i19
              lfd f0, 8(r21)
              lfd f4, 8(r3)
              lfd f5, 8(r27)
              lfd f6, 8(r22)
              lfd f7, 8(r5)
              lfd f8, 8(r6)
              lfd f9, 8(r30)
              lfd f10, 8(r11)
              lfd f11, 8(r12)
              fsub f10, f10, f11
              fadd f5, f4, f5
              fmul f5, f5, f1
              fadd f6, f6, f7
              fadd f6, f6, f8
              fadd f6, f6, f9
              fmadd f0, f5, f6, f0
              fnmsub f0, f10, f2, f0
              stfd f0, 8(r4)
              lfd f0, 8(r25)
              lfd f5, 8(r26)
              lfd f6, 8(r23)
              lfd f9, 8(r28)
              lfd f10, 8(r10)
              lfd f12, 8(r9)
              lfd f13, 8(r29)
              fsub f11, f13, f11
              fadd f4, f4, f5
              fmul f4, f4, f1
              fadd f5, f6, f9
              fadd f5, f5, f10
              fadd f5, f5, f12
              fnmsub f0, f4, f5, f0
              fnmsub f0, f11, f3, f0
              stfd f0, 8(r24)
              lfd f0, 8(r8)
              fsub f4, f7, f8
              fsub f5, f12, f10
              fnmsub f0, f5, f2, f0
              fnmsub f0, f4, f3, f0
              stfd f0, 8(r2)
              addi r20, r20, 1
              addi r2, r2, 8
              addi r8, r8, 8
              addi r10, r10, 8
              addi r12, r12, 8
              addi r6, r6, 8
              addi r29, r29, 8
              addi r28, r28, 8
              addi r26, r26, 8
              addi r25, r25, 8
              addi r24, r24, 8
              addi r5, r5, 8
              addi r23, r23, 8
              addi r22, r22, 8
              addi r3, r3, 8
              addi r9, r9, 8
              addi r11, r11, 8
              addi r30, r30, 8
              addi r27, r27, 8
              addi r21, r21, 8
              addi r4, r4, 8
              cmpw cr0, r20, r7
              bgt .LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_1
      
      we now emit:
      
      .LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_2:       ; no_exit.1.i19
              lfdx f0, r21, r20
              lfdx f4, r3, r20
              lfdx f5, r27, r20
              lfdx f6, r22, r20
              lfdx f7, r5, r20
              lfdx f8, r6, r20
              lfdx f9, r30, r20
              lfdx f10, r11, r20
              lfdx f11, r12, r20
              fsub f10, f10, f11
              fadd f5, f4, f5
              fmul f5, f5, f1
              fadd f6, f6, f7
              fadd f6, f6, f8
              fadd f6, f6, f9
              fmadd f0, f5, f6, f0
              fnmsub f0, f10, f2, f0
              stfdx f0, r4, r20
              lfdx f0, r25, r20
              lfdx f5, r26, r20
              lfdx f6, r23, r20
              lfdx f9, r28, r20
              lfdx f10, r10, r20
              lfdx f12, r9, r20
              lfdx f13, r29, r20
              fsub f11, f13, f11
              fadd f4, f4, f5
              fmul f4, f4, f1
              fadd f5, f6, f9
              fadd f5, f5, f10
              fadd f5, f5, f12
              fnmsub f0, f4, f5, f0
              fnmsub f0, f11, f3, f0
              stfdx f0, r24, r20
              lfdx f0, r8, r20
              fsub f4, f7, f8
              fsub f5, f12, f10
              fnmsub f0, f5, f2, f0
              fnmsub f0, f4, f3, f0
              stfdx f0, r2, r20
              addi r19, r19, 1
              addi r20, r20, 8
              cmpw cr0, r19, r7
              bgt .LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_1
      
      llvm-svn: 22722
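      A minimal hypothetical C sketch of the swim-style loop shape behind the first
      before/after pair: several arrays all advance by the same stride, so instead of one
      pointer PHI per array, a single byte-offset IV can be kept per stride and each access
      done with reg+reg addressing from its base pointer:
      
      /* Hypothetical sketch; names do not come from swim itself. */
      void copy3(double *a, double *b, double *c,
                 const double *d, const double *e, const double *f, long n) {
          for (long i = 0; i < n; ++i) {   /* one IV serves all six addresses */
              a[i] = d[i];
              b[i] = e[i];
              c[i] = f[i];
          }
      }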
    • Suck the base value out of the UsersToProcess vector into the BasedUser · 37c24cc9
      Chris Lattner authored
      class to simplify the code.  Fuse two loops.
      
      llvm-svn: 22721
    • Split MoveLoopVariantsToImediateField out from MoveImmediateValues. The · 37ed895b
      Chris Lattner authored
      first is a correctness thing, and the latter is an optimization thing.  This also
      is needed to support a future change.
      
      llvm-svn: 22720
  19. Aug 08, 2005
  20. Aug 05, 2005