  1. Aug 08, 2005
    • Chris Lattner authored · c70bbc0c
      Implement LoopStrengthReduce/share_code_in_preheader.ll by having one
      rewriter for all code inserted into the preheader, which is never flushed.
      
      llvm-svn: 22702
    • Chris Lattner authored · 9bfa6f87
      Implement a simple optimization for the termination condition of the loop.
      The termination condition actually wants to use the post-incremented value
      of the loop, not a new indvar with an unusual base.
      
      On PPC, for example, this allows us to compile
      LoopStrengthReduce/exit_compare_live_range.ll to:
      
      _foo:
              li r2, 0
      .LBB_foo_1:     ; no_exit
              li r5, 0
              stw r5, 0(r3)
              addi r2, r2, 1
              cmpw cr0, r2, r4
              bne .LBB_foo_1  ; no_exit
              blr
      
      instead of:
      
      _foo:
              li r2, 1                ;; IV starts at 1, not 0
      .LBB_foo_1:     ; no_exit
              li r5, 0
              stw r5, 0(r3)
              addi r5, r2, 1
              cmpw cr0, r2, r4
              or r2, r5, r5           ;; Reg-reg copy, extra live range
              bne .LBB_foo_1  ; no_exit
              blr
      
      This implements LoopStrengthReduce/exit_compare_live_range.ll
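
At the source level, the difference between the two listings can be sketched roughly as follows. This is a minimal illustration, not the test's actual source: the function names and the `volatile` store target are assumptions made so that the example is self-contained, and both versions assume `n >= 1`.

```c
/* Hypothetical names; a sketch of the two loop shapes, assuming n >= 1. */

/* Before: a separate IV with base 1 feeds the exit compare. */
int stores_with_extra_iv(volatile int *p, int n) {
    int stores = 0;
    int iv = 1;                 /* starts at 1, not 0 */
    for (;;) {
        *p = 0;
        ++stores;
        int next = iv + 1;      /* must stay live across the compare */
        if (iv == n)
            break;
        iv = next;              /* register-to-register copy */
    }
    return stores;
}

/* After: the compare reuses the post-incremented value of the one IV. */
int stores_with_post_inc(volatile int *p, int n) {
    int stores = 0;
    int i = 0;
    do {
        *p = 0;
        ++stores;
        ++i;                    /* post-incremented value ... */
    } while (i != n);           /* ... is what the exit test compares */
    return stores;
}
```

Both functions perform the same number of stores; the second simply has one fewer value live around the compare.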
      
      llvm-svn: 22699
  2. Aug 05, 2005
  3. Aug 04, 2005
    • Chris Lattner authored · a6d7c355
      * Refactor some code into a new BasedUser::RewriteInstructionToUseNewBase
        method.
      * Fix a crash on 178.galgel, where we would insert expressions before PHI
        nodes instead of into the PHI node predecessor blocks.
      
      llvm-svn: 22657
    • Chris Lattner authored · 0f7c0fa2
      Fix a case that caused this to crash on 178.galgel
      llvm-svn: 22653
    • Chris Lattner authored · acc42c4d
      Teach LSR about loop-variant expressions, such as loops like this:
        for (i = 0; i < N; ++i)
          A[i][foo()] = 0;
      
      Here we still want to strength-reduce the A[i] part, even though foo() is
      loop-variant.
      
      This also simplifies some of the 'CanReduce' logic.
      
      This implements Transforms/LoopStrengthReduce/ops_after_indvar.ll
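
A rough C-level picture of what strength-reducing only the A[i] part means is sketched below. The array width `NCOLS`, the counter `foo_calls`, and this particular body of `foo()` are assumptions chosen only to make the example runnable; the transformation itself happens on LLVM IR, not on C source.

```c
#define NCOLS 4   /* assumed row width, for illustration only */

/* Hypothetical stand-in for foo(): returns a different column each call. */
int foo_calls = 0;
int foo(void) { return foo_calls++ % NCOLS; }

/* Indexed form: the address of A[i] is recomputed from i every iteration. */
void clear_indexed(int A[][NCOLS], int n) {
    for (int i = 0; i < n; ++i)
        A[i][foo()] = 0;
}

/* Reduced form: the row pointer advances by one row per iteration, and
 * only the loop-variant column index is added inside the loop. */
void clear_reduced(int A[][NCOLS], int n) {
    int (*row)[NCOLS] = A;
    for (int i = 0; i < n; ++i, ++row)
        (*row)[foo()] = 0;
}
```

The call to `foo()` stays inside the loop in both versions; only the multiply hidden in `A[i]` is replaced by a pointer increment.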
      
      llvm-svn: 22652
    • Nate Begeman authored · 456044b7
      Remove some more dead code.
      llvm-svn: 22650
    • Chris Lattner authored · eaf24725
      Refactor this code substantially with the following improvements:
        1. We only analyze instructions once, guaranteed
        2. AnalyzeGetElementPtrUsers has been ripped apart and replaced with
           something much simpler.
      
      The next step is to handle expressions that are not all indvar+loop-invariant
      values (e.g. handling indvar+loopvariant).
      
      llvm-svn: 22649
    • Chris Lattner authored · 6f286b76
      refactor some code
      llvm-svn: 22643
    • Chris Lattner authored · 65107490
      invert two if's to make the logic simpler
      llvm-svn: 22641
    • Chris Lattner authored · a0102fbc
      When processing outer loops, if we find uses of an IV in inner loops, make
      sure to handle the use; just don't recurse into it.
      
      This permits us to generate this code for a simple nested loop case:
      
      .LBB_foo_0:     ; entry
              stwu r1, -48(r1)
              stw r29, 44(r1)
              stw r30, 40(r1)
              mflr r11
              stw r11, 56(r1)
              lis r2, ha16(L_A$non_lazy_ptr)
              lwz r30, lo16(L_A$non_lazy_ptr)(r2)
              li r29, 1
      .LBB_foo_1:     ; no_exit.0
              bl L_bar$stub
              li r2, 1
              or r3, r30, r30
      .LBB_foo_2:     ; no_exit.1
              lfd f0, 8(r3)
              stfd f0, 0(r3)
              addi r4, r2, 1
              addi r3, r3, 8
              cmpwi cr0, r2, 100
              or r2, r4, r4
              bne .LBB_foo_2  ; no_exit.1
      .LBB_foo_3:     ; loopexit.1
              addi r30, r30, 800
              addi r2, r29, 1
              cmpwi cr0, r29, 100
              or r29, r2, r2
              bne .LBB_foo_1  ; no_exit.0
      .LBB_foo_4:     ; return
              lwz r11, 56(r1)
              mtlr r11
              lwz r30, 40(r1)
              lwz r29, 44(r1)
              lwz r1, 0(r1)
              blr
      
      instead of this:
      
      _foo:
      .LBB_foo_0:     ; entry
              stwu r1, -48(r1)
              stw r28, 44(r1)                   ;; uses an extra register.
              stw r29, 40(r1)
              stw r30, 36(r1)
              mflr r11
              stw r11, 56(r1)
              li r30, 1
              li r29, 0
              or r28, r29, r29
      .LBB_foo_1:     ; no_exit.0
              bl L_bar$stub
              mulli r2, r28, 800           ;; unstrength-reduced multiply
              lis r3, ha16(L_A$non_lazy_ptr)   ;; loop invariant address computation
              lwz r3, lo16(L_A$non_lazy_ptr)(r3)
              add r2, r2, r3
              mulli r4, r29, 800           ;; unstrength-reduced multiply
              addi r3, r3, 8
              add r3, r4, r3
              li r4, 1
      .LBB_foo_2:     ; no_exit.1
              lfd f0, 0(r3)
              stfd f0, 0(r2)
              addi r5, r4, 1
              addi r2, r2, 8                 ;; multiple stride 8 IV's
              addi r3, r3, 8
              cmpwi cr0, r4, 100
              or r4, r5, r5
              bne .LBB_foo_2  ; no_exit.1
      .LBB_foo_3:     ; loopexit.1
              addi r28, r28, 1               ;;; Many IV's with stride 1
              addi r29, r29, 1
              addi r2, r30, 1
              cmpwi cr0, r30, 100
              or r30, r2, r2
              bne .LBB_foo_1  ; no_exit.0
      .LBB_foo_4:     ; return
              lwz r11, 56(r1)
              mtlr r11
              lwz r30, 36(r1)
              lwz r29, 40(r1)
              lwz r28, 44(r1)
              lwz r1, 0(r1)
              blr
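
The effect on a nested loop can be approximated in C as below. This is a sketch under assumptions: the source of the test is not shown in the log, so the loop body, the function names, and the row length `ROW_LEN` are guesses read off the listings (100 elements of an 8-byte `double` give the 800-byte row stride seen there), and the call to `bar` is omitted.

```c
#define ROW_LEN 100   /* assumed: 100 doubles = 800 bytes per row */

/* Indexed form: every access implies a multiply of i by the row size. */
void shift_indexed(double A[][ROW_LEN], int rows) {
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j + 1 < ROW_LEN; ++j)
            A[i][j] = A[i][j + 1];
}

/* Reduced form: one pointer per loop level, each bumped by a constant. */
void shift_reduced(double A[][ROW_LEN], int rows) {
    double (*row)[ROW_LEN] = A;
    for (int i = 0; i < rows; ++i, ++row) {        /* advance one row */
        double *p = *row;
        for (int j = 0; j + 1 < ROW_LEN; ++j, ++p) /* advance one element */
            *p = p[1];                             /* load at constant offset */
    }
}
```

The outer-loop pointer is used by the inner loop but is only updated in the outer loop, which corresponds to handling the inner-loop use without recursing into the inner loop.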
      
      llvm-svn: 22640
    • Chris Lattner authored · fc624704
      Teach loop-reduce to see into nested loops, to pull out immediate values
      pushed down by SCEV.
      
      In a nested loop case, this allows us to emit this:
      
              lis r3, ha16(L_A$non_lazy_ptr)
              lwz r3, lo16(L_A$non_lazy_ptr)(r3)
              add r2, r2, r3
              li r3, 1
      .LBB_foo_2:     ; no_exit.1
              lfd f0, 8(r2)        ;; Uses offset of 8 instead of 0
              stfd f0, 0(r2)
              addi r4, r3, 1
              addi r2, r2, 8
              cmpwi cr0, r3, 100
              or r3, r4, r4
              bne .LBB_foo_2  ; no_exit.1
      
      instead of this:
      
              lis r3, ha16(L_A$non_lazy_ptr)
              lwz r3, lo16(L_A$non_lazy_ptr)(r3)
              add r2, r2, r3
              addi r3, r3, 8
              li r4, 1
      .LBB_foo_2:     ; no_exit.1
              lfd f0, 0(r3)
              stfd f0, 0(r2)
              addi r5, r4, 1
              addi r2, r2, 8
              addi r3, r3, 8
              cmpwi cr0, r4, 100
              or r4, r5, r5
              bne .LBB_foo_2  ; no_exit.1
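
In C terms, pulling the immediate out amounts to going from two pointers that differ by a constant to one pointer plus a constant offset on the load. The sketch below assumes a one-dimensional array with at least `n + 1` elements; the function names are hypothetical.

```c
/* Before: separate pointers for the load and the store, same stride. */
void shift_two_pointers(double *a, int n) {
    double *dst = a;
    double *src = a + 1;        /* the +1 lives in a second IV */
    for (int j = 0; j < n; ++j, ++dst, ++src)
        *dst = *src;
}

/* After: one pointer; the constant +1 folds into the load's offset. */
void shift_one_pointer(double *a, int n) {
    double *p = a;
    for (int j = 0; j < n; ++j, ++p)
        p[0] = p[1];
}
```

On a target with register-plus-immediate addressing, such as PPC, the `p[1]` access costs nothing extra, so the second pointer and its increment disappear.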
      
      llvm-svn: 22639
    • Chris Lattner authored · bb78c97e
      improve debug output
      llvm-svn: 22638
    • Chris Lattner authored · db23c74e
      Move from Stage 0 to Stage 1.
      Only emit one PHI node for IV uses with identical bases and strides (after
      moving foldable immediates to the load/store instruction).
      
      This implements LoopStrengthReduce/dont_insert_redundant_ops.ll, allowing
      us to generate this PPC code for test1:
      
              or r30, r3, r3
      .LBB_test1_1:   ; Loop
              li r2, 0
              stw r2, 0(r30)
              stw r2, 4(r30)
              bl L_pred$stub
              addi r30, r30, 8
              cmplwi cr0, r3, 0
              bne .LBB_test1_1        ; Loop
      
      instead of this code:
      
              or r30, r3, r3
              or r29, r3, r3
      .LBB_test1_1:   ; Loop
              li r2, 0
              stw r2, 0(r29)
              stw r2, 4(r30)
              bl L_pred$stub
              addi r30, r30, 8        ;; Two iv's with step of 8
              addi r29, r29, 8
              cmplwi cr0, r3, 0
              bne .LBB_test1_1        ; Loop
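
A C-level sketch of the same idea follows. The loop shape is read off the PPC listings rather than taken from the test's IR, and `pred_budget` together with this body of `pred()` is a hypothetical stand-in that simply ends the loop after a fixed number of iterations.

```c
/* Hypothetical stand-in for pred(): loop while the budget lasts. */
int pred_budget = 0;
int pred(void) { return --pred_budget > 0; }

/* Before: each store gets its own pointer IV, same base and stride. */
void zero_pairs_two_ivs(int *P) {
    int *p0 = P;        /* used for the first store */
    int *p1 = P;        /* used for the second store */
    do {
        p0[0] = 0;
        p1[1] = 0;
        p0 += 2;
        p1 += 2;
    } while (pred());
}

/* After: one IV; the differing constant offsets fold into the stores. */
void zero_pairs_one_iv(int *P) {
    int *p = P;
    do {
        p[0] = 0;
        p[1] = 0;
        p += 2;
    } while (pred());
}
```

Once the constant offsets are moved into the store instructions, the two uses have identical bases and strides, so a single PHI node (here, the single pointer `p`) serves both.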
      
      llvm-svn: 22635
    • Chris Lattner authored · 430d0022
      Rename IVUse to IVUsersOfOneStride, use a struct instead of a pair to
      unify some parallel vectors and get field names more descriptive than
      "first" and "second".  This isn't lisp after all :)
      
      llvm-svn: 22633
  4. Aug 03, 2005
  5. Aug 02, 2005
  6. Jul 30, 2005
  7. Apr 22, 2005
  8. Mar 06, 2005
  9. Mar 05, 2005
  10. Mar 04, 2005
  11. Mar 01, 2005
    • Jeff Cohen authored · 8ea6f9e8
      Fixed the following LSR bugs:
        * Loop invariant code does not dominate the loop header, but rather
          the end of the loop preheader.
      
        * The base for a reduced GEP isn't a constant unless all of its
          operands (preceding the induction variable) are constant.
      
        * Allow induction variable elimination for the simple case after all.
      
      Also made changes recommended by Chris for properly deleting
      instructions.
      
      llvm-svn: 20383
  12. Feb 28, 2005
  13. Feb 27, 2005
  14. Oct 18, 2004
    • Nate Begeman authored · b18121e6
      Initial implementation of the strength reduction for GEP instructions in
      loops.  This optimization is not turned on by default yet, but may be run
      with the opt tool's -loop-reduce flag.  There are many FIXMEs listed in the
      code that will make it far more applicable to a wide range of code, but you
      have to start somewhere :)
      
      This limited version currently triggers on the following tests in the
      MultiSource directory:
      pcompress2: 7 times
      cfrac: 5 times
      anagram: 2 times
      ks: 6 times
      yacr2: 2 times
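
For readers unfamiliar with the transformation, strength reduction of an indexed address (a GEP in LLVM IR) can be pictured in C roughly as follows. This is only an illustrative sketch with hypothetical function names; the pass itself rewrites IR, not C source.

```c
#include <stddef.h>

/* Indexed: the address of a[i] is base + i * sizeof(int) each iteration. */
long sum_indexed(const int *a, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += a[i];
    return total;
}

/* Reduced: a pointer IV advances by one element; no multiply is needed. */
long sum_reduced(const int *a, size_t n) {
    long total = 0;
    for (const int *p = a, *end = a + n; p != end; ++p)
        total += *p;
    return total;
}
```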
      
      llvm-svn: 17134