  1. Aug 10, 2005
    • Make loop-simplify produce better loops by turning PHI nodes like X = phi [X, Y] · f83ce5fa
      Chris Lattner authored
      into just Y.  This often occurs when it separates loops that have collapsed loop
      headers.  This implements LoopSimplify/phi-node-simplify.ll
      
      llvm-svn: 22746
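      For reference, the fold itself is tiny.  A minimal C++ sketch (a
      hypothetical helper using modern LLVM header paths, not the actual
      LoopSimplify code):

        #include "llvm/IR/Instructions.h"
        using namespace llvm;

        // If every incoming value of PN is either PN itself or one other
        // value V, the PHI computes V on every path and can be replaced.
        static Value *foldSelfReferencingPHI(PHINode *PN) {
          Value *V = nullptr;
          for (unsigned i = 0, e = PN->getNumIncomingValues(); i != e; ++i) {
            Value *In = PN->getIncomingValue(i);
            if (In == PN || In == V)
              continue;            // self-reference or a repeat of V
            if (V)
              return nullptr;      // two distinct non-self values: no fold
            V = In;
          }
          return V;                // null if PN only referenced itself
        }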
    • Allow indvar simplify to canonicalize ANY affine IV, not just affine IVs with · 677d8578
      Chris Lattner authored
      constant stride.  This implements Transforms/IndVarsSimplify/variable-stride-ivs.ll
      
      llvm-svn: 22744
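      For reference, "affine" here means the IV has the closed form
      start + i*stride for a loop-invariant (not necessarily constant)
      stride.  A small illustrative example (hypothetical function):

        // x is an affine IV even though its stride s is not a constant:
        // on iteration i its value is exactly i * s, so indvars can
        // canonicalize the index to that form.
        void scale(float *a, int n, int s) {
          int x = 0;
          for (int i = 0; i < n; ++i) {
            a[x] = 0.0f;   // rewritable as a[i * s]
            x += s;
          }
        }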
    • Teach LSR to strength reduce IVs that have a loop-invariant but non-constant stride. · edff91a4
      Chris Lattner authored
      For code like this:
      
      void foo(float *a, float *b, int n, int stride_a, int stride_b) {
        int i;
        for (i=0; i<n; i++)
            a[i*stride_a] = b[i*stride_b];
      }
      
      we now emit:
      
      .LBB_foo2_2:    ; no_exit
              lfs f0, 0(r4)
              stfs f0, 0(r3)
              addi r7, r7, 1
              add r4, r2, r4
              add r3, r6, r3
              cmpw cr0, r7, r5
              blt .LBB_foo2_2 ; no_exit
      
      instead of:
      
      .LBB_foo_2:     ; no_exit
              mullw r8, r2, r7     ;; multiply!
              slwi r8, r8, 2
              lfsx f0, r4, r8
              mullw r8, r2, r6     ;; multiply!
              slwi r8, r8, 2
              stfsx f0, r3, r8
              addi r2, r2, 1
              cmpw cr0, r2, r5
              blt .LBB_foo_2  ; no_exit
      
      Loops with variable strides occur pretty often.  For example, in SPECFP2K
      there are 317 variable strides in 177.mesa, 3 in 179.art, 14 in 188.ammp,
      56 in 168.wupwise, and 36 in 172.mgrid.
      
      Now we can allow indvars to turn functions written like this:
      
      void foo2(float *a, float *b, int n, int stride_a, int stride_b) {
        int i, ai = 0, bi = 0;
        for (i=0; i<n; i++)
          {
            a[ai] = b[bi];
            ai += stride_a;
            bi += stride_b;
          }
      }
      
      into code like the above for better analysis.  With this patch, they generate
      identical code.
      
      llvm-svn: 22740
    • Fix Regression/Transforms/LoopStrengthReduce/phi_node_update_multiple_preds.ll · dde7dc52
      Chris Lattner authored
      by being more careful about updating PHI nodes
      
      llvm-svn: 22739
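      A hedged sketch of the hazard the test name suggests (illustrative
      helper, modern LLVM API): a predecessor block can feed a PHI along
      more than one edge, so an update must rewrite every matching incoming
      entry, not just the first one found:

        #include "llvm/IR/Instructions.h"
        using namespace llvm;

        // Rewrite all of PN's incoming entries for OldPred.  Stopping at
        // the first match would leave stale entries when OldPred appears
        // several times (one entry per incoming edge).
        static void rewriteIncoming(PHINode *PN, BasicBlock *OldPred,
                                    Value *NewVal) {
          for (unsigned i = 0, e = PN->getNumIncomingValues(); i != e; ++i)
            if (PN->getIncomingBlock(i) == OldPred)
              PN->setIncomingValue(i, NewVal);
        }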
    • Fix some 80 column violations. · c6c4d99a
      Chris Lattner authored
      Once we compute the evolution for a GEP, tell SE (ScalarEvolution) about it.
      This allows users of the GEP to know it, even if the users are not direct.
      This allows us to compile this testcase:
      
      void fbSolidFillmmx(int w, unsigned char *d) {
          while (w >= 64) {
              *(unsigned long long *) (d +  0) = 0;
              *(unsigned long long *) (d +  8) = 0;
              *(unsigned long long *) (d + 16) = 0;
              *(unsigned long long *) (d + 24) = 0;
              *(unsigned long long *) (d + 32) = 0;
              *(unsigned long long *) (d + 40) = 0;
              *(unsigned long long *) (d + 48) = 0;
              *(unsigned long long *) (d + 56) = 0;
              w -= 64;
              d += 64;
          }
      }
      
      into:
      
      .LBB_fbSolidFillmmx_2:  ; no_exit
              li r2, 0
              stw r2, 0(r4)
              stw r2, 4(r4)
              stw r2, 8(r4)
              stw r2, 12(r4)
              stw r2, 16(r4)
              stw r2, 20(r4)
              stw r2, 24(r4)
              stw r2, 28(r4)
              stw r2, 32(r4)
              stw r2, 36(r4)
              stw r2, 40(r4)
              stw r2, 44(r4)
              stw r2, 48(r4)
              stw r2, 52(r4)
              stw r2, 56(r4)
              stw r2, 60(r4)
              addi r4, r4, 64
              addi r3, r3, -64
              cmpwi cr0, r3, 63
              bgt .LBB_fbSolidFillmmx_2       ; no_exit
      
      instead of:
      
      .LBB_fbSolidFillmmx_2:  ; no_exit
              li r11, 0
              stw r11, 0(r4)
              stw r11, 4(r4)
              stwx r11, r10, r4
              add r12, r10, r4
              stw r11, 4(r12)
              stwx r11, r9, r4
              add r12, r9, r4
              stw r11, 4(r12)
              stwx r11, r8, r4
              add r12, r8, r4
              stw r11, 4(r12)
              stwx r11, r7, r4
              add r12, r7, r4
              stw r11, 4(r12)
              stwx r11, r6, r4
              add r12, r6, r4
              stw r11, 4(r12)
              stwx r11, r5, r4
              add r12, r5, r4
              stw r11, 4(r12)
              stwx r11, r2, r4
              add r12, r2, r4
              stw r11, 4(r12)
              addi r4, r4, 64
              addi r3, r3, -64
              cmpwi cr0, r3, 63
              bgt .LBB_fbSolidFillmmx_2       ; no_exit
      
      llvm-svn: 22737
  2. Aug 09, 2005
    • SCEVAddExpr::get() of an empty list is invalid. · 02742710
      Chris Lattner authored
      llvm-svn: 22724
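      The invariant is simply that a sum needs at least one operand.  A
      hedged, self-contained sketch of the guard callers must respect
      (names illustrative, not the 2005 interface):

        #include <cassert>
        #include <vector>

        struct SCEV {};   // opaque stand-in for illustration

        // A sum of zero terms is undefined, so construction asserts; a
        // caller holding an empty list must special-case it (e.g.
        // substitute the zero SCEV) instead of calling get().
        const SCEV *getAddExpr(const std::vector<const SCEV *> &Ops) {
          assert(!Ops.empty() && "SCEVAddExpr of an empty list is invalid!");
          // ... fold constants and build/uniquify the node ...
          return Ops[0];   // placeholder for the real construction
        }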
    • Implement: LoopStrengthReduce/share_ivs.ll · a091ff17
      Chris Lattner authored
      Two changes:
        * Only insert one PHI node for each stride.  Other values are live-in
          values.  This cannot introduce higher register pressure than the
          previous approach, and can take advantage of reg+reg addressing modes.
        * Factor common base values out of uses before moving values from the
          base to the immediate fields.  This improves codegen by starting the
          stride-specific PHI node out at a common place for each IV use.
      
      As an example, we used to generate this for a loop in swim:
      
      .LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_2:        ; no_exit.7.i
              lfd f0, 0(r8)
              stfd f0, 0(r3)
              lfd f0, 0(r6)
              stfd f0, 0(r7)
              lfd f0, 0(r2)
              stfd f0, 0(r5)
              addi r9, r9, 1
              addi r2, r2, 8
              addi r5, r5, 8
              addi r6, r6, 8
              addi r7, r7, 8
              addi r8, r8, 8
              addi r3, r3, 8
              cmpw cr0, r9, r4
              bgt .LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_1
      
      now we emit:
      
      .LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_2:        ; no_exit.7.i
              lfdx f0, r8, r2
              stfdx f0, r9, r2
              lfdx f0, r5, r2
              stfdx f0, r7, r2
              lfdx f0, r3, r2
              stfdx f0, r6, r2
              addi r10, r10, 1
              addi r2, r2, 8
              cmpw cr0, r10, r4
              bgt .LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_1
      
      As another more dramatic example, we used to emit this:
      
      .LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_2:       ; no_exit.1.i19
              lfd f0, 8(r21)
              lfd f4, 8(r3)
              lfd f5, 8(r27)
              lfd f6, 8(r22)
              lfd f7, 8(r5)
              lfd f8, 8(r6)
              lfd f9, 8(r30)
              lfd f10, 8(r11)
              lfd f11, 8(r12)
              fsub f10, f10, f11
              fadd f5, f4, f5
              fmul f5, f5, f1
              fadd f6, f6, f7
              fadd f6, f6, f8
              fadd f6, f6, f9
              fmadd f0, f5, f6, f0
              fnmsub f0, f10, f2, f0
              stfd f0, 8(r4)
              lfd f0, 8(r25)
              lfd f5, 8(r26)
              lfd f6, 8(r23)
              lfd f9, 8(r28)
              lfd f10, 8(r10)
              lfd f12, 8(r9)
              lfd f13, 8(r29)
              fsub f11, f13, f11
              fadd f4, f4, f5
              fmul f4, f4, f1
              fadd f5, f6, f9
              fadd f5, f5, f10
              fadd f5, f5, f12
              fnmsub f0, f4, f5, f0
              fnmsub f0, f11, f3, f0
              stfd f0, 8(r24)
              lfd f0, 8(r8)
              fsub f4, f7, f8
              fsub f5, f12, f10
              fnmsub f0, f5, f2, f0
              fnmsub f0, f4, f3, f0
              stfd f0, 8(r2)
              addi r20, r20, 1
              addi r2, r2, 8
              addi r8, r8, 8
              addi r10, r10, 8
              addi r12, r12, 8
              addi r6, r6, 8
              addi r29, r29, 8
              addi r28, r28, 8
              addi r26, r26, 8
              addi r25, r25, 8
              addi r24, r24, 8
              addi r5, r5, 8
              addi r23, r23, 8
              addi r22, r22, 8
              addi r3, r3, 8
              addi r9, r9, 8
              addi r11, r11, 8
              addi r30, r30, 8
              addi r27, r27, 8
              addi r21, r21, 8
              addi r4, r4, 8
              cmpw cr0, r20, r7
              bgt .LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_1
      
      we now emit:
      
      .LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_2:       ; no_exit.1.i19
              lfdx f0, r21, r20
              lfdx f4, r3, r20
              lfdx f5, r27, r20
              lfdx f6, r22, r20
              lfdx f7, r5, r20
              lfdx f8, r6, r20
              lfdx f9, r30, r20
              lfdx f10, r11, r20
              lfdx f11, r12, r20
              fsub f10, f10, f11
              fadd f5, f4, f5
              fmul f5, f5, f1
              fadd f6, f6, f7
              fadd f6, f6, f8
              fadd f6, f6, f9
              fmadd f0, f5, f6, f0
              fnmsub f0, f10, f2, f0
              stfdx f0, r4, r20
              lfdx f0, r25, r20
              lfdx f5, r26, r20
              lfdx f6, r23, r20
              lfdx f9, r28, r20
              lfdx f10, r10, r20
              lfdx f12, r9, r20
              lfdx f13, r29, r20
              fsub f11, f13, f11
              fadd f4, f4, f5
              fmul f4, f4, f1
              fadd f5, f6, f9
              fadd f5, f5, f10
              fadd f5, f5, f12
              fnmsub f0, f4, f5, f0
              fnmsub f0, f11, f3, f0
              stfdx f0, r24, r20
              lfdx f0, r8, r20
              fsub f4, f7, f8
              fsub f5, f12, f10
              fnmsub f0, f5, f2, f0
              fnmsub f0, f4, f3, f0
              stfdx f0, r2, r20
              addi r19, r19, 1
              addi r20, r20, 8
              cmpw cr0, r19, r7
              bgt .LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_1
      
      llvm-svn: 22722
    • Suck the base value out of the UsersToProcess vector into the BasedUser · 37c24cc9
      Chris Lattner authored
      class to simplify the code.  Fuse two loops.
      
      llvm-svn: 22721
    • Split MoveLoopVariantsToImediateField out from MoveImmediateValues. The · 37ed895b
      Chris Lattner authored
      first is a correctness thing, and the latter is an optimization thing.  This also
      is needed to support a future change.
      
      llvm-svn: 22720
  3. Aug 04, 2005
    • * Refactor some code into a new BasedUser::RewriteInstructionToUseNewBase · a6d7c355
      Chris Lattner authored
        method.
      * Fix a crash on 178.galgel, where we would insert expressions before PHI
        nodes instead of into the PHI node predecessor blocks.
      
      llvm-svn: 22657
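      The underlying rule is general: a value that feeds a PHI node is only
      live on one incoming edge, so code computing it must be materialized
      in that predecessor block rather than immediately before the PHI.  A
      hedged sketch of insertion-point selection (modern API names):

        #include "llvm/IR/Instructions.h"
        using namespace llvm;

        // For a PHI user, operand OpNo's value flows in from one incoming
        // block; emit new code before that block's terminator.  For any
        // other user, emitting directly before the user is fine.
        static Instruction *getInsertPointFor(Instruction *User,
                                              unsigned OpNo) {
          if (PHINode *PN = dyn_cast<PHINode>(User))
            return PN->getIncomingBlock(OpNo)->getTerminator();
          return User;
        }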
    • Fix a case that caused this to crash on 178.galgel · 0f7c0fa2
      Chris Lattner authored
      llvm-svn: 22653
    • Teach LSR about loop-variant expressions, such as loops like this: · acc42c4d
      Chris Lattner authored
        for (i = 0; i < N; ++i)
          A[i][foo()] = 0;
      
      here we still want to strength-reduce the A[i] part, even though foo() is
      loop-variant.
      
      This also simplifies some of the 'CanReduce' logic.
      
      This implements Transforms/LoopStrengthReduce/ops_after_indvar.ll
      
      llvm-svn: 22652
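      At the source level, the effect is roughly the decomposition below:
      the A[i] row address is strength-reduced across iterations while the
      loop-variant column index stays a per-iteration reg+reg component
      (illustrative code; assumes A is a 2-D array of doubles):

        extern int foo();

        void zero(double A[][100], int N) {
          double *Row = &A[0][0];
          for (int i = 0; i < N; ++i) {
            Row[foo()] = 0.0;   // reg+reg address: Row + 8*foo()
            Row += 100;         // strength-reduced: advance one row
          }
        }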
    • Remove some more dead code. · 456044b7
      Nate Begeman authored
      llvm-svn: 22650
    • Refactor this code substantially with the following improvements: · eaf24725
      Chris Lattner authored
        1. We only analyze instructions once, guaranteed
        2. AnalyzeGetElementPtrUsers has been ripped apart and replaced with
           something much simpler.
      
      The next step is to handle expressions that are not all indvar+loop-invariant
      values (e.g. handling indvar+loopvariant).
      
      llvm-svn: 22649
    • refactor some code · 6f286b76
      Chris Lattner authored
      llvm-svn: 22643
    • Invert two ifs to make the logic simpler · 65107490
      Chris Lattner authored
      llvm-svn: 22641
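      The shape of that refactor, as a generic sketch (not the code in
      question): inverting a condition to exit early removes a level of
      nesting.

        // before: the interesting work is nested under the condition
        void processBefore(bool Relevant) {
          if (Relevant) {
            // ... long body ...
          }
        }

        // after: invert the condition and bail out early
        void processAfter(bool Relevant) {
          if (!Relevant)
            return;
          // ... long body ...
        }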
    • When processing outer loops and we find uses of an IV in inner loops, make · a0102fbc
      Chris Lattner authored
      sure to handle the use, just don't recurse into it.
      
      This permits us to generate this code for a simple nested loop case:
      
      .LBB_foo_0:     ; entry
              stwu r1, -48(r1)
              stw r29, 44(r1)
              stw r30, 40(r1)
              mflr r11
              stw r11, 56(r1)
              lis r2, ha16(L_A$non_lazy_ptr)
              lwz r30, lo16(L_A$non_lazy_ptr)(r2)
              li r29, 1
      .LBB_foo_1:     ; no_exit.0
              bl L_bar$stub
              li r2, 1
              or r3, r30, r30
      .LBB_foo_2:     ; no_exit.1
              lfd f0, 8(r3)
              stfd f0, 0(r3)
              addi r4, r2, 1
              addi r3, r3, 8
              cmpwi cr0, r2, 100
              or r2, r4, r4
              bne .LBB_foo_2  ; no_exit.1
      .LBB_foo_3:     ; loopexit.1
              addi r30, r30, 800
              addi r2, r29, 1
              cmpwi cr0, r29, 100
              or r29, r2, r2
              bne .LBB_foo_1  ; no_exit.0
      .LBB_foo_4:     ; return
              lwz r11, 56(r1)
              mtlr r11
              lwz r30, 40(r1)
              lwz r29, 44(r1)
              lwz r1, 0(r1)
              blr
      
      instead of this:
      
      _foo:
      .LBB_foo_0:     ; entry
              stwu r1, -48(r1)
              stw r28, 44(r1)                   ;; uses an extra register.
              stw r29, 40(r1)
              stw r30, 36(r1)
              mflr r11
              stw r11, 56(r1)
              li r30, 1
              li r29, 0
              or r28, r29, r29
      .LBB_foo_1:     ; no_exit.0
              bl L_bar$stub
              mulli r2, r28, 800           ;; unstrength-reduced multiply
              lis r3, ha16(L_A$non_lazy_ptr)   ;; loop invariant address computation
              lwz r3, lo16(L_A$non_lazy_ptr)(r3)
              add r2, r2, r3
              mulli r4, r29, 800           ;; unstrength-reduced multiply
              addi r3, r3, 8
              add r3, r4, r3
              li r4, 1
      .LBB_foo_2:     ; no_exit.1
              lfd f0, 0(r3)
              stfd f0, 0(r2)
              addi r5, r4, 1
              addi r2, r2, 8                 ;; multiple stride 8 IV's
              addi r3, r3, 8
              cmpwi cr0, r4, 100
              or r4, r5, r5
              bne .LBB_foo_2  ; no_exit.1
      .LBB_foo_3:     ; loopexit.1
              addi r28, r28, 1               ;;; Many IV's with stride 1
              addi r29, r29, 1
              addi r2, r30, 1
              cmpwi cr0, r30, 100
              or r30, r2, r2
              bne .LBB_foo_1  ; no_exit.0
      .LBB_foo_4:     ; return
              lwz r11, 56(r1)
              mtlr r11
              lwz r30, 36(r1)
              lwz r29, 40(r1)
              lwz r28, 44(r1)
              lwz r1, 0(r1)
              blr
      
      llvm-svn: 22640
    • Teach loop-reduce to see into nested loops, to pull out immediate values · fc624704
      Chris Lattner authored
      pushed down by SCEV.
      
      In a nested loop case, this allows us to emit this:
      
              lis r3, ha16(L_A$non_lazy_ptr)
              lwz r3, lo16(L_A$non_lazy_ptr)(r3)
              add r2, r2, r3
              li r3, 1
      .LBB_foo_2:     ; no_exit.1
              lfd f0, 8(r2)        ;; Uses offset of 8 instead of 0
              stfd f0, 0(r2)
              addi r4, r3, 1
              addi r2, r2, 8
              cmpwi cr0, r3, 100
              or r3, r4, r4
              bne .LBB_foo_2  ; no_exit.1
      
      instead of this:
      
              lis r3, ha16(L_A$non_lazy_ptr)
              lwz r3, lo16(L_A$non_lazy_ptr)(r3)
              add r2, r2, r3
              addi r3, r3, 8
              li r4, 1
      .LBB_foo_2:     ; no_exit.1
              lfd f0, 0(r3)
              stfd f0, 0(r2)
              addi r5, r4, 1
              addi r2, r2, 8
              addi r3, r3, 8
              cmpwi cr0, r4, 100
              or r4, r5, r5
              bne .LBB_foo_2  ; no_exit.1
      
      llvm-svn: 22639
    • improve debug output · bb78c97e
      Chris Lattner authored
      llvm-svn: 22638
    • Move from Stage 0 to Stage 1. · db23c74e
      Chris Lattner authored
      Only emit one PHI node for IV uses with identical bases and strides (after
      moving foldable immediates to the load/store instruction).
      
      This implements LoopStrengthReduce/dont_insert_redundant_ops.ll, allowing
      us to generate this PPC code for test1:
      
              or r30, r3, r3
      .LBB_test1_1:   ; Loop
              li r2, 0
              stw r2, 0(r30)
              stw r2, 4(r30)
              bl L_pred$stub
              addi r30, r30, 8
              cmplwi cr0, r3, 0
              bne .LBB_test1_1        ; Loop
      
      instead of this code:
      
              or r30, r3, r3
              or r29, r3, r3
      .LBB_test1_1:   ; Loop
              li r2, 0
              stw r2, 0(r29)
              stw r2, 4(r30)
              bl L_pred$stub
              addi r30, r30, 8        ;; Two iv's with step of 8
              addi r29, r29, 8
              cmplwi cr0, r3, 0
              bne .LBB_test1_1        ; Loop
      
      llvm-svn: 22635
    • Rename IVUse to IVUsersOfOneStride, use a struct instead of a pair to · 430d0022
      Chris Lattner authored
      unify some parallel vectors and get field names more descriptive than
      "first" and "second".  This isn't lisp afterall :)
      
      llvm-svn: 22633
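      The shape of the change (field names here are illustrative, not
      necessarily the actual ones): a named struct replaces the parallel
      vectors / std::pair, so call sites can say Stride and Users instead
      of first and second:

        #include <vector>

        struct SCEV;                    // opaque, for illustration
        struct IVStrideUse { /* one recorded use of the IV */ };

        // All the IV users that share a single stride, under descriptive
        // field names rather than std::pair's first/second.
        struct IVUsersOfOneStride {
          const SCEV *Stride;               // illustrative
          std::vector<IVStrideUse> Users;   // illustrative
        };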