  Dec 05, 2005
      Fix the #1 code quality problem that I have seen on X86 (and it also affects · 35397788
      Chris Lattner authored
      PPC and other targets).  In particular, consider code like this:
      
      struct Vector3 { double x, y, z; };
      struct Matrix3 { Vector3 a, b, c; };
      double dot(Vector3 &a, Vector3 &b) {
         return a.x * b.x  +  a.y * b.y  +  a.z * b.z;
      }
      Vector3 mul(Vector3 &a, Matrix3 &b) {
         Vector3 r;
         r.x = dot( a, b.a );
         r.y = dot( a, b.b );
         r.z = dot( a, b.c );
         return r;
      }
      void transform(Matrix3 &m, Vector3 *x, int n) {
         for (int i = 0; i < n; i++)
            x[i] = mul( x[i], m );
      }
      
      we compile transform to a loop with all of the GEP instructions for indexing
      into 'm' hoisted out of the loop (9 of them).  Because isel occurs one basic
      block at a time, we are unable to fold the constant indices into the loads in
      the loop, leading to PPC code that looks like this:
      
      LBB3_1: ; no_exit.preheader
              li r2, 0
              addi r6, r3, 64        ;; 9 values live across the loop body!
              addi r7, r3, 56
              addi r8, r3, 48
              addi r9, r3, 40
              addi r10, r3, 32
              addi r11, r3, 24
              addi r12, r3, 16
              addi r30, r3, 8
      LBB3_2: ; no_exit
              lfd f0, 0(r30)
              lfd f1, 8(r4)
              fmul f0, f1, f0
              lfd f2, 0(r3)        ;; no constant indices folded into the loads!
              lfd f3, 0(r4)
              lfd f4, 0(r10)
              lfd f5, 0(r6)
              lfd f6, 0(r7)
              lfd f7, 0(r8)
              lfd f8, 0(r9)
              lfd f9, 0(r11)
              lfd f10, 0(r12)
              lfd f11, 16(r4)
              fmadd f0, f3, f2, f0
              fmul f2, f1, f4
              fmadd f0, f11, f10, f0
              fmadd f2, f3, f9, f2
              fmul f1, f1, f6
              stfd f0, 0(r4)
              fmadd f0, f11, f8, f2
              fmadd f1, f3, f7, f1
              stfd f0, 8(r4)
              fmadd f0, f11, f5, f1
              addi r29, r4, 24
              stfd f0, 16(r4)
              addi r2, r2, 1
              cmpw cr0, r2, r5
              or r4, r29, r29
              bne cr0, LBB3_2 ; no_exit
      
      uh, yuck.  With this patch, we now sink the constant offsets into the loop, producing
      this code:
      
      LBB3_1: ; no_exit.preheader
              li r2, 0
      LBB3_2: ; no_exit
              lfd f0, 8(r3)
              lfd f1, 8(r4)
              fmul f0, f1, f0
              lfd f2, 0(r3)
              lfd f3, 0(r4)
              lfd f4, 32(r3)       ;; much nicer.
              lfd f5, 64(r3)
              lfd f6, 56(r3)
              lfd f7, 48(r3)
              lfd f8, 40(r3)
              lfd f9, 24(r3)
              lfd f10, 16(r3)
              lfd f11, 16(r4)
              fmadd f0, f3, f2, f0
              fmul f2, f1, f4
              fmadd f0, f11, f10, f0
              fmadd f2, f3, f9, f2
              fmul f1, f1, f6
              stfd f0, 0(r4)
              fmadd f0, f11, f8, f2
              fmadd f1, f3, f7, f1
              stfd f0, 8(r4)
              fmadd f0, f11, f5, f1
              addi r6, r4, 24
              stfd f0, 16(r4)
              addi r2, r2, 1
              cmpw cr0, r2, r5
              or r4, r6, r6
              bne cr0, LBB3_2 ; no_exit
      
      This is much nicer, as it greatly reduces register pressure in the loop.  On
      X86, it takes the function from 9 spilled registers down to 2.  This should
      help some SPEC programs on X86 (gzip?)
      
      This is currently only enabled with -enable-gep-isel-opt to allow perf testing
      tonight.
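      
      As a rough illustration of what the patch does (a sketch only: the actual
      transformation operates on GEP instructions during instruction selection,
      and the function names below are invented), the effect corresponds to the
      following source-level rewrite of transform, using the structs above:
      
      // Before (conceptually): nine derived pointers computed in the preheader
      // stay live across every iteration, tying up nine registers.
      void transform_hoisted(Matrix3 &m, Vector3 *x, int n) {
         const double *pax = &m.a.x, *pay = &m.a.y, *paz = &m.a.z;
         const double *pbx = &m.b.x, *pby = &m.b.y, *pbz = &m.b.z;
         const double *pcx = &m.c.x, *pcy = &m.c.y, *pcz = &m.c.z;
         for (int i = 0; i < n; i++) {
            Vector3 r;
            r.x = x[i].x * *pax + x[i].y * *pay + x[i].z * *paz;
            r.y = x[i].x * *pbx + x[i].y * *pby + x[i].z * *pbz;
            r.z = x[i].x * *pcx + x[i].y * *pcy + x[i].z * *pcz;
            x[i] = r;
         }
      }
      
      // After (conceptually): only the base address of 'm' is live across the
      // loop; each load folds its constant offset into the addressing mode,
      // e.g. "lfd f4, 32(r3)" instead of "lfd f4, 0(r10)".  (Indexing across
      // struct members like this illustrates the codegen; it is not portable C++.)
      void transform_sunk(Matrix3 &m, Vector3 *x, int n) {
         const double *base = &m.a.x;
         for (int i = 0; i < n; i++) {
            Vector3 r;
            r.x = x[i].x * base[0] + x[i].y * base[1] + x[i].z * base[2];
            r.y = x[i].x * base[3] + x[i].y * base[4] + x[i].z * base[5];
            r.z = x[i].x * base[6] + x[i].y * base[7] + x[i].z * base[8];
            x[i] = r;
         }
      }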
      
      llvm-svn: 24606
  Nov 22, 2005
      Check in code to scalarize arbitrarily wide packed types for some simple · d37c1315
      Nate Begeman authored
      vector operations (load, add, sub, mul).
      
      This allows us to codegen:
      void %foo(<4 x float> * %a) {
      entry:
        %tmp1 = load <4 x float> * %a;
        %tmp2 = add <4 x float> %tmp1, %tmp1
        store <4 x float> %tmp2, <4 x float> *%a
        ret void
      }
      
      on ppc as:
      _foo:
              lfs f0, 12(r3)
              lfs f1, 8(r3)
              lfs f2, 4(r3)
              lfs f3, 0(r3)
              fadds f0, f0, f0
              fadds f1, f1, f1
              fadds f2, f2, f2
              fadds f3, f3, f3
              stfs f0, 12(r3)
              stfs f1, 8(r3)
              stfs f2, 4(r3)
              stfs f3, 0(r3)
              blr
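      
      Conceptually, the scalarization amounts to the following source-level
      expansion (a sketch only; the legalizer performs this on SelectionDAG
      nodes, and this function name is invented):
      
      // Scalarized equivalent of %foo above: one scalar load, add, and store
      // per element, matching the lfs/fadds/stfs sequence in the PPC output.
      void foo_scalarized(float *a) {
         float t0 = a[0], t1 = a[1], t2 = a[2], t3 = a[3];  // split the vector load
         a[0] = t0 + t0;   // one scalar add per element
         a[1] = t1 + t1;
         a[2] = t2 + t2;
         a[3] = t3 + t3;
      }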
      
      llvm-svn: 24484
      Rather than attempting to legalize 1 x float, make sure the SD ISel never · 07890bbe
      Nate Begeman authored
      generates it.  Make MVT::Vector expand-only, and remove the code in
      Legalize that attempts to legalize it.
      
      The plan for supporting N x Type is to continually expand it in ExpandOp
      until it gets down to 2 x Type, where it will be scalarized into a pair of
      scalars.
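      
      To illustrate that plan (a sketch only; the real code operates on SDOperands
      inside ExpandOp, and this helper is invented):
      
      #include <cstdio>
      
      // Model of the "expand until 2 x Type" recursion: an N-element vector
      // (NumElts assumed a power of two >= 2) is repeatedly split in half
      // until it reaches width 2, which is scalarized.
      static void ExpandVectorWidth(int NumElts, int Depth = 0) {
         if (NumElts == 2) {
            std::printf("%*s2 x Type -> pair of scalars\n", Depth * 2, "");
            return;
         }
         std::printf("%*s%d x Type -> two %d x Type halves\n",
                     Depth * 2, "", NumElts, NumElts / 2);
         ExpandVectorWidth(NumElts / 2, Depth + 1);  // low half
         ExpandVectorWidth(NumElts / 2, Depth + 1);  // high half
      }
      
      int main() { ExpandVectorWidth(8); }  // e.g. an 8 x float vector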
      
      llvm-svn: 24482