Apr 18, 2006
    • These are correctly encoded by the JIT. I checked :) · 34c901b5
      Chris Lattner authored
      llvm-svn: 27810
    • add a note · 197d7622
      Chris Lattner authored
      llvm-svn: 27809
    • Fix a crash on: · 518834c6
      Chris Lattner authored
      void foo2(vector float *A, vector float *B) {
        vector float C = (vector float)vec_cmpeq(*A, *B);
        if (!vec_any_eq(*A, *B))
          *B = (vector float){0,0,0,0};
        *A = C;
      }
      
      llvm-svn: 27808
    • pretty print node name · 1e174c87
      Chris Lattner authored
      llvm-svn: 27806
    • Implement an important entry from README_ALTIVEC: · 9754d142
      Chris Lattner authored
      If an altivec predicate compare is used immediately by a branch, don't
      use a (serializing) MFCR instruction to read the CR6 register into a GPR,
      which then requires a compare to get the value back into a CR.  Instead,
      just branch on CR6 directly. :)
      
      For example, for:
      void foo2(vector float *A, vector float *B) {
        if (!vec_any_eq(*A, *B))
          *B = (vector float){0,0,0,0};
      }
      
      We now generate:
      
      _foo2:
              mfspr r2, 256
              oris r5, r2, 12288
              mtspr 256, r5
              lvx v2, 0, r4
              lvx v3, 0, r3
              vcmpeqfp. v2, v3, v2
              bne cr6, LBB1_2 ; UnifiedReturnBlock
      LBB1_1: ; cond_true
              vxor v2, v2, v2
              stvx v2, 0, r4
              mtspr 256, r2
              blr
      LBB1_2: ; UnifiedReturnBlock
              mtspr 256, r2
              blr
      
      instead of:
      
      _foo2:
              mfspr r2, 256
              oris r5, r2, 12288
              mtspr 256, r5
              lvx v2, 0, r4
              lvx v3, 0, r3
              vcmpeqfp. v2, v3, v2
              mfcr r3, 2
              rlwinm r3, r3, 27, 31, 31
              cmpwi cr0, r3, 0
              beq cr0, LBB1_2 ; UnifiedReturnBlock
      LBB1_1: ; cond_true
              vxor v2, v2, v2
              stvx v2, 0, r4
              mtspr 256, r2
              blr
      LBB1_2: ; UnifiedReturnBlock
              mtspr 256, r2
              blr
      
      This implements CodeGen/PowerPC/vec_br_cmp.ll.
      
      llvm-svn: 27804
    • move some stuff around, clean things up · 68c16a20
      Chris Lattner authored
      llvm-svn: 27802
    • Use vmladduhm to do v8i16 multiplies which is faster and simpler than doing · 96d50487
      Chris Lattner authored
      even/odd halves.  Thanks to Nate for telling me what's what.
      
      llvm-svn: 27793
    • Implement v16i8 multiply with this code: · d6d82aa8
      Chris Lattner authored
              vmuloub v5, v3, v2
              vmuleub v2, v3, v2
              vperm v2, v2, v5, v4
      
      This implements CodeGen/PowerPC/vec_mul.ll.  With this, v16i8 multiplies are
      6.79x faster than before.
      
      Overall, UnitTests/Vector/multiplies.c is now 2.45x faster with LLVM than with
      GCC.
      
      Remove the 'integer multiplies' todo from the README file.
      
      llvm-svn: 27792
    • Lower v8i16 multiply into this code: · 7e439874
      Chris Lattner authored
              li r5, lo16(LCPI1_0)
              lis r6, ha16(LCPI1_0)
              lvx v4, r6, r5
              vmulouh v5, v3, v2
              vmuleuh v2, v3, v2
              vperm v2, v2, v5, v4
      
      where v4 is:
      LCPI1_0:                                        ;  <16 x ubyte>
              .byte   2
              .byte   3
              .byte   18
              .byte   19
              .byte   6
              .byte   7
              .byte   22
              .byte   23
              .byte   10
              .byte   11
              .byte   26
              .byte   27
              .byte   14
              .byte   15
              .byte   30
              .byte   31
      
      This is 5.07x faster on the G5 (measured) than lowering to scalar code +
      loads/stores.
      
      llvm-svn: 27789
    • Custom lower v4i32 multiplies into a cute sequence, instead of having legalize · a2cae1bb
      Chris Lattner authored
      scalarize it into 4 mullw's and a bunch of load/store traffic.
      
      This speeds up v4i32 multiplies 4.1x (measured) on a G5.  This implements
      PowerPC/vec_mul.ll
      
      llvm-svn: 27788