  1. Mar 06, 2013
    • [mips] Custom-legalize BR_JT. · 0f693a8a
      Akira Hatanaka authored
      In N64-static mode, the GOT address is needed to compute the branch address.
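
      For context, a minimal sketch (hypothetical, not from the commit): a dense
      switch like the one below is the usual source of an ISD::BR_JT node, and
      under N64-static its jump-table entries are resolved through the GOT.

      int classify(int x) {
        switch (x) {
        case 0: return 10;
        case 1: return 11;
        case 2: return 12;
        case 3: return 13;
        case 4: return 14;
        case 5: return 15;
        default: return -1;
        }
      }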
      
      llvm-svn: 176580
    • Fix PR15355 · da22b30b
      Michael Liao authored
      - Clear the 'mayStore' flag when loading from the atomic variable before
        the spin loop.
      - Clear the kill flag on the registers forming the address of that atomic
        variable, since they go from having one use to multiple uses.
      - Don't use a physical register as a live-in register in a basic block
        (neither the entry block nor a landing pad); copy it into a virtual
        register instead.
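
      A minimal reproducer sketch (hypothetical, assuming C11 atomics; the
      actual PR15355 test case may differ): an atomic read-modify-write whose
      result is used and has no single x86 instruction lowers to a load of the
      atomic variable followed by a cmpxchg spin loop, the pattern whose flags
      are corrected above.

      #include <stdatomic.h>

      atomic_long flags;

      long clear_bits(long mask) {
        /* x86 has no fetch-and that returns the old value, so this becomes a
           plain load of 'flags' followed by a cmpxchg spin loop. */
        return atomic_fetch_and(&flags, ~mask);
      }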
      
      (patch by Cameron Zwarich)
      
      llvm-svn: 176538
    • [mips] Remove android calling convention. · 1454ed8a
      Akira Hatanaka authored
      This calling convention was added just to handle functions that return a
      vector of floats. The fix committed in r165585 solves the problem.
      
      llvm-svn: 176530
  2. Mar 02, 2013
    • ARM: Creating a vector from a lane of another. · a3c5c769
      Jim Grosbach authored
      The VDUP instruction's source register doesn't allow a non-constant lane
      index, so make sure we don't construct an ARM::VDUPLANE node that asks it
      to do so.
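
      A minimal sketch (hypothetical, assuming ARM NEON intrinsics; not the
      committed test case) of the legal and illegal cases:

      #include <arm_neon.h>

      /* Constant lane index: this can select to VDUPLANE. */
      float32x2_t splat_lane1(float32x2_t v) {
        return vdup_lane_f32(v, 1);
      }

      /* Runtime lane index: must not become VDUPLANE. Extract the element
         and use the register form of VDUP instead. */
      float32x2_t splat_any_lane(float32x2_t v, int lane) {
        float x = (lane == 0) ? vget_lane_f32(v, 0) : vget_lane_f32(v, 1);
        return vdup_n_f32(x);
      }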
      
      rdar://13328063
      http://llvm.org/bugs/show_bug.cgi?id=13963
      
      llvm-svn: 176413
    • Clean up code format a bit. · c6f1914e
      Jim Grosbach authored
      llvm-svn: 176412
    • Tidy up. Trailing whitespace. · 54efea0a
      Jim Grosbach authored
      llvm-svn: 176411
    • ARM NEON: Fix v2f32 float intrinsics · 99cba969
      Arnold Schwaighofer authored
      Mark them as Expand; they are not legal, as our backend does not match them.
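
      A minimal sketch (hypothetical; floor stands in for whichever v2f32
      intrinsics the commit covers, assuming GCC/Clang vector extensions):
      element-wise math over a float pair is how such an intrinsic can arise,
      and marking the operation Expand makes the backend scalarize it instead
      of failing to select it.

      #include <math.h>

      typedef float v2f32 __attribute__((vector_size(8)));

      v2f32 floor2(v2f32 v) {
        /* Element-wise floor on a v2f32; with the operation marked Expand,
           this is lowered to two scalar floorf calls. */
        v2f32 r = { floorf(v[0]), floorf(v[1]) };
        return r;
      }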
      
      llvm-svn: 176410
    • X86 cost model: Adjust cost for custom lowered vector multiplies · 20ef54f4
      Arnold Schwaighofer authored
      This matters, for example, in the following matrix multiply:
      
      int **mmult(int rows, int cols, int **m1, int **m2, int **m3) {
        int i, j, k, val;
        for (i=0; i<rows; i++) {
          for (j=0; j<cols; j++) {
            val = 0;
            for (k=0; k<cols; k++) {
              val += m1[i][k] * m2[k][j];
            }
            m3[i][j] = val;
          }
        }
        return(m3);
      }
      
      Taken from the test-suite benchmark Shootout.
      
      We estimated the cost of the multiply to be 2, while we actually generate
      9 instructions for it and end up quite a bit slower than the scalar
      version (48% slower on my machine).
      
      Also, properly differentiate between AVX1 and AVX2. On AVX1 we still split
      the vector into two 128-bit halves and handle the subvector multiplies as
      above, with 9 instructions each.
      Only on AVX2 do we have a cost of 9 for v4i64.
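
      A minimal sketch of the v4i64 multiply being costed (assuming GCC/Clang
      vector extensions):

      typedef long long v4i64 __attribute__((vector_size(32)));

      /* On AVX1 this multiply is split into two 128-bit halves, each lowered
         with the 9-instruction sequence; on AVX2 it stays in one 256-bit
         register but still expands to a multi-instruction pmuludq/shift/add
         sequence. */
      v4i64 mul_v4i64(v4i64 a, v4i64 b) {
        return a * b;
      }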
      
      I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to
      use an add instead of a mul, because with a mul we no longer vectorize. I
      did verify that the mul would indeed be more expensive when vectorized,
      using 3 kernels:
      
      for (i ...)
        r += a[i] * 3;
      for (i ...)
        m1[i] = m1[i] * 3; // This matches the test case in avx1.ll
      and a matrix multiply.
      
      In each case the vectorized version was considerably slower.
      
      radar://13304919
      
      llvm-svn: 176403