Mar 02, 2013
    •
      ARM: Creating a vector from a lane of another. · a3c5c769
      Jim Grosbach authored
      The VDUP instruction source register doesn't allow a non-constant lane
      index, so make sure we don't construct an ARM::VDUPLANE node asking it to
      do so.
      
      rdar://13328063
      http://llvm.org/bugs/show_bug.cgi?id=13963
      
      llvm-svn: 176413
    •
      Clean up code format a bit. · c6f1914e
      Jim Grosbach authored
      llvm-svn: 176412
    •
      Tidy up. Trailing whitespace. · 54efea0a
      Jim Grosbach authored
      llvm-svn: 176411
    •
      ARM NEON: Fix v2f32 float intrinsics · 99cba969
      Arnold Schwaighofer authored
      Mark them as Expand; they are not legal, as our backend does not match them.
      
      llvm-svn: 176410
    •
      recommit r172363 & r171325 (reverted in r172756) · 589443bd
      Nuno Lopes authored
      This adds minimal support for PHI nodes to llvm.objectsize() evaluation.
      
      Fingers crossed that it doesn't break the clang bootstrap again...
      
      llvm-svn: 176408
    •
      add getUnderlyingObjectSize() · 6e3d4601
      Nuno Lopes authored
      this is similar to getObjectSize(), but doesn't subtract the offset.
      Tweak the BasicAA code accordingly (per PR14988).
      
      llvm-svn: 176407
    •
      X86 cost model: Adjust cost for custom lowered vector multiplies · 20ef54f4
      Arnold Schwaighofer authored
      This matters, for example, in the following matrix multiply:
      
      int **mmult(int rows, int cols, int **m1, int **m2, int **m3) {
        int i, j, k, val;
        for (i=0; i<rows; i++) {
          for (j=0; j<cols; j++) {
            val = 0;
            for (k=0; k<cols; k++) {
              val += m1[i][k] * m2[k][j];
            }
            m3[i][j] = val;
          }
        }
        return(m3);
      }
      
      Taken from the test-suite benchmark Shootout.
      
      We estimate the cost of the multiply to be 2 while we generate 9 instructions
      for it and end up being quite a bit slower than the scalar version (48% on my
      machine).
      
      Also, properly differentiate between AVX1 and AVX2. On AVX1 we still split
      the vector into two 128-bit halves and handle the subvector multiplies as
      above, with 9 instructions each.
      Only on AVX2 do we have a cost of 9 for v4i64.
      
      I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use an
      add instead of a mul because with a mul we now no longer vectorize. I did
      verify that the mul would indeed be more expensive when vectorized, with 3
      kernels:
      
      for (i ...)
         r += a[i] * 3;
      for (i ...)
        m1[i] = m1[i] * 3; // This matches the test case in avx1.ll
      and a matrix multiply.
      
      In each case the vectorized version was considerably slower.
      
      radar://13304919
      
      llvm-svn: 176403
    •
      Added FIXME for future Hexagon cleanup. · 63474629
      Andrew Trick authored
      llvm-svn: 176400
    •
      PR14448 - prevent the loop vectorizer from vectorizing the same loop twice. · 739e37a0
      Nadav Rotem authored
      The LoopVectorizer often runs multiple times on the same function due to inlining.
      When this happens, the loop vectorizer often vectorizes the same loops multiple times, increasing code size and adding unneeded branches.
      With this patch, the vectorizer puts metadata on scalar loops during vectorization, marking them as 'already vectorized' so that it knows to ignore them when it sees them a second time.
      
      PR14448.
      
      llvm-svn: 176399