  1. Mar 20, 2013
      Correct cost model for vector shift on AVX2 · 70dd7f99
      Michael Liao authored
      - After moving the logic that recognizes a vector shift by a scalar
        amount from DAG combining into DAG lowering, we declare all vector
        shifts as custom lowered, even though such shifts are legal on AVX.
        As a result, the cost model needs special tuning to identify these
        legal cases (an illustrative kernel follows at the end of this entry).
      
      llvm-svn: 177586
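      
      As a rough illustration (not part of the original commit message), the
      kind of case the tuned cost model should recognize as cheap is a vector
      shift whose per-lane amount is a splat of a scalar, which AVX2 can do in
      a single instruction. The kernel and its name below are hypothetical.
      
      /* Hypothetical kernel, not taken from the commit: the shift amount `s`
       * is loop-invariant, so the vectorized loop uses a vector shift whose
       * per-lane amount is a splat.  On AVX2 this lowers to a single VPSLLD,
       * so the cost model should report it as cheap even though the operation
       * is declared custom. */
      void shl_by_scalar(unsigned *dst, const unsigned *src, int n, unsigned s) {
        for (int i = 0; i < n; i++)
          dst[i] = src[i] << s;   /* splat shift amount after vectorization */
      }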
  2. Mar 02, 2013
      X86 cost model: Adjust cost for custom lowered vector multiplies · 20ef54f4
      Arnold Schwaighofer authored
      This matters, for example, in the following matrix multiply:
      
      int **mmult(int rows, int cols, int **m1, int **m2, int **m3) {
        int i, j, k, val;
        for (i=0; i<rows; i++) {
          for (j=0; j<cols; j++) {
            val = 0;
            for (k=0; k<cols; k++) {
              val += m1[i][k] * m2[k][j];
            }
            m3[i][j] = val;
          }
        }
        return(m3);
      }
      
      Taken from the test-suite benchmark Shootout.
      
      We estimate the cost of the multiply to be 2, while we actually generate 9
      instructions for it and end up quite a bit slower than the scalar version
      (48% slower on my machine).
      
      Also, properly differentiate between AVX and AVX2. On AVX we still split
      the vector into two 128-bit halves and handle the subvector multiplies as
      above, with 9 instructions each. Only on AVX2 does a v4i64 multiply get a
      cost of 9 (a rough sketch of such a lowering follows at the end of this
      entry).
      
      I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use
      an add instead of a mul, because with a mul we no longer vectorize it. I
      verified that the mul would indeed be more expensive when vectorized, using
      three kernels:
      
      for (i ...)
        r += a[i] * 3;
      for (i ...)
        m1[i] = m1[i] * 3; // This matches the test case in avx1.ll
      and a matrix multiply.
      
      In each case the vectorized version was considerably slower.
      
      radar://13304919
      
      llvm-svn: 176403
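      
      As a rough, hedged sketch (not part of the original commit message): the
      "9 instructions" above come from the fact that x86 has no single packed
      64x64->64 multiply, so a v4i64 multiply must be decomposed into 32-bit
      partial products. The intrinsics version below illustrates that
      decomposition under that assumption; it is not the exact sequence LLVM
      emits, and the helper name mul_v4i64 is made up.
      
      #include <immintrin.h>
      
      /* Multiply two v4i64 vectors on AVX2 by splitting each 64-bit lane into
       * 32-bit halves (a = aH*2^32 + aL, b = bH*2^32 + bL):
       *   lo64(a*b) = aL*bL + ((aL*bH + aH*bL) << 32)
       * Roughly 8-9 instructions, which is why modeling the multiply with a
       * cost of 2 badly underestimates it. */
      static __m256i mul_v4i64(__m256i a, __m256i b) {
        __m256i a_hi = _mm256_srli_epi64(a, 32);   /* aH in the low 32 bits */
        __m256i b_hi = _mm256_srli_epi64(b, 32);   /* bH in the low 32 bits */
        __m256i lo   = _mm256_mul_epu32(a, b);     /* aL*bL                 */
        __m256i m1   = _mm256_mul_epu32(a_hi, b);  /* aH*bL                 */
        __m256i m2   = _mm256_mul_epu32(a, b_hi);  /* aL*bH                 */
        __m256i mid  = _mm256_slli_epi64(_mm256_add_epi64(m1, m2), 32);
        return _mm256_add_epi64(lo, mid);          /* low 64 bits of a*b    */
      }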