- Nov 14, 2016
-
-
Simon Pilgrim authored
Add explicit v16i16/v32i8 ADD/SUB costs, matching the costs of v4i64/v8i32 - they were missing for some reason. This has side effects on the LV max bandwidth tests: AVX1 now prefers 128-bit vectors, whereas AVX2 still prefers 256-bit. llvm-svn: 286832
-
- Oct 27, 2016
-
-
Simon Pilgrim authored
llvm-svn: 285329
-
Simon Pilgrim authored
With DQI but without VLX, lower v2i64 and v4i64 MUL operations with v8i64 MUL (vpmullq). Updated cost table accordingly. Differential Revision: https://reviews.llvm.org/D26011 llvm-svn: 285304
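The widening described in this commit can be pictured with a scalar sketch: copy the narrow operands into 8-lane buffers, do one 8-lane multiply, and keep only the low lanes. This is an illustration under assumptions, not the LLVM lowering code. The names `mul_wide8` and `mul_narrow_via_wide` are hypothetical, `mul_wide8` merely stands in for a single v8i64 `vpmullq`, and the unused upper lanes are zero-filled here for determinism, whereas a real lowering would presumably leave them undefined.

```c
#include <stdint.h>
#include <string.h>

enum { WIDE_LANES = 8 };

/* Hypothetical stand-in for one v8i64 multiply (vpmullq): lane-wise,
   keeps the low 64 bits of each product. */
static void mul_wide8(const uint64_t a[WIDE_LANES],
                      const uint64_t b[WIDE_LANES],
                      uint64_t out[WIDE_LANES]) {
    for (int i = 0; i < WIDE_LANES; i++)
        out[i] = a[i] * b[i];
}

/* Multiply a 2- or 4-lane vector by widening to 8 lanes, multiplying once,
   and extracting the low `lanes` results. Requires lanes <= WIDE_LANES. */
static void mul_narrow_via_wide(const uint64_t *a, const uint64_t *b,
                                uint64_t *out, size_t lanes) {
    uint64_t wa[WIDE_LANES] = {0};  /* upper lanes are don't-care */
    uint64_t wb[WIDE_LANES] = {0};
    uint64_t wr[WIDE_LANES];

    memcpy(wa, a, lanes * sizeof *a);
    memcpy(wb, b, lanes * sizeof *b);
    mul_wide8(wa, wb, wr);
    memcpy(out, wr, lanes * sizeof *out);
}
```

The point of the sketch is that the cost is dominated by the single wide multiply plus the insert/extract overhead, which is why the cost table entry changes alongside the lowering.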
-
- Aug 21, 2016
-
-
Simon Pilgrim authored
There are more thorough tests found in vshift-*-cost.ll llvm-svn: 279406
-
Simon Pilgrim authored
llvm-svn: 279405
-
Simon Pilgrim authored
llvm-svn: 279402
-
- Aug 19, 2016
-
-
Simon Pilgrim authored
llvm-svn: 279301
-
Simon Pilgrim authored
llvm-svn: 279291
-
Simon Pilgrim authored
llvm-svn: 279283
-
- Jun 21, 2016
-
-
Michael Kuperstein authored
llvm-svn: 273316
-
- Jul 29, 2015
-
-
Simon Pilgrim authored
This patch vectorizes the v2i64/v4i64 ASHR shift operations, the last remaining integer vector shifts that were still being transferred to and from the scalar unit to be completed. Differential Revision: http://reviews.llvm.org/D11439 llvm-svn: 243569
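For readers wondering how a 64-bit arithmetic shift can stay in the vector unit when only logical 64-bit shifts are available, one well-known identity is `ashr(x, n) = ((x lshr n) xor m) - m` with `m = 0x8000000000000000 lshr n`. The scalar sketch below demonstrates that identity on one lane. Whether the patch uses exactly this sequence is an assumption here, and the function name `ashr64_via_lshr` is hypothetical.

```c
#include <stdint.h>

/* Emulate a 64-bit arithmetic shift right using only a logical shift,
   an XOR and a subtract. Valid for 0 <= n <= 63. */
static int64_t ashr64_via_lshr(int64_t x, unsigned n) {
    /* m marks where the sign bit lands after the logical shift. */
    uint64_t m = UINT64_C(0x8000000000000000) >> n;
    uint64_t r = (uint64_t)x >> n;   /* logical shift: zero-fills the top */

    /* XOR flips the shifted sign bit; subtracting m then propagates it
       through the upper bits (sign extension) when the input was negative,
       and restores the original bits when it was non-negative. */
    uint64_t fixed = (r ^ m) - m;

    /* Conversion of out-of-range values to int64_t is implementation-defined
       in C; mainstream compilers wrap as two's complement. */
    return (int64_t)fixed;
}
```

Each of the three steps maps onto an operation that exists for 64-bit vector lanes, which is what makes the identity attractive for v2i64/v4i64.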
-
- Apr 12, 2013
-
-
Nadav Rotem authored
CostModel: increase the default cost of supported floating point operations from 1 to 2. Fixed a few tests that changed because the cost of one insert + a vector operation on two doubles is now lower than two scalar operations on doubles. llvm-svn: 179413
-
- Apr 03, 2013
-
-
Arnold Schwaighofer authored
The default logic does not correctly identify costs of casts because they are marked as custom on x86. For some cases, where the shift amount is a scalar, we would be able to generate better code. Unfortunately, when this is the case the value (the splat) will get hoisted out of the loop, thereby making it invisible to ISel. radar://13130673 radar://13537826 llvm-svn: 178703
-
- Mar 20, 2013
-
-
Michael Liao authored
After moving the logic that recognizes vector shifts with a scalar amount from DAG combining into DAG lowering, we declare all vector shifts as custom, even though vector shifts on AVX are legal. As a result, the cost model needs special tuning to identify these legal cases. llvm-svn: 177586
-
- Mar 02, 2013
-
-
Arnold Schwaighofer authored
This matters, for example, in the following matrix multiply:

    int **mmult(int rows, int cols, int **m1, int **m2, int **m3) {
      int i, j, k, val;
      for (i=0; i<rows; i++) {
        for (j=0; j<cols; j++) {
          val = 0;
          for (k=0; k<cols; k++) {
            val += m1[i][k] * m2[k][j];
          }
          m3[i][j] = val;
        }
      }
      return(m3);
    }

Taken from the test-suite benchmark Shootout.

We estimate the cost of the multiply to be 2 while we generate 9 instructions for it and end up being quite a bit slower than the scalar version (48% on my machine).

Also, properly differentiate between AVX1 and AVX2. On AVX1 we still split the vector into two 128-bit halves and handle the subvector muls like above with 9 instructions. Only on AVX2 will we have a cost of 9 for v4i64.

I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use an add instead of a mul because with a mul we no longer vectorize. I did verify that the mul would indeed be more expensive when vectorized, using 3 kernels:

    for (i ...) r += a[i] * 3;
    for (i ...) m1[i] = m1[i] * 3; // This matches the test case in avx1.ll

and a matrix multiply. In each case the vectorized version was considerably slower.

radar://13304919 llvm-svn: 176403
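To see why a custom-lowered 64-bit vector multiply is far more expensive than a cost of 2 suggests, it helps to write out the usual decomposition into 32x32->64 partial products, which is the only wide multiply that pre-AVX-512 x86 vector units offer for 64-bit lanes. The scalar sketch below shows one lane of that decomposition. It is an illustration, not the LLVM lowering code; the function name `mul64_via_32` is hypothetical, and the instruction tally in the comments is one plausible accounting that happens to match the count of 9 quoted in the commit message.

```c
#include <stdint.h>

/* One 64-bit lane of a multiply built only from 32x32->64 products.
   Comments tally the vector-instruction equivalents (an assumption). */
static uint64_t mul64_via_32(uint64_t a, uint64_t b) {
    uint64_t a_lo = a & 0xFFFFFFFFu;  /* implicit: the 32-bit multiply reads only low halves */
    uint64_t b_lo = b & 0xFFFFFFFFu;
    uint64_t a_hi = a >> 32;          /* shift right #1 */
    uint64_t b_hi = b >> 32;          /* shift right #2 */

    uint64_t lolo = a_lo * b_lo;      /* multiply #1 */
    uint64_t lohi = a_lo * b_hi;      /* multiply #2 */
    uint64_t hilo = a_hi * b_lo;      /* multiply #3 */

    /* a_hi * b_hi would land entirely above bit 63, so it is dropped. */
    uint64_t cross1 = lohi << 32;     /* shift left #1 */
    uint64_t cross2 = hilo << 32;     /* shift left #2 */

    return lolo + cross1 + cross2;    /* add #1, add #2 */
}
```

Two right shifts, three multiplies, two left shifts and two adds give nine operations per vector, against a single multiply per element in the scalar loop.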
-
- Dec 05, 2012
-
-
Nadav Rotem authored
llvm-svn: 169423
-
- Nov 05, 2012
-
-
Nadav Rotem authored
llvm-svn: 167395
-
- Nov 03, 2012
-
-
Nadav Rotem authored
llvm-svn: 167347
-