- Nov 14, 2016
-
-
Simon Pilgrim authored
Add explicit v16i16/v32i8 ADD/SUB costs, matching the costs of v4i64/v8i32 - they were missing for some reason. This has side effects on the LV max bandwidth tests (AVX1 now prefers 128-bit vectors vs AVX2 which still prefers 256-bit) llvm-svn: 286832
-
- Nov 08, 2016
-
-
Simon Pilgrim authored
This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available. This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when it's useful. Differential Revision: https://reviews.llvm.org/D25910 llvm-svn: 286233
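As an illustration of the CTLZ-via-CTPOP idea mentioned above, here is a minimal scalar sketch of the commonly cited "Hacker's Delight" formulation: smear the highest set bit into every lower position, invert, and count the remaining ones. The function names `popcount32` and `ctlz32` are hypothetical, and the exact node sequence the patch emits for vector types is not shown in this log, so treat this as an assumption about the general technique rather than the patch itself.

```cpp
#include <cassert>
#include <cstdint>

// Bit-parallel population count (no intrinsics), standing in for CTPOP.
inline uint32_t popcount32(uint32_t x) {
  x = x - ((x >> 1) & 0x55555555u);                 // 2-bit partial sums
  x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); // 4-bit partial sums
  x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 // 8-bit partial sums
  return (x * 0x01010101u) >> 24;                   // add the four bytes
}

// Count leading zeros using only shifts, ORs, NOT and a popcount.
inline uint32_t ctlz32(uint32_t x) {
  // Smear the highest set bit into every lower bit position.
  x |= x >> 1;
  x |= x >> 2;
  x |= x >> 4;
  x |= x >> 8;
  x |= x >> 16;
  // The bits that are still zero are exactly the leading zeros.
  return popcount32(~x);
}
```

Every step is a plain shift, OR, NOT, or popcount, which is why the expansion can stay in vector form when those operations are available for the vector type.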
-
- Oct 31, 2016
-
-
Alexey Bataev authored
There is a bug describing a poor cost model for floating point operations: Bug 29083 - [X86][SSE] Improve costs for floating point operations. This patch is the second in a series of patches dealing with the cost model. Differential Revision: https://reviews.llvm.org/D25722 llvm-svn: 285564
-
- Oct 27, 2016
-
-
Simon Pilgrim authored
llvm-svn: 285329
-
Simon Pilgrim authored
With DQI but without VLX, lower v2i64 and v4i64 MUL operations with v8i64 MUL (vpmullq). Updated cost table accordingly. Differential Revision: https://reviews.llvm.org/D26011 llvm-svn: 285304
-
- Oct 23, 2016
-
-
Simon Pilgrim authored
llvm-svn: 284940
-
Simon Pilgrim authored
We were defaulting to SSE2 costs which weren't taking into account the availability of PBLENDW/PBLENDVB to improve merging of per-element shift results. llvm-svn: 284939
-
Simon Pilgrim authored
llvm-svn: 284938
-
- Oct 20, 2016
-
-
Simon Pilgrim authored
We weren't checking for uniform const costs before the general cost, resulting in very high estimates. llvm-svn: 284755
-
Simon Pilgrim authored
Shows poor costings in AVX1/AVX512BW for certain vector types llvm-svn: 284748
-
Simon Pilgrim authored
We weren't accounting for legal types on every subtarget, meaning that many of the costs were using defaults. We still don't correctly cost (or test) the 512-bit sdiv/udiv by uniform const cases, nor the power-of-2 cases. llvm-svn: 284744
-
Simon Pilgrim authored
Shows current bug in AVX1/AVX512BW costs for 256 bit vector types llvm-svn: 284723
-
- Oct 18, 2016
-
-
Simon Pilgrim authored
As discussed on PR28461 we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types. This patch adds support for fptosi to 2i32 from both 2f64 and 2f32. It also recognises that cvttpd2dq zeroes the upper 64-bits of the xmm result (similar to D23797) - we still don't do this for the cvttpd2dq/cvttps2dq intrinsics - this can be done in a future patch. Differential Revision: https://reviews.llvm.org/D23808 llvm-svn: 284459
-
- Oct 03, 2016
-
-
Sanjay Patel authored
This should fix: https://llvm.org/bugs/show_bug.cgi?id=30433 There are a couple of open questions about the codegen: 1. Should we let scalar ops be scalars and avoid vector constant loads/splats? 2. Should we have a pass to combine constants such as the inverted pair that we have here? Differential Revision: https://reviews.llvm.org/D25165 llvm-svn: 283119
-
- Oct 01, 2016
-
-
Simon Pilgrim authored
llvm-svn: 283047
-
Simon Pilgrim authored
llvm-svn: 283044
-
Simon Pilgrim authored
llvm-svn: 283042
-
- Sep 18, 2016
-
-
Simon Pilgrim authored
llvm-svn: 281864
-
- Aug 21, 2016
-
-
Simon Pilgrim authored
There are more thorough tests found in vshift-*-cost.ll llvm-svn: 279406
-
Simon Pilgrim authored
llvm-svn: 279405
-
Simon Pilgrim authored
llvm-svn: 279404
-
Simon Pilgrim authored
llvm-svn: 279403
-
Simon Pilgrim authored
llvm-svn: 279402
-
- Aug 19, 2016
-
-
Simon Pilgrim authored
llvm-svn: 279301
-
Simon Pilgrim authored
llvm-svn: 279291
-
Simon Pilgrim authored
llvm-svn: 279283
-
- Aug 05, 2016
-
-
Michael Kuperstein authored
Shifts with a uniform but non-constant count were considered very expensive to vectorize, because the splat of the uniform count and the shift would tend to appear in different blocks. That made the splat invisible to ISel, and we'd scalarize the shift at codegen time. Since r201655, CodeGenPrepare sinks those splats to be next to their use, and we are able to select the appropriate vector shifts. This updates the cost model to take this into account by making shifts by a uniform count cheap again. Differential Revision: https://reviews.llvm.org/D23049 llvm-svn: 277782
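For readers unfamiliar with the term, a "uniform but non-constant" shift is one where every element is shifted by the same amount, but that amount is only known at run time. A minimal sketch of such a loop follows; the function name `shiftLeftUniform` is hypothetical and the snippet only illustrates the source pattern, not the vectorizer's output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Shift every element left by the same run-time count.
// The caller must pass count < 32; larger shifts are undefined for uint32_t.
void shiftLeftUniform(uint32_t *data, std::size_t n, unsigned count) {
  for (std::size_t i = 0; i != n; ++i)
    data[i] <<= count; // loop-invariant count, not a compile-time constant
}
```

When a loop like this is vectorized, the count is splatted once outside the loop and the shift lives in the loop body, which is the cross-block situation the commit message describes.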
-
- Aug 04, 2016
-
-
Simon Pilgrim authored
llvm-svn: 277718
-
Simon Pilgrim authored
llvm-svn: 277716
-
- Jul 20, 2016
-
-
Simon Pilgrim authored
This patch adds costs for the vectorized implementations of CTPOP. The default values were seriously underestimating the cost of these and were encouraging vectorization on targets where serialized use of POPCNT would be much better. Differential Revision: https://reviews.llvm.org/D22456 llvm-svn: 276104
-
- Jul 17, 2016
-
-
Simon Pilgrim authored
llvm-svn: 275725
-
- Jul 11, 2016
-
-
Michael Kuperstein authored
Make some AVX and AVX512 cast costs more precise. Based on part of a patch by Elena Demikhovsky (D15604). Differential Revision: http://reviews.llvm.org/D22064 llvm-svn: 275106
-
- Jul 06, 2016
-
-
Sanjay Patel authored
This is "cvtdq2ps" which does not appear to be particularly slow on any CPU according to Agner's tables. Choosing "5" as a cost here as suggested in: https://llvm.org/bugs/show_bug.cgi?id=21356 ...but it seems very conservative given that the instruction is fully pipelined, and I think these costs are supposed to model throughput. Note that related costs are also most likely too high, but this fixes PR21356 and partly fixes PR28434. llvm-svn: 274658
-
Michael Kuperstein authored
The cost model should not assume vector casts get completely scalarized, since on targets that have vector support, the common case is a partial split up to the legal vector size. So, when a vector cast gets split, the resulting casts end up legal and cheap. Instead of pessimistically assuming scalarization, base TTI can use the costs the concrete TTI provides for the split vector, plus a fudge factor to account for the cost of the split itself. This fudge factor is currently 1 by default, except on AMDGPU where inserts and extracts are considered free. Differential Revision: http://reviews.llvm.org/D21251 llvm-svn: 274642
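The split-based costing described above can be modelled with a short recursive sketch. Everything here is hypothetical (the `CastCostModel` struct, its field names, and the numbers used with it are illustrative and are not LLVM's API): a cast on a type wider than the legal width is charged the fudge factor plus twice the cost of the half-width cast, and a legal cast is charged whatever the concrete target reports.

```cpp
#include <cassert>

// Illustrative model only; not LLVM's TargetTransformInfo interface.
struct CastCostModel {
  unsigned LegalElts;     // widest element count assumed legal on the target
  unsigned LegalCastCost; // cost the target would report for one legal cast
  unsigned SplitFudge;    // extra cost charged for each split

  // NumElts is assumed to be a power of two.
  unsigned cost(unsigned NumElts) const {
    if (NumElts <= LegalElts)
      return LegalCastCost;                    // legal: use the target's cost
    return SplitFudge + 2 * cost(NumElts / 2); // split in half and recurse
  }
};
```

With a legal width of 4 elements, a legal cast cost of 1, and a fudge factor of 1, a 16-element cast costs 7 under this model. Setting the fudge factor to 0, as the message says is done where inserts and extracts are considered free, gives 4.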
-
- Jun 28, 2016
-
-
Artur Pilipenko authored
This is a resubmission of the 263158 change after fixing the existing problem with intrinsics mangling (see the "LTO and intrinsics mangling" llvm-dev thread for details). This patch fixes the problem which occurs when loop-vectorize tries to use the @llvm.masked.load/store intrinsics for a non-default addrspace pointer. It fails with a "Calling a function with a bad signature!" assertion in the CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument, which has the default addrspace. The fix is to add the pointer type as another overloaded type to the @llvm.masked.load/store intrinsics. Reviewed By: reames Differential Revision: http://reviews.llvm.org/D17270 llvm-svn: 274043
-
- Jun 27, 2016
-
-
Artur Pilipenko authored
Revert -r273892 "Support arbitrary addrspace pointers in masked load/store intrinsics" since some of the clang tests don't expect to see the updated signatures. llvm-svn: 273895
-
Artur Pilipenko authored
This is a resubmission of the 263158 change after fixing the existing problem with intrinsics mangling (see the "LTO and intrinsics mangling" llvm-dev thread for details). This patch fixes the problem which occurs when loop-vectorize tries to use the @llvm.masked.load/store intrinsics for a non-default addrspace pointer. It fails with a "Calling a function with a bad signature!" assertion in the CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument, which has the default addrspace. The fix is to add the pointer type as another overloaded type to the @llvm.masked.load/store intrinsics. Reviewed By: reames Differential Revision: http://reviews.llvm.org/D17270 llvm-svn: 273892
-
- Jun 21, 2016
-
-
Michael Kuperstein authored
llvm-svn: 273316
-
Simon Pilgrim authored
The BSWAP of vector types is quite efficiently implemented using vector shuffles on SSE/AVX targets; we should reflect the typical cost of this to encourage vectorization. Differential Revision: http://reviews.llvm.org/D21521 llvm-svn: 273217
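To make the "BSWAP as a shuffle" point concrete, here is a small scalar emulation: a byte swap of four 32-bit lanes is a single fixed permutation of the 16 bytes of the vector. The helper names `shuffleBytes` and `bswapLanes` are hypothetical, and the loop merely emulates what one byte-shuffle instruction would do; it is a sketch of the idea, not the code the backend generates.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Bytes16 = std::array<uint8_t, 16>;

// Emulates a 16-byte shuffle: out[i] = src[mask[i]].
Bytes16 shuffleBytes(const Bytes16 &src, const Bytes16 &mask) {
  Bytes16 out{};
  for (std::size_t i = 0; i < 16; ++i)
    out[i] = src[mask[i] & 15];
  return out;
}

// Byte-swap four 32-bit lanes using one fixed shuffle mask.
std::array<uint32_t, 4> bswapLanes(const std::array<uint32_t, 4> &v) {
  // Reverse the byte order inside each 4-byte lane.
  static const Bytes16 mask = {3, 2, 1, 0, 7, 6, 5, 4,
                               11, 10, 9, 8, 15, 14, 13, 12};
  Bytes16 in;
  std::memcpy(in.data(), v.data(), 16);
  const Bytes16 out = shuffleBytes(in, mask);
  std::array<uint32_t, 4> result;
  std::memcpy(result.data(), out.data(), 16);
  return result;
}
```

Because the mask is a compile-time constant, the whole operation maps to one shuffle per vector register on targets that have a byte shuffle, which is the low cost the message argues the model should reflect.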
-
- Jun 11, 2016
-
-
Simon Pilgrim authored
To account for the fast PSHUFB implementation now available llvm-svn: 272484
-