Commits · 27fed8e5d636d67ed5e2dff77705dcae1fcd0b15 · Roger Ferrer / llvm-epi

Nov 14, 2016

[X86][AVX] Fixed v16i16/v32i8 ADD/SUB costs on AVX1 subtargets · 27fed8e5

Simon Pilgrim authored Nov 14, 2016

Add explicit v16i16/v32i8 ADD/SUB costs, matching the costs of v4i64/v8i32 - they were missing for some reason.

This has side effects on the LV max bandwidth tests (AVX1 now prefers 128-bit vectors, whereas AVX2 still prefers 256-bit).

llvm-svn: 286832

27fed8e5

Nov 08, 2016

[VectorLegalizer] Expansion of CTLZ using CTPOP when possible · d02c5520

Simon Pilgrim authored Nov 08, 2016

This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available.

This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when it's useful.

Differential Revision: https://reviews.llvm.org/D25910

llvm-svn: 286233

d02c5520
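The CTLZ-via-CTPOP idea referenced in d02c5520 can be illustrated on a single 32-bit lane. This is a minimal scalar sketch of the Hacker's Delight formulation, not the code the VectorLegalizer emits; the function names `popcount32` and `ctlzViaCtpop32` are made up for this example.

```cpp
#include <cstdint>

// Portable population count (bit-parallel form from Hacker's Delight).
constexpr uint32_t popcount32(uint32_t x) {
  x = x - ((x >> 1) & 0x55555555u);                 // 2-bit partial sums
  x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); // 4-bit partial sums
  x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 // 8-bit partial sums
  return (x * 0x01010101u) >> 24;                   // add the four bytes
}

// Count leading zeros using only shifts, ORs, a NOT and a popcount:
// smear the highest set bit into every lower position, then count the
// zero bits that remain above it.
constexpr uint32_t ctlzViaCtpop32(uint32_t x) {
  x |= x >> 1;
  x |= x >> 2;
  x |= x >> 4;
  x |= x >> 8;
  x |= x >> 16;
  return popcount32(~x);
}

static_assert(ctlzViaCtpop32(0u) == 32, "zero input: every bit is a leading zero");
static_assert(ctlzViaCtpop32(1u) == 31, "only the lowest bit is set");
static_assert(ctlzViaCtpop32(0x80000000u) == 0, "top bit set");
```

Every step is a lane-wise operation, which is presumably why the expansion can stay in vector form when the shifts, ORs and CTPOP are available for the vector type.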

Oct 31, 2016

Improved cost model for FDIV and FSQRT, by Andrew Tischenko · d07c731d

Alexey Bataev authored Oct 31, 2016

There is a bug describing the poor cost model for floating point operations:
Bug 29083 - [X86][SSE] Improve costs for floating point operations. This
patch is the second in a series of patches dealing with the cost model.

Differential Revision: https://reviews.llvm.org/D25722

llvm-svn: 285564

d07c731d

Oct 27, 2016
- [X86][AVX512] Fix MUL v8i64 costs on non-AVX512DQ targets · d23219b9
  Simon Pilgrim authored Oct 27, 2016
```
llvm-svn: 285329
```
  d23219b9
- [X86][AVX512DQ] Improve lowering of MUL v2i64 and v4i64 · 820e1326
  Simon Pilgrim authored Oct 27, 2016
```
With DQI but without VLX, lower v2i64 and v4i64 MUL operations with v8i64 MUL (vpmullq).

Updated cost table accordingly.

Differential Revision: https://reviews.llvm.org/D26011

llvm-svn: 285304
```
  820e1326
Oct 23, 2016
- [CostModel][X86] Added tests for current integer signed/unsigned remainder costs · d09c04d2
  Simon Pilgrim authored Oct 23, 2016
```
llvm-svn: 284940
```
  d09c04d2
- [X86][SSE] Add SSE41/AVX1 costs for vector shifts. · 6ac1e98b
  Simon Pilgrim authored Oct 23, 2016
```
We were defaulting to SSE2 costs which weren't taking into account the availability of PBLENDW/PBLENDVB to improve merging of per-element shift results.

llvm-svn: 284939
```
  6ac1e98b
- [CostModel][X86] Added tests for current integer trunc costs · e16b1e22
  Simon Pilgrim authored Oct 23, 2016
```
llvm-svn: 284938
```
  e16b1e22
Oct 20, 2016
- [CostModel][X86] Fixed AVX1/AVX512 sdiv/udiv uniformconst costs for 256/512 bit integer vectors · 365be4f9
  Simon Pilgrim authored Oct 20, 2016
```
We weren't checking for uniform const costs before the general cost, resulting in very high estimates.

llvm-svn: 284755
```
  365be4f9
- [CostModel][X86] Added tests for sdiv/udiv costs for uniform const and uniform const power-of-2 · 1388c0ac
  Simon Pilgrim authored Oct 20, 2016
```
Shows poor costings in AVX1/AVX512BW for certain vector types

llvm-svn: 284748
```
  1388c0ac
- [CostModel][X86] Fixed AVX1/AVX512 sdiv/udiv general costs for 256/512 bit integer vectors · 025e26dd
  Simon Pilgrim authored Oct 20, 2016
```
We weren't accounting for legal types on every subtarget, meaning that many of the costs were using defaults.

We still don't correctly cost (or test) the 512-bit sdiv/udiv by uniform const cases, nor the power-of-2 cases.

llvm-svn: 284744
```
  025e26dd
- [CostModel][X86] Added tests for sdiv/udiv costs for scalar and 128/256/512 bit integer vectors · 16cc616e
  Simon Pilgrim authored Oct 20, 2016
```
Shows current bug in AVX1/AVX512BW costs for 256 bit vector types

llvm-svn: 284723
```
  16cc616e
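The ordering problem fixed in 365be4f9 (the general cost was consulted before the uniform-constant cost) can be sketched with a toy table lookup. This is not LLVM's cost-model code: the enums, table names, helper functions, and all cost numbers below are hypothetical and chosen only to show why the more specific table has to be checked first.

```cpp
#include <cstddef>

enum class Op { SDiv, UDiv };
enum class VecTy { v8i32, v16i16 };
enum class DivisorKind { AnyValue, UniformConstant };

struct CostEntry {
  Op op;
  VecTy ty;
  int cost;
};

// Made-up numbers: division by a uniform constant is assumed to be much
// cheaper than a general vector division.
constexpr CostEntry UniformConstTable[] = {
    {Op::SDiv, VecTy::v8i32, 30},
    {Op::UDiv, VecTy::v8i32, 30},
};
constexpr CostEntry GeneralTable[] = {
    {Op::SDiv, VecTy::v8i32, 160},
    {Op::UDiv, VecTy::v8i32, 160},
    {Op::SDiv, VecTy::v16i16, 320},
    {Op::UDiv, VecTy::v16i16, 320},
};

// Returns the matching cost, or -1 when the table has no entry.
template <std::size_t N>
constexpr int lookup(const CostEntry (&table)[N], Op op, VecTy ty) {
  for (std::size_t i = 0; i < N; ++i)
    if (table[i].op == op && table[i].ty == ty)
      return table[i].cost;
  return -1;
}

// Check the specialised table first and fall back to the general one.
// Doing these two lookups in the opposite order would always return the
// (much higher) general cost.
constexpr int divCost(Op op, VecTy ty, DivisorKind kind) {
  if (kind == DivisorKind::UniformConstant) {
    int c = lookup(UniformConstTable, op, ty);
    if (c >= 0)
      return c;
  }
  return lookup(GeneralTable, op, ty);
}
```

With these illustrative tables, a `v8i32` signed division by a uniform constant resolves to the specialised entry, while a `v16i16` one has no specialised entry and falls through to the general cost.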
Oct 18, 2016

[X86][SSE] Add lowering to cvttpd2dq/cvttps2dq for sitofp v2f64/2f32 to 2i32 · 4ddc92b6

Simon Pilgrim authored Oct 18, 2016

As discussed on PR28461 we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types.

This patch adds support for fptosi to 2i32 from both 2f64 and 2f32.

It also recognises that cvttpd2dq zeroes the upper 64 bits of the xmm result (similar to D23797). We still don't do this for the cvttpd2dq/cvttps2dq intrinsics; this can be done in a future patch.

Differential Revision: https://reviews.llvm.org/D23808

llvm-svn: 284459

4ddc92b6

Oct 03, 2016

[x86, SSE/AVX] allow 128/256-bit lowering for copysign vector intrinsics (PR30433) · d27a2187

Sanjay Patel authored Oct 03, 2016

This should fix:
https://llvm.org/bugs/show_bug.cgi?id=30433

There are a couple of open questions about the codegen:
1. Should we let scalar ops be scalars and avoid vector constant loads/splats?
2. Should we have a pass to combine constants such as the inverted pair that we have here?

Differential Revision: https://reviews.llvm.org/D25165
 

llvm-svn: 283119

d27a2187

Oct 01, 2016
- [CostModel][X86] Added tests for current fptosi/fptoui costs · 1ec20e9b
  Simon Pilgrim authored Oct 01, 2016
```
llvm-svn: 283047
```
  1ec20e9b
- [CostModel][X86] Added fcopysign costs · e0ec5c1f
  Simon Pilgrim authored Oct 01, 2016
```
llvm-svn: 283044
```
  e0ec5c1f
- [CostModel][X86] Added fabs costs · 8b021c38
  Simon Pilgrim authored Oct 01, 2016
```
llvm-svn: 283042
```
  8b021c38
Sep 18, 2016
- [CostModel][X86] Added scalar float op costs · 91780598
  Simon Pilgrim authored Sep 18, 2016
```
llvm-svn: 281864
```
  91780598
Aug 21, 2016
- [CostModel][X86] Removed shift tests · 89e375a9
  Simon Pilgrim authored Aug 21, 2016
```
There are more thorough tests found in vshift-*-cost.ll 

llvm-svn: 279406
```
  89e375a9
- [CostModel][X86] Added costs for vXi16 and vXi8 vectors for add/sub/mul/and/or/xor tests · 6ad12ec6
  Simon Pilgrim authored Aug 21, 2016
```
llvm-svn: 279405
```
  6ad12ec6
- [CostModel][X86] Replaced SSSE3 with SSE2 costs to create a better baseline · b0a0576f
  Simon Pilgrim authored Aug 21, 2016
```
llvm-svn: 279404
```
  b0a0576f
- [CostModel][X86] Added fsqrt and fma costs · 07d7a21e
  Simon Pilgrim authored Aug 21, 2016
```
llvm-svn: 279403
```
  07d7a21e
- [CostModel][X86] Split off float arithmetic cost tests · 3cd61a08
  Simon Pilgrim authored Aug 21, 2016
```
llvm-svn: 279402
```
  3cd61a08
Aug 19, 2016
- [CostModel][X86] Added sub, or, and, fadd and fsub costs and missing 512-bit mul costs · 054e7d2e
  Simon Pilgrim authored Aug 19, 2016
```
llvm-svn: 279301
```
  054e7d2e
- [CostModel][X86] Added some AVX512 and 512-bit vector cost tests · fbfa3ee4
  Simon Pilgrim authored Aug 19, 2016
```
llvm-svn: 279291
```
  fbfa3ee4
- [CostModel][X86] Add fdiv + frem cost tests · e309d2d0
  Simon Pilgrim authored Aug 19, 2016
```
llvm-svn: 279283
```
  e309d2d0
Aug 05, 2016

[LV, X86] Be more optimistic about vectorizing shifts. · 3ceac2bb

Michael Kuperstein authored Aug 04, 2016

Shifts with a uniform but non-constant count were considered very expensive to
vectorize, because the splat of the uniform count and the shift would tend to
appear in different blocks. That made the splat invisible to ISel, and we'd
scalarize the shift at codegen time.

Since r201655, CodeGenPrepare sinks those splats to be next to their use, and we
are able to select the appropriate vector shifts. This updates the cost model to
take this into account by making shifts by a uniform count cheap again.

Differential Revision: https://reviews.llvm.org/D23049

llvm-svn: 277782

3ceac2bb

Aug 04, 2016
- [X86] Dropped XOP ctbits checks - they match the AVX checks · c8fe1327
  Simon Pilgrim authored Aug 04, 2016
```
llvm-svn: 277718
```
  c8fe1327
- [X86][SSE] Add initial costs for vector CTTZ/CTLZ · 5d5ca9c0
  Simon Pilgrim authored Aug 04, 2016
```
llvm-svn: 277716
```
  5d5ca9c0
Jul 20, 2016

[X86][SSE] Add cost model values for CTPOP of vectors · 1b4f511a

Simon Pilgrim authored Jul 20, 2016

This patch adds costs for the vectorized implementations of CTPOP. The default values were seriously underestimating the cost of these and were encouraging vectorization on targets where serialized use of POPCNT would be much better.

Differential Revision: https://reviews.llvm.org/D22456

llvm-svn: 276104

1b4f511a

Jul 17, 2016
- [X86] Add CTPOP/CTLZ/CTTZ scalar cost tests · 47638635
  Simon Pilgrim authored Jul 17, 2016
```
llvm-svn: 275725
```
  47638635
Jul 11, 2016

[X86] Make some cast costs more precise · f0c59330

Michael Kuperstein authored Jul 11, 2016

Make some AVX and AVX512 cast costs more precise.
Based on part of a patch by Elena Demikhovsky (D15604).

Differential Revision: http://reviews.llvm.org/D22064

llvm-svn: 275106

f0c59330

Jul 06, 2016

[x86] fix cost of SINT_TO_FP for i32 --> float (PR21356, PR28434) · 04b3496d

Sanjay Patel authored Jul 06, 2016

This is "cvtdq2ps" which does not appear to be particularly slow on any CPU
according to Agner's tables. Choosing "5" as a cost here as suggested in:
https://llvm.org/bugs/show_bug.cgi?id=21356
...but it seems very conservative given that the instruction is fully pipelined,
and I think these costs are supposed to model throughput.

Note that related costs are also most likely too high, but this fixes PR21356
and partly fixes PR28434.

llvm-svn: 274658

04b3496d

[TTI] The cost model should not assume vector casts get completely scalarized · aa71bdd3

Michael Kuperstein authored Jul 06, 2016

The cost model should not assume vector casts get completely scalarized, since
on targets that have vector support, the common case is a partial split up to
the legal vector size. So, when a vector cast gets split, the resulting casts
end up legal and cheap.

Instead of pessimistically assuming scalarization, base TTI can use the costs
the concrete TTI provides for the split vector, plus a fudge factor to account
for the cost of the split itself. This fudge factor is currently 1 by default,
except on AMDGPU where inserts and extracts are considered free.

Differential Revision: http://reviews.llvm.org/D21251

llvm-svn: 274642

aa71bdd3
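The difference between the two estimates described in aa71bdd3 can be made concrete with a toy model. This is a sketch under stated assumptions, not the base TTI implementation: `ToyTarget`, its fields, and both cost functions are hypothetical, vectors are assumed to split into equal halves, and the numbers are illustrative.

```cpp
// Hypothetical description of a target, for illustration only.
struct ToyTarget {
  unsigned widestLegalElts; // widest vector (in elements) that is legal
  unsigned legalCastCost;   // cost the concrete target reports for a legal cast
  unsigned splitOverhead;   // "fudge factor" charged for each split
};

// Optimistic estimate: split in halves until the type is legal, paying
// the split overhead once per split plus the cost of the two halves.
constexpr unsigned splitCastCost(const ToyTarget &t, unsigned numElts) {
  return numElts <= t.widestLegalElts
             ? t.legalCastCost
             : t.splitOverhead + 2 * splitCastCost(t, numElts / 2);
}

// Pessimistic estimate: every element is extracted, cast as a scalar,
// and inserted back into the result vector.
constexpr unsigned scalarizedCastCost(unsigned numElts, unsigned scalarCastCost,
                                      unsigned insertExtractCost) {
  return numElts * (scalarCastCost + 2 * insertExtractCost);
}

constexpr ToyTarget WithSplitCost{8, 1, 1}; // split overhead of 1
constexpr ToyTarget FreeSplit{8, 1, 0};     // splits considered free

static_assert(splitCastCost(WithSplitCost, 32) == 7, "3 splits + 4 legal casts");
static_assert(splitCastCost(FreeSplit, 32) == 4, "4 legal casts only");
static_assert(scalarizedCastCost(32, 1, 1) == 96, "32 * (1 + 2)");
```

In this toy setting a 32-element cast on a target with 8-element legal vectors costs 7 under the split model versus 96 under full scalarization, which shows how far apart the two assumptions can be.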

Jun 28, 2016

Support arbitrary addrspace pointers in masked load/store intrinsics · 7ad95ec2

Artur Pilipenko authored Jun 28, 2016

This is a resubmission of 263158 change after fixing the existing problem with intrinsics mangling (see LTO and intrinsics mangling llvm-dev thread for details).

This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace.

The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics.

Reviewed By: reames

Differential Revision: http://reviews.llvm.org/D17270

llvm-svn: 274043

7ad95ec2

Jun 27, 2016

Revert -r273892 "Support arbitrary addrspace pointers in masked load/store... · 72f76b88

Artur Pilipenko authored Jun 27, 2016

Revert -r273892 "Support arbitrary addrspace pointers in masked load/store intrinsics" since some of the clang tests don't expect to see the updated signatures. 

llvm-svn: 273895

72f76b88

Support arbitrary addrspace pointers in masked load/store intrinsics · a36aa415

Artur Pilipenko authored Jun 27, 2016

This is a resubmission of 263158 change after fixing the existing problem with intrinsics mangling (see LTO and intrinsics mangling llvm-dev thread for details).

The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics.

Reviewed By: reames

Differential Revision: http://reviews.llvm.org/D17270

llvm-svn: 273892

a36aa415

Jun 21, 2016

[X86] Make arithmetic operations cost model test saner. NFC. · 78028b84
Michael Kuperstein authored Jun 21, 2016
```
llvm-svn: 273316
```
78028b84

[X86][SSE] Add cost model for BSWAP of vectors · 356e823b

Simon Pilgrim authored Jun 20, 2016

The BSWAP of vector types is quite efficiently implemented using vector shuffles on SSE/AVX targets; we should reflect the typical cost of this to encourage vectorization.

Differential Revision: http://reviews.llvm.org/D21521

llvm-svn: 273217

356e823b
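To illustrate why a vector BSWAP maps onto a single byte shuffle, as 356e823b describes, here is a scalar model of a 16-byte shuffle applied to four 32-bit lanes. The type `Vec16` and the functions `shuffleBytes` and `bswapV4I32` are invented for this sketch and do not correspond to LLVM or intrinsic names; the mask simply reverses the byte order within each 4-byte lane.

```cpp
#include <cstdint>

// A 128-bit vector modelled as 16 bytes.
struct Vec16 {
  uint8_t lane[16];
};

// Generic byte shuffle: result byte i is source byte mask[i].
constexpr Vec16 shuffleBytes(const Vec16 &src, const Vec16 &mask) {
  Vec16 out{};
  for (int i = 0; i < 16; ++i)
    out.lane[i] = src.lane[mask.lane[i] & 15];
  return out;
}

// Mask that reverses the bytes inside each of the four 32-bit elements.
constexpr Vec16 BswapV4I32Mask = {
    {3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12}};

// BSWAP of a <4 x i32> vector expressed as one shuffle.
constexpr Vec16 bswapV4I32(const Vec16 &v) {
  return shuffleBytes(v, BswapV4I32Mask);
}

constexpr Vec16 Iota = {{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}};

static_assert(bswapV4I32(Iota).lane[0] == 3, "first lane is byte-reversed");
static_assert(bswapV4I32(bswapV4I32(Iota)).lane[5] == 5, "applying it twice is the identity");
```

Other element widths would only need a different constant mask, so the whole operation stays at roughly one shuffle per legal vector in this model.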

Jun 11, 2016
- [CostModel][X86][SSE] Updated costs for vector BITREVERSE ops on SSSE3+ targets · 3fc09f7b
  Simon Pilgrim authored Jun 11, 2016
```
To account for the fast PSHUFB implementation now available

llvm-svn: 272484
```
  3fc09f7b
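A PSHUFB-style BITREVERSE, as mentioned in 3fc09f7b, is commonly built from a 16-entry nibble lookup table. The sketch below models that idea on a single byte; it is an assumption about the general technique rather than a transcription of the lowering in that commit, and `NibbleReverse` and `bitreverse8` are names made up here.

```cpp
#include <cstdint>

// NibbleReverse[n] is the 4-bit value n with its bits reversed.
// In a vector implementation this table would be the shuffle source and
// the nibbles of each byte would be the shuffle indices.
constexpr uint8_t NibbleReverse[16] = {0x0, 0x8, 0x4, 0xC, 0x2, 0xA, 0x6, 0xE,
                                       0x1, 0x9, 0x5, 0xD, 0x3, 0xB, 0x7, 0xF};

// Reverse the bits of one byte: look up both nibbles, then swap their
// positions (the reversed low nibble becomes the new high nibble).
constexpr uint8_t bitreverse8(uint8_t x) {
  return static_cast<uint8_t>((NibbleReverse[x & 0x0F] << 4) |
                              NibbleReverse[x >> 4]);
}

static_assert(bitreverse8(0x01) == 0x80, "lowest bit moves to the top");
static_assert(bitreverse8(0x1E) == 0x78, "00011110 -> 01111000");
```

Two table lookups plus a few shifts and logic operations cover every byte of the vector at once, which is consistent with the lower costs the commit assigns on SSSE3+ targets.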