Commits · 27fed8e5d636d67ed5e2dff77705dcae1fcd0b15 · Roger Ferrer / llvm-epi

Nov 14, 2016

[X86][AVX] Fixed v16i16/v32i8 ADD/SUB costs on AVX1 subtargets · 27fed8e5

Simon Pilgrim authored Nov 14, 2016

Add explicit v16i16/v32i8 ADD/SUB costs, matching the costs of v4i64/v8i32 - they were missing for some reason.

This has side effects on the LV max bandwidth tests (AVX1 now prefers 128-bit vectors, whereas AVX2 still prefers 256-bit).

llvm-svn: 286832

Nov 08, 2016

[VectorLegalizer] Expansion of CTLZ using CTPOP when possible · d02c5520

Simon Pilgrim authored Nov 08, 2016

This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available.

This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when it's useful.
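As a rough scalar sketch of the "Hacker's Delight" expansion referred to above (the helper names `popcount32` and `ctlz32` are illustrative, not LLVM's; the legalizer applies the same steps to vector DAG nodes): smear the highest set bit into every lower position, then count the zero bits left above it with a population count.

```cpp
#include <cassert>
#include <cstdint>

// Branch-free population count (SWAR), as described in "Hacker's Delight".
constexpr uint32_t popcount32(uint32_t x) {
  x = x - ((x >> 1) & 0x55555555u);                  // 2-bit partial sums
  x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);  // 4-bit partial sums
  x = (x + (x >> 4)) & 0x0F0F0F0Fu;                  // per-byte sums
  return (x * 0x01010101u) >> 24;                    // add the four bytes
}

// CTLZ without a dedicated instruction: after the OR-shift cascade every bit
// below the highest set bit is 1, so the remaining zeros are the leading ones.
constexpr uint32_t ctlz32(uint32_t x) {
  x |= x >> 1;
  x |= x >> 2;
  x |= x >> 4;
  x |= x >> 8;
  x |= x >> 16;
  return popcount32(~x);
}

static_assert(ctlz32(0) == 32, "ctlz of zero is the bit width");
static_assert(ctlz32(1) == 31, "only bit 0 set");
static_assert(ctlz32(0x80000000u) == 0, "top bit set");
```

Every step is a shift, OR, NOT or popcount, so this only beats scalarization when those operations are cheap on the target's vector types, which is what the added cost models are meant to capture.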

Differential Revision: https://reviews.llvm.org/D25910

llvm-svn: 286233

Oct 31, 2016

Improved cost model for FDIV and FSQRT, by Andrew Tischenko · d07c731d

Alexey Bataev authored Oct 31, 2016

There is a bug describing the poor cost model for floating-point operations:
Bug 29083 - [X86][SSE] Improve costs for floating point operations. This
patch is the second in a series of patches dealing with the cost model.

Differential Revision: https://reviews.llvm.org/D25722

llvm-svn: 285564

Oct 27, 2016

[X86][AVX512] Fix MUL v8i64 costs on non-AVX512DQ targets · d23219b9

Simon Pilgrim authored Oct 27, 2016

llvm-svn: 285329

[X86][AVX512DQ] Improve lowering of MUL v2i64 and v4i64 · 820e1326

Simon Pilgrim authored Oct 27, 2016

With DQI but without VLX, lower v2i64 and v4i64 MUL operations with v8i64 MUL (vpmullq).

Updated cost table accordingly.

Differential Revision: https://reviews.llvm.org/D26011

llvm-svn: 285304

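For context on why 64-bit element MUL is expensive without AVX512DQ's `vpmullq`: each 64-bit lane product has to be assembled from 32x32->64 `pmuludq` multiplies. A minimal SSE2 sketch of that decomposition for v2i64 (the function name is hypothetical, and LLVM's emitted sequence may differ in detail):

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2

// Low 64 bits of a 64x64 lane multiply, built from PMULUDQ, which only
// multiplies the low 32 bits of each 64-bit lane into a 64-bit product.
__m128i mullo_epi64_sse2(__m128i a, __m128i b) {
  __m128i a_hi = _mm_srli_epi64(a, 32);
  __m128i b_hi = _mm_srli_epi64(b, 32);
  __m128i lo_lo = _mm_mul_epu32(a, b);     // a.lo * b.lo
  __m128i hi_lo = _mm_mul_epu32(a_hi, b);  // a.hi * b.lo
  __m128i lo_hi = _mm_mul_epu32(a, b_hi);  // a.lo * b.hi
  // a.hi * b.hi only affects bits >= 64, so it is dropped entirely.
  __m128i cross = _mm_slli_epi64(_mm_add_epi64(hi_lo, lo_hi), 32);
  return _mm_add_epi64(lo_lo, cross);
}
```

Three multiplies plus shifts and adds per vector is why these entries carry much higher costs than a single `vpmullq`.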
Oct 23, 2016

[X86][SSE] Add SSE41/AVX1 costs for vector shifts. · 6ac1e98b

Simon Pilgrim authored Oct 23, 2016

We were defaulting to SSE2 costs which weren't taking into account the availability of PBLENDW/PBLENDVB to improve merging of per-element shift results.

llvm-svn: 284939

Oct 20, 2016

[X86] Enable interleaved memory access by default · b2443ed6

Michael Kuperstein authored Oct 20, 2016

This lets the loop vectorizer generate interleaved memory accesses on x86.

Differential Revision: https://reviews.llvm.org/D25350

llvm-svn: 284779

[CostModel][X86] Fixed AVX1/AVX512 sdiv/udiv uniformconst costs for 256/512 bit integer vectors · 365be4f9

Simon Pilgrim authored Oct 20, 2016

We weren't checking for uniform const costs before the general cost, resulting in very high estimates.

llvm-svn: 284755
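The uniform-constant case deserves its own, much lower, cost because division by a splatted constant never needs a divide instruction: it can be lowered to a multiply-high and a shift. A scalar sketch for unsigned 32-bit division by 3 (the function name is illustrative; the magic constant is the standard one for this divisor, and vector lowering applies the same idea per lane):

```cpp
#include <cassert>
#include <cstdint>

// x / 3 for any 32-bit x: multiply by ceil(2^33 / 3) = 0xAAAAAAAB and keep
// the bits above position 33 of the 64-bit product.
constexpr uint32_t udiv3(uint32_t x) {
  return static_cast<uint32_t>((static_cast<uint64_t>(x) * 0xAAAAAAABu) >> 33);
}

static_assert(udiv3(0) == 0, "zero");
static_assert(udiv3(0xFFFFFFFFu) == 0x55555555u, "largest input");
```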

[CostModel][X86] Fixed AVX1/AVX512 sdiv/udiv general costs for 256/512 bit integer vectors · 025e26dd

Simon Pilgrim authored Oct 20, 2016

We weren't accounting for legal types on every subtarget, meaning that many of the costs were using defaults.

We still don't correctly cost (or test) the 512-bit sdiv/udiv by uniform const cases, nor the power-of-2 cases.

llvm-svn: 284744

Oct 18, 2016

[X86][SSE] Add lowering to cvttpd2dq/cvttps2dq for sitofp v2f64/2f32 to 2i32 · 4ddc92b6

Simon Pilgrim authored Oct 18, 2016

As discussed on PR28461 we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types.

This patch adds support for fptosi to 2i32 from both 2f64 and 2f32.

It also recognises that cvttpd2dq zeroes the upper 64 bits of the xmm result (similar to D23797). We still don't do this for the cvttpd2dq/cvttps2dq intrinsics; this can be done in a future patch.
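A minimal SSE2 intrinsics sketch of the instruction this lowering targets, showing both the truncating conversion and the zeroed upper half the message relies on (the wrapper name is hypothetical):

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2

// fptosi <2 x double> -> <2 x i32> as one CVTTPD2DQ. The two i32 results
// occupy bits 63:0; the instruction clears bits 127:64 of the destination.
__m128i fptosi_v2f64_v2i32(__m128d v) {
  return _mm_cvttpd_epi32(v);  // truncates toward zero, matching fptosi
}
```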

Differential Revision: https://reviews.llvm.org/D23808

llvm-svn: 284459

Oct 12, 2016

NFC: The Cost Model specialization, by Andrey Tischenko · b271a58e

Alexey Bataev authored Oct 12, 2016

The current Cost Model implementation is very inaccurate and has to be
updated, improved, and re-implemented so that it can take into account
the concrete CPU models and targets on which it is used. For example,
the Latency Cost Model should differ from the Code Size Cost Model, etc.
This patch is the first step toward developing and implementing a new
generation of the Cost Model.

Differential Revision: https://reviews.llvm.org/D25186

llvm-svn: 284012

Aug 17, 2016

Replace "fallthrough" comments with LLVM_FALLTHROUGH · b03fd12c

Justin Bogner authored Aug 17, 2016

This is a mechanical change of comments in switches like fallthrough,
fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead.

llvm-svn: 278902

Aug 08, 2016

Revert "[X86] Support the "ms-hotpatch" attribute." · e9c32c7e

Charles Davis authored Aug 08, 2016

This reverts commit r278048. Something changed between the last time I
built this--it takes a while on my ridiculously slow and ancient
computer--and now that broke this.

llvm-svn: 278053

[X86] Support the "ms-hotpatch" attribute. · 0822aa11

Charles Davis authored Aug 08, 2016

Summary:
Based on two patches by Michael Mueller.

This is a target attribute that causes a function marked with it to be
emitted as "hotpatchable". This particular mechanism was originally
devised by Microsoft for patching their binaries (which they are
constantly updating to stay ahead of crackers, script kiddies, and other
ne'er-do-wells on the Internet), but is now commonly abused by Windows
programs to hook API functions.

This mechanism is target-specific. For x86, a two-byte no-op instruction
is emitted at the function's entry point; the entry point must be
immediately preceded by 64 (32-bit) or 128 (64-bit) bytes of padding.
This padding is where the patch code is written. The two byte no-op is
then overwritten with a short jump into this code. The no-op is usually
a `movl %edi, %edi` instruction; this is used as a magic value
indicating that this is a hotpatchable function.
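A hypothetical model of the patching step described above, operating on a byte buffer rather than live code. `movl %edi, %edi` encodes as `8B FF`, and `EB rel8` is the x86 short jump, whose displacement is measured from the end of the two-byte jump; the function name and the chosen target offset are illustrative assumptions:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model only: real patching also needs writable code pages
// and care with concurrently executing threads.
// `movl %edi, %edi` (8B FF) is the two-byte no-op / magic value.
constexpr uint8_t kMovEdiEdi[2] = {0x8B, 0xFF};

// Overwrite the entry no-op with a short jump to `target_offset` bytes
// relative to the entry (negative = into the padding before it).
bool patch_entry(uint8_t *entry, std::ptrdiff_t target_offset) {
  if (std::memcmp(entry, kMovEdiEdi, sizeof kMovEdiEdi) != 0)
    return false;  // not the magic no-op (or already patched)
  const std::ptrdiff_t rel = target_offset - 2;  // relative to entry + 2
  if (rel < -128 || rel > 127)
    return false;  // outside the range of a short jump
  entry[0] = 0xEB;
  entry[1] = static_cast<uint8_t>(static_cast<int8_t>(rel));
  return true;
}
```

For example, with one possible layout that places a 5-byte near jump at the very end of the padding, targeting 5 bytes before the entry turns the entry into `EB F9`.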

Reviewers: majnemer, sanjoy, rnk

Subscribers: dberris, llvm-commits

Differential Revision: https://reviews.llvm.org/D19908

llvm-svn: 278048

Aug 05, 2016

[LV, X86] Be more optimistic about vectorizing shifts. · 3ceac2bb

Michael Kuperstein authored Aug 04, 2016

Shifts with a uniform but non-constant count were considered very expensive to
vectorize, because the splat of the uniform count and the shift would tend to
appear in different blocks. That made the splat invisible to ISel, and we'd
scalarize the shift at codegen time.

Since r201655, CodeGenPrepare sinks those splats to be next to their use, and we
are able to select the appropriate vector shifts. This updates the cost model to
take this into account by making shifts by a uniform amount cheap again.
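Why a uniform count is cheap on x86: SSE2's `PSLLD` shifts every lane by a single count taken from the low 64 bits of an XMM register, so a uniform run-time count needs only a scalar-to-vector move. A minimal intrinsics sketch (the wrapper name is hypothetical; a different count per lane only became a single instruction with AVX2's `VPSLLVD`):

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2

// Shift all four i32 lanes left by the same run-time amount with one PSLLD.
// Counts greater than 31 produce all-zero lanes.
__m128i shl_uniform_v4i32(__m128i v, int count) {
  return _mm_sll_epi32(v, _mm_cvtsi32_si128(count));
}
```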

Differential Revision: https://reviews.llvm.org/D23049

llvm-svn: 277782

Aug 04, 2016

[X86][SSE] Add initial costs for vector CTTZ/CTLZ · 5d5ca9c0

Simon Pilgrim authored Aug 04, 2016

llvm-svn: 277716

Aug 02, 2016

[AVX512] Don't use i128 masked gather/scatter/load/store. Do more accurately dataWidth check. · f44b79d0

Igor Breger authored Aug 02, 2016

Differential Revision: http://reviews.llvm.org/D23055

llvm-svn: 277435

Jul 20, 2016

[X86][SSE] Add cost model values for CTPOP of vectors · 1b4f511a

Simon Pilgrim authored Jul 20, 2016

This patch adds costs for the vectorized implementations of CTPOP. The default values were seriously underestimating the cost of these and were encouraging vectorization on targets where serialized use of POPCNT would be much better.
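To see why a low default cost was misleading: without a vector popcount instruction, a per-lane CTPOP on SSE2 expands into a sequence of shift, mask and add operations. A sketch of one such SWAR expansion for `<4 x i32>` (an illustration of the technique, not necessarily LLVM's exact lowering; the function name is hypothetical):

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2

// Per-lane popcount of <4 x i32> using only SSE2 bit arithmetic.
__m128i ctpop_v4i32_sse2(__m128i v) {
  const __m128i m1 = _mm_set1_epi8(0x55);
  const __m128i m2 = _mm_set1_epi8(0x33);
  const __m128i m4 = _mm_set1_epi8(0x0F);
  // 2-bit sums; the mask discards bits shifted in from the neighbouring byte.
  v = _mm_sub_epi8(v, _mm_and_si128(_mm_srli_epi16(v, 1), m1));
  // 4-bit sums.
  v = _mm_add_epi8(_mm_and_si128(v, m2), _mm_and_si128(_mm_srli_epi16(v, 2), m2));
  // Per-byte sums (each at most 8).
  v = _mm_and_si128(_mm_add_epi8(v, _mm_srli_epi16(v, 4)), m4);
  // Add the four byte counts of each 32-bit lane into its low byte.
  v = _mm_add_epi8(v, _mm_srli_epi32(v, 8));
  v = _mm_add_epi8(v, _mm_srli_epi32(v, 16));
  return _mm_and_si128(v, _mm_set1_epi32(0xFF));
}
```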

Differential Revision: https://reviews.llvm.org/D22456

llvm-svn: 276104

Jul 17, 2016

Strip trailing whitespace · 285d9e4d

Simon Pilgrim authored Jul 17, 2016

llvm-svn: 275726

Jul 11, 2016

[X86] Make some cast costs more precise · f0c59330

Michael Kuperstein authored Jul 11, 2016

Make some AVX and AVX512 cast costs more precise.
Based on part of a patch by Elena Demikhovsky (D15604).

Differential Revision: http://reviews.llvm.org/D22064

llvm-svn: 275106

Jul 06, 2016

[x86] fix cost of SINT_TO_FP for i32 --> float (PR21356, PR28434) · 04b3496d

Sanjay Patel authored Jul 06, 2016

This is "cvtdq2ps" which does not appear to be particularly slow on any CPU
according to Agner's tables. Choosing "5" as a cost here as suggested in:
https://llvm.org/bugs/show_bug.cgi?id=21356
...but it seems very conservative given that the instruction is fully pipelined,
and I think these costs are supposed to model throughput.

Note that related costs are also most likely too high, but this fixes PR21356
and partly fixes PR28434.

llvm-svn: 274658

[X86] Sort cast cost tables. NFC. · 1b62e0e9

Michael Kuperstein authored Jul 06, 2016

Cast cost tables are now sorted, for each cast type, lexicographically on
[source base type, source vector width, dest base type, dest vector width].

llvm-svn: 274653

Jun 21, 2016

[X86][SSE] Add cost model for BSWAP of vectors · 356e823b

Simon Pilgrim authored Jun 20, 2016

The BSWAP of vector types is quite efficiently implemented using vector shuffles on SSE/AVX targets, we should reflect the typical cost of this to encourage vectorization.
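Even without SSSE3's `PSHUFB`, a vector BSWAP reduces to a few SSE2 shuffles and shifts. A hedged sketch for `<4 x i32>` (the function name is hypothetical; with SSSE3 a single `PSHUFB` with a byte-reversal mask does the whole job):

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2

// Byte-swap each i32 lane: swap the two 16-bit halves of every lane with
// word shuffles, then swap the two bytes inside every 16-bit half.
__m128i bswap_v4i32_sse2(__m128i v) {
  v = _mm_shufflelo_epi16(v, _MM_SHUFFLE(2, 3, 0, 1));
  v = _mm_shufflehi_epi16(v, _MM_SHUFFLE(2, 3, 0, 1));
  return _mm_or_si128(_mm_slli_epi16(v, 8), _mm_srli_epi16(v, 8));
}
```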

Differential Revision: http://reviews.llvm.org/D21521

llvm-svn: 273217

Jun 11, 2016

[CostModel][X86][SSE] Updated costs for vector BITREVERSE ops on SSSE3+ targets · 3fc09f7b

Simon Pilgrim authored Jun 11, 2016

To account for the fast PSHUFB implementation now available.

llvm-svn: 272484

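A scalar model of the usual nibble-lookup technique behind a fast `PSHUFB` BITREVERSE: `PSHUFB` performs sixteen parallel lookups into a 16-entry byte table, so each byte can be bit-reversed by looking up its two nibbles and swapping them. This sketch (names hypothetical) does the lookups one byte at a time:

```cpp
#include <cassert>
#include <cstdint>

// kRev4[n] is the 4-bit value n with its bits reversed.
constexpr uint8_t kRev4[16] = {0x0, 0x8, 0x4, 0xC, 0x2, 0xA, 0x6, 0xE,
                               0x1, 0x9, 0x5, 0xD, 0x3, 0xB, 0x7, 0xF};

// Reverse one byte: the reversed low nibble becomes the high nibble.
constexpr uint8_t bitreverse8(uint8_t b) {
  return static_cast<uint8_t>((kRev4[b & 0xF] << 4) | kRev4[b >> 4]);
}

// Reverse 32 bits: reverse the bits of each byte, then reverse byte order.
constexpr uint32_t bitreverse32(uint32_t x) {
  return (uint32_t{bitreverse8(x & 0xFF)} << 24) |
         (uint32_t{bitreverse8((x >> 8) & 0xFF)} << 16) |
         (uint32_t{bitreverse8((x >> 16) & 0xFF)} << 8) |
         uint32_t{bitreverse8(x >> 24)};
}

static_assert(bitreverse32(1u) == 0x80000000u, "bit 0 moves to bit 31");
```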
Jun 10, 2016

[X86] Add costs for SSE zext/sext to v4i64 to TTI · 9a0542a7

Michael Kuperstein authored Jun 10, 2016

The costs are somewhat hand-wavy, but should be much closer to the truth
than what we get from BasicTTI.

Differential Revision: http://reviews.llvm.org/D21156

llvm-svn: 272406

May 25, 2016

[x86] avoid code explosion from LoopVectorizer for gather loop (PR27826) · aedc347b

Sanjay Patel authored May 25, 2016

By making pointer extraction from a vector more expensive in the cost model,
we avoid the vectorization of a loop that is very likely to be memory-bound:
https://llvm.org/bugs/show_bug.cgi?id=27826

There are still bugs related to this, so we may need a more general solution
to avoid vectorizing obviously memory-bound loops when we don't have HW gather
support.

Differential Revision: http://reviews.llvm.org/D20601

llvm-svn: 270729

May 24, 2016

[CostModel][X86][XOP] Added XOP costmodel for BITREVERSE · 14000b3c

Simon Pilgrim authored May 24, 2016

Now that we have a nice fast VPPERM solution. Added framework for future intrinsic costs as well.

llvm-svn: 270537

May 09, 2016

[X86][SSE] Improve cost model for i64 vector comparisons on pre-SSE42 targets · eec3a95f

Simon Pilgrim authored May 09, 2016

As discussed on PR24888, until SSE42 we don't have access to PCMPGTQ for v2i64 comparisons, but the cost models don't reflect this, resulting in over-optimistic vectorization.

This patch adds SSE2 'base level' costs that match what a typical target is capable of and only reduces the v2i64 costs at SSE42.

Technically SSE41 provides a PCMPEQQ v2i64 equality test, but because getCmpSelInstrCost doesn't give us a way to discriminate between comparison test types, we can't easily make use of it. Otherwise we could split the cost of integer equality and greater-than tests to give better costings of each.
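For illustration, this is roughly what a signed v2i64 greater-than involves on SSE2 when `PCMPGTQ` is unavailable: the comparison is rebuilt from 32-bit compares and shuffles. A hedged intrinsics sketch (the function name is hypothetical; LLVM's actual lowering may differ in detail):

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2

// Signed a > b per 64-bit lane: hi(a) > hi(b) signed, or the high halves are
// equal and lo(a) > lo(b) unsigned. Flipping the sign bit of the low dwords
// lets the signed PCMPGTD perform that unsigned comparison.
__m128i cmpgt_epi64_sse2(__m128i a, __m128i b) {
  const __m128i flip = _mm_set_epi32(0, INT32_MIN, 0, INT32_MIN);
  a = _mm_xor_si128(a, flip);
  b = _mm_xor_si128(b, flip);
  __m128i gt = _mm_cmpgt_epi32(a, b);
  __m128i eq = _mm_cmpeq_epi32(a, b);
  // Broadcast each lane's low-dword result and high-dword results across it.
  __m128i gt_lo = _mm_shuffle_epi32(gt, _MM_SHUFFLE(2, 2, 0, 0));
  __m128i gt_hi = _mm_shuffle_epi32(gt, _MM_SHUFFLE(3, 3, 1, 1));
  __m128i eq_hi = _mm_shuffle_epi32(eq, _MM_SHUFFLE(3, 3, 1, 1));
  return _mm_or_si128(gt_hi, _mm_and_si128(eq_hi, gt_lo));
}
```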

Differential Revision: http://reviews.llvm.org/D20057

llvm-svn: 268972

Apr 22, 2016

[X86]: Changing cost for “TRUNCATE v16i32 to v16i8” in SSE4.1 mode. · 468558a0

Ashutosh Nema authored Apr 22, 2016

Summary:
rL256194 transforms truncations between vectors of integers into PACKUS/PACKSS
operations during DAG combine. This generates better code for truncate, so the
cost of truncate needed to change, but it looks like it was changed only in the
SSE2 table, whereas the change also applies to SSE4.1, so the cost of truncate
needs to be changed there as well. The costs of “TRUNCATE v16i32 to v16i8” and
“TRUNCATE v16i16 to v16i8” should be the same in the SSE4.1 and SSE2 tables.
This patch removes them from the SSE4.1 table, so they fall back to SSE2.

Reviewers: Simon Pilgrim

llvm-svn: 267123

Apr 14, 2016

Do not use getGlobalContext()... ever. · 867e9146

Mehdi Amini authored Apr 14, 2016

This code was creating a new type in the global context, regardless
of which context the user is sitting in, what can possibly go wrong?

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266275

Apr 05, 2016

fix typo; NFC · 4c7d0944

Sanjay Patel authored Apr 05, 2016

llvm-svn: 265442

Mar 09, 2016

[x86] fix cost model inaccuracy for vector memory ops · 9f6c4d50

Sanjay Patel authored Mar 09, 2016

The irony of this patch is that one CPU that is affected is AMD Jaguar, and Jaguar
has a completely double-pumped AVX implementation. But getting the cost model to
reflect that is a much bigger problem. The small goal here is simply to improve on
the lie that !AVX2 == SandyBridge.

Differential Revision: http://reviews.llvm.org/D18000

llvm-svn: 263069

Mar 06, 2016

AVX512BW: Support llvm intrinsic masked vector load/store for i8/i16 element types on SKX · 4d94d4d5

Igor Breger authored Mar 06, 2016

Differential Revision: http://reviews.llvm.org/D17913

llvm-svn: 262803

Jan 25, 2016

AVX1 : Enable vector masked_load/store to AVX1. · 6d421419

Igor Breger authored Jan 25, 2016

Use AVX1 FP instructions (vmaskmovps/pd) in place of the AVX2 int instructions (vpmaskmovd/q).

Differential Revision: http://reviews.llvm.org/D16528

llvm-svn: 258675

Dec 28, 2015

Implemented cost model for masked gather and scatter operations · 54946988

Elena Demikhovsky authored Dec 28, 2015

The cost is calculated for all X86 targets. When the gather/scatter instruction
is not supported, we calculate the cost of the scalar sequence.
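The "scalar sequence" is easiest to see from the semantics of `llvm.masked.gather` itself: each enabled lane loads through its own pointer, and disabled lanes keep the pass-through value and are never dereferenced. A minimal generic C++ model (the `masked_gather` template is hypothetical, not an LLVM API):

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar model of a masked gather. Without a gather instruction, roughly this
// per-lane work (extract pointer, test mask bit, load, insert) is what has to
// be costed.
template <typename T, std::size_t N>
std::array<T, N> masked_gather(const std::array<const T *, N> &ptrs,
                               const std::array<bool, N> &mask,
                               const std::array<T, N> &passthru) {
  std::array<T, N> result = passthru;
  for (std::size_t i = 0; i < N; ++i)
    if (mask[i])
      result[i] = *ptrs[i];  // only enabled lanes touch memory
  return result;
}
```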

Differential revision: http://reviews.llvm.org/D15677

llvm-svn: 256519

Dec 21, 2015

[X86][SSE] Transform truncations between vectors of integers into... · 8df93ce4

Cong Hou authored Dec 21, 2015

[X86][SSE] Transform truncations between vectors of integers into X86ISD::PACKUS/PACKSS operations during DAG combine.

This patch transforms truncation between vectors of integers into
X86ISD::PACKUS/PACKSS operations during DAG combine. We don't do it in
the lowering phase because, after type legalization, the original truncation
is turned into a BUILD_VECTOR in which each element is extracted from a
vector and then truncated, and from that form it is difficult to do
this optimization. This greatly improves the performance of truncations
on some specific types.

Cost table is updated accordingly.
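A hedged SSE2 intrinsics sketch of the idea (function names are hypothetical): because `PACKSSDW` and `PACKUSWB` saturate rather than truncate, each lane is first brought into range so that saturation cannot change it. This mirrors the approach described, though the combine's exact instruction choice depends on the types and subtarget.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2

// Two <4 x i32> -> one <8 x i16>. Sign-extending each lane from its low
// 16 bits (shl 16, sra 16) makes it fit in i16, so PACKSSDW keeps exactly
// the low 16 bits.
__m128i trunc_v8i32_to_v8i16(__m128i lo, __m128i hi) {
  lo = _mm_srai_epi32(_mm_slli_epi32(lo, 16), 16);
  hi = _mm_srai_epi32(_mm_slli_epi32(hi, 16), 16);
  return _mm_packs_epi32(lo, hi);
}

// Two <8 x i16> -> one <16 x i8>. Masking to the low byte puts every lane in
// [0, 255], so PACKUSWB's unsigned saturation is a no-op.
__m128i trunc_v16i16_to_v16i8(__m128i lo, __m128i hi) {
  const __m128i mask = _mm_set1_epi16(0x00FF);
  return _mm_packus_epi16(_mm_and_si128(lo, mask), _mm_and_si128(hi, mask));
}
```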


Differential revision: http://reviews.llvm.org/D14588

llvm-svn: 256194

Dec 20, 2015

[X86] Prevent constant hoisting for a couple compare immediates that the... · 074e8452

Craig Topper authored Dec 20, 2015

[X86] Prevent constant hoisting for a couple compare immediates that the selection DAG knows how to optimize into a shift.

This allows "icmp ugt %a, 4294967295" and "icmp uge %a, 4294967296" to be optimized into right shifts by 32 which can fold the immediate into the shift instruction. These patterns show up with some regularity in real code.

Unfortunately, since getImmCost can't see the icmp predicate, we can't tell whether we're only catching these specific cases.
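The equivalence the selection DAG relies on can be checked in scalar C++ (function names are illustrative). On x86-64, neither 4294967295 nor 4294967296 can be encoded as the sign-extended 32-bit immediate of a 64-bit compare, so the compare needs the constant in a register, whereas the shift needs only an 8-bit immediate:

```cpp
#include <cassert>
#include <cstdint>

// Both predicates ask whether any of bits 63..32 of `a` is set.
constexpr bool ugt_u32_max(uint64_t a) { return a > 0xFFFFFFFFull; }     // icmp ugt %a, 4294967295
constexpr bool uge_2_pow_32(uint64_t a) { return a >= 0x100000000ull; }  // icmp uge %a, 4294967296
constexpr bool upper_bits_set(uint64_t a) { return (a >> 32) != 0; }     // shift by 32

static_assert(ugt_u32_max(0xFFFFFFFFull) == upper_bits_set(0xFFFFFFFFull), "boundary");
static_assert(uge_2_pow_32(0x100000000ull) == upper_bits_set(0x100000000ull), "boundary");
```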

llvm-svn: 256126

Dec 11, 2015

[X86][SSE] Update the cost table for integer-integer conversions on SSE2/SSE4.1. · 59898d8c

Cong Hou authored Dec 11, 2015

Previously, the conversion cost table had no entries for integer-integer
conversions on SSE2, which resulted in imprecise costs for certain vectorized
operations. This patch adds those entries for SSE2 and SSE4.1. The cost numbers
were counted from the output of running llc on the new test case in this patch.

Differential revision: http://reviews.llvm.org/D15132

llvm-svn: 255315

Dec 02, 2015

AVX-512: Updated cost of FP/SINT/UINT conversion operations · a1a40cce

Elena Demikhovsky authored Dec 02, 2015

I checked and updated the cost of AVX-512 conversion operations, and added the cost of conversion operations in DQ mode.
The cost of converting illegal types that require a vector split is not calculated right now (as for other X86 targets).

Differential Revision: http://reviews.llvm.org/D15074

llvm-svn: 254494

Nov 19, 2015

Pointers in Masked Load, Store, Gather, Scatter intrinsics · 1ca72e18

Elena Demikhovsky authored Nov 19, 2015

The masked intrinsics support all integer and floating point data types. I added the pointer type to this list.
Added tests for CodeGen and for Loop Vectorizer.
Updated the Language Reference.

Differential Revision: http://reviews.llvm.org/D14150

llvm-svn: 253544