  1. Mar 20, 2013
      Correct cost model for vector shift on AVX2 · 70dd7f99
      Michael Liao authored
      - After moving the logic that recognizes a vector shift by a scalar
        amount from DAG combining into DAG lowering, we declare all vector
        shifts as custom lowered, even though such shifts are legal on AVX.
        As a result, the cost model needs special tuning to identify these
        legal cases (an illustrative kernel follows at the end of this entry).
      
      llvm-svn: 177586
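      
      As a rough illustration (not part of the original commit message), the
      kind of case the tuned cost model should recognize as cheap is a vector
      shift whose per-lane amount is a splat of a scalar, which AVX2 can do in
      a single instruction. The kernel and its name below are hypothetical.
      
      /* Hypothetical kernel, not taken from the commit: the shift amount `s`
       * is loop-invariant, so the vectorized loop uses a vector shift whose
       * per-lane amount is a splat.  On AVX2 this lowers to a single VPSLLD,
       * so the cost model should report it as cheap even though the operation
       * is declared custom. */
      void shl_by_scalar(unsigned *dst, const unsigned *src, int n, unsigned s) {
        for (int i = 0; i < n; i++)
          dst[i] = src[i] << s;   /* splat shift amount after vectorization */
      }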
  2. Mar 02, 2013
      X86 cost model: Adjust cost for custom lowered vector multiplies · 20ef54f4
      Arnold Schwaighofer authored
      This matters, for example, in the following matrix multiply:
      
      int **mmult(int rows, int cols, int **m1, int **m2, int **m3) {
        int i, j, k, val;
        for (i=0; i<rows; i++) {
          for (j=0; j<cols; j++) {
            val = 0;
            for (k=0; k<cols; k++) {
              val += m1[i][k] * m2[k][j];
            }
            m3[i][j] = val;
          }
        }
        return(m3);
      }
      
      Taken from the test-suite benchmark Shootout.
      
      We estimate the cost of the multiply to be 2, while we actually generate 9
      instructions for it and end up quite a bit slower than the scalar version
      (48% slower on my machine).
      
      Also, properly differentiate between AVX and AVX2. On AVX we still split
      the vector into two 128-bit halves and handle the subvector multiplies as
      above, with 9 instructions each. Only on AVX2 does a v4i64 multiply get a
      cost of 9 (a rough sketch of such a lowering follows at the end of this
      entry).
      
      I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use
      an add instead of a mul, because with a mul we no longer vectorize it. I
      verified that the mul would indeed be more expensive when vectorized, using
      three kernels:
      
      for (i ...)
        r += a[i] * 3;
      for (i ...)
        m1[i] = m1[i] * 3; // This matches the test case in avx1.ll
      and a matrix multiply.
      
      In each case the vectorized version was considerably slower.
      
      radar://13304919
      
      llvm-svn: 176403
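      
      As a rough, hedged sketch (not part of the original commit message): the
      "9 instructions" above come from the fact that x86 has no single packed
      64x64->64 multiply, so a v4i64 multiply must be decomposed into 32-bit
      partial products. The intrinsics version below illustrates that
      decomposition under that assumption; it is not the exact sequence LLVM
      emits, and the helper name mul_v4i64 is made up.
      
      #include <immintrin.h>
      
      /* Multiply two v4i64 vectors on AVX2 by splitting each 64-bit lane into
       * 32-bit halves (a = aH*2^32 + aL, b = bH*2^32 + bL):
       *   lo64(a*b) = aL*bL + ((aL*bH + aH*bL) << 32)
       * Roughly 8-9 instructions, which is why modeling the multiply with a
       * cost of 2 badly underestimates it. */
      static __m256i mul_v4i64(__m256i a, __m256i b) {
        __m256i a_hi = _mm256_srli_epi64(a, 32);   /* aH in the low 32 bits */
        __m256i b_hi = _mm256_srli_epi64(b, 32);   /* bH in the low 32 bits */
        __m256i lo   = _mm256_mul_epu32(a, b);     /* aL*bL                 */
        __m256i m1   = _mm256_mul_epu32(a_hi, b);  /* aH*bL                 */
        __m256i m2   = _mm256_mul_epu32(a, b_hi);  /* aL*bH                 */
        __m256i mid  = _mm256_slli_epi64(_mm256_add_epi64(m1, m2), 32);
        return _mm256_add_epi64(lo, mid);          /* low 64 bits of a*b    */
      }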