- Aug 09, 2013
Benjamin Kramer authored
Use it to avoid repeating ourselves too often. Also store MVT::SimpleValueType in the TTI tables so they can be statically initialized; MVT's constructors create bloated initialization code otherwise. llvm-svn: 188095
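As a rough illustration of why this helps (simplified stand-ins, not LLVM's actual headers): a table keyed by a plain enum such as MVT::SimpleValueType is a constant aggregate and can be emitted as static data, whereas a table of full MVT objects would run constructors at startup.

```cpp
// Simplified sketch; the real cost-table entries live in LLVM's headers.
enum SimpleValueType { v8i16, v4i32, v2i64 };  // stand-in for MVT's enum

struct CostTblEntry {
  int ISD;               // opcode key, e.g. ISD::MUL
  SimpleValueType Type;  // plain enum: allows static initialization
  unsigned Cost;
};

// Constant-initialized at compile time; no dynamic initializers emitted.
static const CostTblEntry SSE2CostTable[] = {
    {/*ISD::MUL*/ 1, v8i16, 1},  // opcode and cost values are illustrative
    {/*ISD::MUL*/ 1, v4i32, 6},
};
```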
- Jul 12, 2013
Arnold Schwaighofer authored
radar://14351991 llvm-svn: 186189
- Jun 27, 2013
Nadav Rotem authored
llvm-svn: 185086
Nadav Rotem authored
CostModel: improve the cost model for load/store of non-power-of-two types such as <3 x float>, which are popular in graphics. llvm-svn: 185085
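A hypothetical sketch of the underlying idea, not LLVM's actual implementation: type legalization widens a non-power-of-two vector to the next power of two, so the memory-op cost can track the widened type instead of a pessimistic per-element scalarization.

```cpp
// Illustrative helpers: <3 x float> is costed like the <4 x float> it
// widens to, rather than as three scalar loads plus vector inserts.
static unsigned widenToPowerOfTwo(unsigned NumElts) {
  unsigned W = 1;
  while (W < NumElts)
    W *= 2;  // 3 -> 4, 5 -> 8, ...
  return W;
}

static unsigned vectorLoadCost(unsigned NumElts, unsigned LegalVectorElts,
                               unsigned CostPerLegalLoad) {
  unsigned Wide = widenToPowerOfTwo(NumElts);
  // One legal load per register-sized chunk of the widened vector.
  unsigned Chunks = (Wide + LegalVectorElts - 1) / LegalVectorElts;
  return Chunks * CostPerLegalLoad;
}
```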
- Jun 25, 2013
Arnold Schwaighofer authored
radar://14057959 llvm-svn: 184872
- Jun 18, 2013
Nadav Rotem authored
llvm-svn: 184228
- Apr 17, 2013
Arnold Schwaighofer authored
getSimpleVT can only handle simple value types. radar://13676022 llvm-svn: 179714
- Apr 08, 2013
Arnold Schwaighofer authored
The costs are overfitted so that I can still use the legalization factor. For example, the following kernel has about half the throughput when vectorized compared to unvectorized when compiled with SSE2. Before this patch we would vectorize it.

```c
unsigned short A[1024];
double B[1024];

void f() {
  int i;
  for (i = 0; i < 1024; ++i) {
    B[i] = (double)A[i];
  }
}
```

radar://13599001 llvm-svn: 179033
- Apr 05, 2013
Arnold Schwaighofer authored
SSE2 has efficient support for shifts by a scalar. My previous change, which made shifts expensive, did not take this into account and marked all shifts as expensive. This would prevent vectorization where it is actually beneficial. With this change we differentiate between shifts by a constant and other shifts. radar://13576547 llvm-svn: 178808
Arnold Schwaighofer authored
On certain architectures we can support efficient vectorized versions of instructions if the operand value is uniform (splat) or a constant scalar. An example of this is a vector shift on x86. We can efficiently support

```
for (i = 0; i < ...; i += 4)
  w[0:3] = v[0:3] << <2, 2, 2, 2>
```

but not

```
for (i = 0; i < ...; i += 4)
  w[0:3] = v[0:3] << x[0:3]
```

This patch adds a parameter to getArithmeticInstrCost to further qualify operand values as uniform or uniform constant. Targets can then choose to return a different cost for instructions with such operand values. A follow-up commit will test this feature on x86. radar://13576547 llvm-svn: 178807
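A self-contained sketch of the idea from the target's side (signatures simplified; the real hook is the new operand-kind parameter on TargetTransformInfo::getArithmeticInstrCost):

```cpp
// Operand kinds as described above: arbitrary, uniform (splat), or a
// uniform constant. Names mirror the TTI enum of that era.
enum OperandValueKind { OK_AnyValue, OK_UniformValue, OK_UniformConstantValue };

// Illustrative x86-style shift cost: SSE2 can shift every lane by one
// scalar amount in a single instruction, so uniform amounts are cheap.
unsigned getVectorShiftCost(unsigned NumElts, OperandValueKind AmountKind) {
  if (AmountKind == OK_UniformValue || AmountKind == OK_UniformConstantValue)
    return 1;          // one hardware shift for the whole vector
  return NumElts * 3;  // per-element amounts: extract, shift, reinsert
}
```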
- Apr 03, 2013
Arnold Schwaighofer authored
The default logic does not correctly identify costs of casts because they are marked as custom on x86. In some cases, where the shift amount is a scalar, we would be able to generate better code. Unfortunately, when that is the case the value (the splat) gets hoisted out of the loop, making it invisible to ISel. radar://13130673 radar://13537826 llvm-svn: 178703
- Apr 01, 2013
Benjamin Kramer authored
llvm-svn: 178459
- Mar 20, 2013
Michael Liao authored
- After moving the logic that recognizes a vector shift by a scalar amount from DAG combining into DAG lowering, we declare all vector shifts as custom even when a vector shift on AVX is legal. As a result, the cost model needs special tuning to identify these legal cases. llvm-svn: 177586
- Mar 19, 2013
Nadav Rotem authored
Patch by Ahmad, Muhammad T <muhammad.t.ahmad@intel.com> llvm-svn: 177421
- Mar 02, 2013
Arnold Schwaighofer authored
This matters, for example, in the following matrix multiply:

```c
int **mmult(int rows, int cols, int **m1, int **m2, int **m3) {
  int i, j, k, val;
  for (i = 0; i < rows; i++) {
    for (j = 0; j < cols; j++) {
      val = 0;
      for (k = 0; k < cols; k++) {
        val += m1[i][k] * m2[k][j];
      }
      m3[i][j] = val;
    }
  }
  return (m3);
}
```

Taken from the test-suite benchmark Shootout. We estimate the cost of the multiply to be 2, while we generate 9 instructions for it and end up being quite a bit slower than the scalar version (48% on my machine). Also, properly differentiate between AVX1 and AVX2. On AVX1 we still split the vector into two 128-bit halves and handle the subvector muls like above with 9 instructions. Only on AVX2 will we have a cost of 9 for v4i64. I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use an add instead of a mul because with a mul we now no longer vectorize. I did verify that the mul would indeed be more expensive when vectorized, with 3 kernels:

```c
for (i ...) r += a[i] * 3;
for (i ...) m1[i] = m1[i] * 3; // This matches the test case in avx1.ll
// and a matrix multiply.
```

In each case the vectorized version was considerably slower. radar://13304919 llvm-svn: 176403
- Feb 20, 2013
Elena Demikhovsky authored
```
sext <4 x i1>  to <4 x i64>
sext <4 x i8>  to <4 x i64>
sext <4 x i16> to <4 x i64>
```

I'm running Combine on SIGN_EXTEND_IN_REG and reverting SEXT patterns:

```
(sext_in_reg (v4i64 anyext (v4i32 x)), ExtraVT)
  -> (v4i64 sext (v4i32 sext_in_reg (v4i32 x, ExtraVT)))
```

The sext_in_reg (v4i32 x) may be lowered to shl+sar operations. The "sar" does not exist as a 64-bit vector operation, so lowering sext_in_reg (v4i64 x) has no vector solution. I also added the cost of these operations to the AVX cost table. llvm-svn: 175619
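For the 32-bit case, the shl+sar idiom mentioned above looks like this with SSE2 intrinsics (a minimal sketch; the element widths are chosen for illustration). No corresponding vector arithmetic right shift exists for 64-bit elements, which is why the v4i64 case has no vector lowering.

```cpp
#include <emmintrin.h>  // SSE2

// Sign-extend the low 8 bits of each 32-bit lane in place: shift the
// narrow value's sign bit up to bit 31, then arithmetic-shift it back.
__m128i sext_in_reg_i8_in_i32(__m128i v) {
  __m128i shl = _mm_slli_epi32(v, 24);  // bit 7 -> bit 31
  return _mm_srai_epi32(shl, 24);       // sar replicates the sign bit
}
```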
- Jan 25, 2013
Renato Golin authored
llvm-svn: 173382
- Jan 20, 2013
Renato Golin authored
llvm-svn: 172992
- Jan 16, 2013
Renato Golin authored
Moving the X86CostTable to a common place so that other back ends can share the code. Also simplifying it a bit and commoning up tables with one and two types per operation. llvm-svn: 172658
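A minimal sketch of what such shared tables tend to look like (shapes modeled on the commoned-up code; field types simplified): one-type entries cover plain operations, two-type entries cover conversions.

```cpp
struct CostTblEntry {
  int ISD;       // opcode, e.g. ISD::MUL
  int Type;      // the single value type the operation works on
  unsigned Cost;
};

struct TypeConversionCostTblEntry {
  int ISD;       // e.g. ISD::SIGN_EXTEND
  int Dst, Src;  // conversions need both destination and source types
  unsigned Cost;
};

// Shared linear lookup; the two-type variant matches on (ISD, Dst, Src).
template <unsigned N>
int costTableLookup(const CostTblEntry (&Tbl)[N], int ISD, int Ty) {
  for (const CostTblEntry &E : Tbl)
    if (E.ISD == ISD && E.Type == Ty)
      return static_cast<int>(E.Cost);
  return -1;  // no entry; caller falls back to the default estimate
}
```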
- Jan 09, 2013
Nadav Rotem authored
ARM Cost model: Use the size of vector registers and widest vectorizable instruction to determine the max vectorization factor. llvm-svn: 172010
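The heuristic can be sketched in a few lines (illustrative, not the ARM backend's exact code): the widest profitable vectorization factor is the vector register width divided by the widest scalar type used in the loop.

```cpp
// e.g. 128-bit NEON registers and i32 elements give a max VF of 4.
unsigned maxVectorizationFactor(unsigned RegisterBits,
                                unsigned WidestTypeBits) {
  if (WidestTypeBits == 0 || WidestTypeBits > RegisterBits)
    return 1;  // nothing vector-sized fits; stay scalar
  return RegisterBits / WidestTypeBits;
}
```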
Nadav Rotem authored
Cost Model: Move the 'max unroll factor' variable to the TTI and add initial Cost Model support on ARM. llvm-svn: 171928
- Jan 07, 2013
Chandler Carruth authored
and make its comments doxygen comments. llvm-svn: 171688
Chandler Carruth authored
follow the coding conventions regarding enumerating a set of "kinds" of things. llvm-svn: 171687
Chandler Carruth authored
longer would violate any dependency layering and it is in fact an analysis. =] llvm-svn: 171686
Chandler Carruth authored
a TargetMachine to construct (and thus isn't always available), to an analysis group that supports layered implementations much like AliasAnalysis does. This is a pretty massive change, with a few parts that I was unable to easily separate (sorry), so I'll walk through it.

The first step of this conversion was to make TargetTransformInfo an analysis group, and to sink the nonce implementations in ScalarTargetTransformInfo and VectorTargetTransformInfo into a NoTargetTransformInfo pass. This allows other passes to add a hard requirement on TTI, and assume they will always get at least one implementation. The TargetTransformInfo analysis group leverages the delegation chaining trick that AliasAnalysis uses, where the base class for the analysis group delegates to the previous analysis *pass*, allowing all but the NoFoo analysis passes to only implement the parts of the interfaces they support. It also introduces a new trick where each pass in the group retains a pointer to the top-most pass that has been initialized. This allows passes to implement one API in terms of another API and benefit when some other pass above them in the stack has more precise results for the second API.

The second step of this conversion is to create a pass that implements the TargetTransformInfo analysis using the target-independent abstractions in the code generator. This replaces the ScalarTargetTransformImpl and VectorTargetTransformImpl classes in lib/Target with a single pass in lib/CodeGen called BasicTargetTransformInfo. This class actually provides most of the TTI functionality, basing it upon the TargetLowering abstraction and other information in the target-independent code generator.

The third step of the conversion adds support to all TargetMachines to register custom analysis passes. This allows building those passes with access to TargetLowering or other target-specific classes, and it also allows each target to customize the set of analysis passes desired in the pass manager. The baseline LLVMTargetMachine implements this interface to add the BasicTTI pass to the pass manager, and all of the tools that want to support target-aware TTI passes call this routine on whatever target machine they end up with to add the appropriate passes.

The fourth step of the conversion created target-specific TTI analysis passes for the X86 and ARM backends. These passes contain the custom logic that was previously in their extensions of the ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces. I separated them into their own file, as now all of the interface bits are private and they just expose a function to create the pass itself. Then I extended these target machines to set up a custom set of analysis passes, first adding BasicTTI as a fallback, and then adding their customized TTI implementations.

The fourth step required logic that was shared between the target-independent layer and the specific targets to move to a different interface, as they no longer derive from each other. As a consequence, helper functions were added to TargetLowering representing the common logic needed both in the target implementation and the codegen implementation of the TTI pass. While technically this is the only change that could have been committed separately, it would have been a nightmare to extract.

The final step of the conversion was just to delete all the old boilerplate. This got rid of the ScalarTargetTransformInfo and VectorTargetTransformInfo classes, all of the support in all of the targets for producing instances of them, and all of the support in the tools for manually constructing a pass based around them.

Now that TTI is a relatively normal analysis group, two things become straightforward. First, we can sink it into lib/Analysis, which is a more natural layer for it to live in. Second, clients of this interface can depend on it *always* being available, which will simplify their code and behavior. These (and other) simplifications will follow in subsequent commits; this one is clearly big enough.

Finally, I'm very aware that much of the comments and documentation needs to be updated. As soon as I had this working, and plausibly well commented, I wanted to get it committed and in front of the build bots. I'll be doing a few passes over documentation later if it sticks. Commits to update DragonEgg and Clang will be made presently. llvm-svn: 171681
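The delegation-chaining trick described above can be sketched with toy classes (names, opcodes, and costs are invented for illustration; the real mechanism lives in the TargetTransformInfo analysis group):

```cpp
#include <iostream>

struct TTIBase {
  TTIBase *Prev = nullptr;  // the previous pass in the analysis group
  virtual ~TTIBase() = default;
  // NoTTI-style fallback: delegate down the chain, else a conservative 1.
  virtual unsigned getInstrCost(int Opcode) {
    return Prev ? Prev->getInstrCost(Opcode) : 1;
  }
};

struct BasicTTI : TTIBase {
  unsigned getInstrCost(int Opcode) override {
    if (Opcode == 25)  // hypothetical opcode this layer understands
      return 2;        // generic codegen-based estimate
    return TTIBase::getInstrCost(Opcode);  // delegate everything else
  }
};

struct X86TTI : TTIBase {
  unsigned getInstrCost(int Opcode) override {
    if (Opcode == 13)  // hypothetical opcode with target-specific cost
      return 20;
    return TTIBase::getInstrCost(Opcode);  // fall through to BasicTTI
  }
};

int main() {
  TTIBase NoTTI;
  BasicTTI Basic; Basic.Prev = &NoTTI;
  X86TTI X86;     X86.Prev = &Basic;
  std::cout << X86.getInstrCost(13) << "\n";  // 20: answered by X86TTI
  std::cout << X86.getInstrCost(25) << "\n";  // 2:  delegated to BasicTTI
  std::cout << X86.getInstrCost(7)  << "\n";  // 1:  falls through to NoTTI
}
```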