Commits · 8543ba3e527241a4c51fbce999430a0e71d94a5d · Roger Ferrer / llvm-epi-0.8

Apr 12, 2013

SLPVectorizer: add support for vectorization of diamond shaped trees. We now... · 8543ba3e

Nadav Rotem authored Apr 12, 2013

SLPVectorizer: add support for vectorization of diamond shaped trees. We now perform a preliminary traversal of the graph to collect values with multiple users and check where the users came from. 

llvm-svn: 179414

Add debug prints. · 4da0ab1d
Nadav Rotem authored Apr 12, 2013
```
llvm-svn: 179412
```

LoopVectorizer: integer division is not a reduction operation · f9cea17f

Arnold Schwaighofer authored Apr 12, 2013

Don't classify sdiv/udiv as a reduction operation. Integer division is lossy,
so it cannot be reassociated. For example, (1 / 2) * 4 != 4 / 2: the left side
truncates to 0 * 4 = 0, while the right side is 2.

Example:

int a[] = { 2, 5, 2, 2 };
int x = 80;

for (int i = 0; i < 4; ++i)
  x /= a[i];

Scalar:
  x /= 2 // = 40
  x /= 5 // = 8
  x /= 2 // = 4
  x /= 2 // = 2

Vectorized:

 <80, 1> / <2,5> //= <40,0>
 <40, 0> / <2,2> //= <20,0>

 20 * 0 = 0, but the scalar result is 2.

radar://13640654

llvm-svn: 179381

Apr 11, 2013
- Rename the C function to create a SLPVectorizerPass to something sane and... · c86fdf12
  Benjamin Kramer authored Apr 11, 2013
```
Rename the C function to create a SLPVectorizerPass to something sane and expose it in the header file.

llvm-svn: 179272
```
Apr 10, 2013
- Make the SLP store-merger less paranoid about function calls. We check for... · 73dffa41
  Nadav Rotem authored Apr 10, 2013
```
Make the SLP store-merger less paranoid about function calls. We check for function calls when we check if it is safe to sink instructions.

llvm-svn: 179207
```
- We require DataLayout for analyzing the size of stores. · 88dd5f7a
  Nadav Rotem authored Apr 10, 2013
```
llvm-svn: 179206
```
Apr 09, 2013

Add support for bottom-up SLP vectorization infrastructure. · 2d9dec32

Nadav Rotem authored Apr 09, 2013

This commit adds the infrastructure for performing bottom-up SLP vectorization (and other optimizations) on parallel computations.
The infrastructure has three potential users:

  1. The loop vectorizer needs to be able to vectorize AOS data structures such as (sum += A[i] + A[i+1]).

  2. The BB-vectorizer needs this infrastructure for bottom-up SLP vectorization, because bottom-up vectorization is faster to compute.

  3. A loop-roller needs to be able to analyze consecutive chains and roll them into a loop, in order to reduce code size. A loop roller does not need to create vector instructions, and this infrastructure separates the chain analysis from the vectorization.

This patch also includes a simple (100 LOC) bottom up SLP vectorizer that uses the infrastructure, and can vectorize this code:

void SAXPY(int *x, int *y, int a, int i) {
  x[i]   = a * x[i]   + y[i];
  x[i+1] = a * x[i+1] + y[i+1];
  x[i+2] = a * x[i+2] + y[i+2];
  x[i+3] = a * x[i+3] + y[i+3];
}

llvm-svn: 179117

Apr 05, 2013

LoopVectorizer: Pass OperandValueKind information to the cost model · df6f67ed

Arnold Schwaighofer authored Apr 04, 2013

Pass down the fact that an operand is going to be a vector of constants.

This should bring the performance of MultiSource/Benchmarks/PAQ8p/paq8p on x86
back. It had degraded to scalar performance due to my previous shift cost change
that made all shifts expensive on x86.

radar://13576547

llvm-svn: 178809

Mar 14, 2013

LoopVectorize: Invert case when we use a vector cmp value to query select cost · c63cf3a0

Arnold Schwaighofer authored Mar 14, 2013

We generate a select with a vectorized condition argument when the condition is
NOT loop invariant. Not the other way around.

llvm-svn: 177098

Mar 10, 2013

BBVectorize: Fixup debugging statements · f610be9f

Hal Finkel authored Mar 10, 2013

After the recent data-structure improvements, a couple of debugging statements
were broken (printing pointer values).

llvm-svn: 176791

Mar 09, 2013

Remove a source of nondeterminism from the LoopVectorizer. · 6eda79f6

Benjamin Kramer authored Mar 09, 2013

This made us emit runtime checks in a random order. Hopefully bootstrap
miscompares will go away now.

llvm-svn: 176775

LoopVectorizer: Ignore all dbg intrinsics · 8b3dc094
Arnold Schwaighofer authored Mar 09, 2013
```
Ignore all DbgInfoIntrinsic instructions instead of just DbgValueInst.

llvm-svn: 176769
```

LoopVectorizer: Ignore dbg.value instructions · 4090b61a

Arnold Schwaighofer authored Mar 09, 2013

We want vectorization to happen at -g. Ignore calls to the dbg.value intrinsic
and don't transfer them to the vectorized code.

radar://13378964

llvm-svn: 176768

Mar 08, 2013
- Insert the reduction start value into the first bypass block to preserve domination. · 37c2d65c
  Benjamin Kramer authored Mar 08, 2013
```
Fixes PR15344.

llvm-svn: 176701
```
Mar 02, 2013

PR14448 - prevent the loop vectorizer from vectorizing the same loop twice. · 739e37a0

Nadav Rotem authored Mar 02, 2013

The LoopVectorizer often runs multiple times on the same function due to inlining.
When this happens the loop vectorizer often vectorizes the same loops multiple times, increasing code size and adding unneeded branches.
With this patch, the vectorizer attaches metadata to the scalar loops it processes, marking them as 'already vectorized' so that it knows to ignore them when it sees them a second time.

PR14448.

llvm-svn: 176399

Mar 01, 2013
- LoopVectorize: Don't hang forever if a PHI only has skipped PHI uses. · 12f98fae
  Benjamin Kramer authored Mar 01, 2013
```
Fixes PR15384.

llvm-svn: 176366
```
Feb 27, 2013

LoopVectorize: Vectorize math builtin calls. · dc145816

Benjamin Kramer authored Feb 27, 2013

This properly asks TargetLibraryInfo if a call is available and if it is, it
can be translated into the corresponding LLVM builtin. We don't vectorize sqrt()
yet because I'm not sure about the semantics for negative numbers. The other
intrinsics should be exact equivalents of the libm functions.

Differential Revision: http://llvm-reviews.chandlerc.com/D465

llvm-svn: 176188

Feb 21, 2013

Allow GlobalValues to vectorize with AliasAnalysis · cf928cb5

Renato Golin authored Feb 21, 2013

Store the load/store instructions with the values
and inspect them using Alias Analysis to make sure
they don't alias, since the GEP pointer operand doesn't
take the offset into account.

To avoid adding extra cost to loads and stores
that don't overlap on global values, AA is *only* queried
if all of the previous checks fail.

Use the biggest vector register size as the stride for the
vectorization access to stay conservative, because the
cost model (which calculates the real vectorization
factor) only runs after the legalization phase.

We might re-think this relationship in the future, but
for now, I'd rather be safe than sorry.

llvm-svn: 175818

Feb 17, 2013

BBVectorize: Fix an invalid reference bug · 76e65e45

Hal Finkel authored Feb 17, 2013

This fixes PR15289. This bug was introduced (recently) in r175215; collecting
all std::vector references for candidate pairs to delete at once is invalid
because subsequent lookups in the owning DenseMap could invalidate the
references.

bugpoint was able to reduce a useful test case. Unfortunately, because whether
or not this asserts depends on memory layout, this test case will sometimes
appear to produce valid output. Nevertheless, running under valgrind will
reveal the error.

llvm-svn: 175397

Feb 15, 2013

BBVectorize: Call a DAG a DAG instead of a tree · 89909397

Hal Finkel authored Feb 15, 2013

Several functions and variable names used the term 'tree' to refer
to what is actually a DAG. Correcting this mistake will, hopefully,
prevent confusion in the future.

No functionality change intended.

llvm-svn: 175278

BBVectorize: Cap the number of candidate pairs in each instruction group · 283f4f0e

Hal Finkel authored Feb 15, 2013

For some basic blocks, it is possible to generate many candidate pairs for
relatively few pairable instructions. When many (tens of thousands) of these pairs
are generated for a single instruction group, the time taken to generate and
rank the different vectorization plans can become quite large. As a result, we now
cap the number of candidate pairs within each instruction group. This is done by
closing out the group once the threshold is reached (set now at 3000 pairs).

Although this will limit the overall compile-time impact, this may not be the best
way to achieve this result. It might be better, for example, to prune excessive
candidate pairs after the fact to prevent the generation of short, but highly-connected
groups. We can experiment with this in the future.

This change reduces the overall compile-time slowdown of the csa.ll test case in
PR15222 to ~5x. If 5x is still considered too large, a lower limit can be
used as the default.

This represents a functionality change, but only for very large inputs
(thus, there is no regression test).

llvm-svn: 175251

Feb 14, 2013

BBVectorize: Remove the remaining instances of std::multimap · e7a1ef42

Hal Finkel authored Feb 14, 2013

All instances of std::multimap have now been replaced by
DenseMap<K, std::vector<V> >, and this yields a speedup of 5% on the
csa.ll test case from PR15222.

No functionality change intended.

llvm-svn: 175216

BBVectorize: Don't store candidate pairs in a std::multimap · c3a4425c

Hal Finkel authored Feb 14, 2013

This is another commit on the road to removing std::multimap from
BBVectorize. This gives an ~1% speedup on the csa.ll test case
in PR15222.

No functionality change intended.

llvm-svn: 175215

Feb 13, 2013
- LoopVectorize: Simplify code for clarity. · 0aa2ad61
  Benjamin Kramer authored Feb 13, 2013
```
No functionality change.

llvm-svn: 175076
```
- Metadata for annotating loops as parallel. The first consumer for this · 0d23725a
  Pekka Jaaskelainen authored Feb 13, 2013
```
metadata is the loop vectorizer.

See the documentation update for more info.

llvm-svn: 175060
```
Feb 12, 2013