Commits · a1e5e44eb35c1a3d4a64d34a7c73f3cea0c244d7 · Roger Ferrer / llvm-epi-0.8

Apr 15, 2013
- Rename the slp-vectorizer clang/llvm flags. No functionality change. · a1e5e44e
  Nadav Rotem authored Apr 15, 2013
```
llvm-svn: 179505
```
  a1e5e44e
- SLPVectorizer: Add support for vectorizing trees that start at compare instructions. · 5d393c41
  Nadav Rotem authored Apr 15, 2013
```
llvm-svn: 179504
```
  5d393c41
Apr 14, 2013

Reorders two transforms that collide with each other · 1fae1955

David Majnemer authored Apr 14, 2013

One performs: (X == 13 | X == 14) -> X-13 <u 2
The other: (A == C1 || A == C2) -> (A & ~(C1 ^ C2)) == C1

The problem is that certain values of C1 and C2 trigger both
transforms, but the first one blocks the second, which
generates suboptimal code.

Reordering the transforms should be better in every case and
allows us to do interesting stuff like turn:
  %shr = lshr i32 %X, 4
  %and = and i32 %shr, 15
  %add = add i32 %and, -14
  %tobool = icmp ne i32 %add, 0

into:
  %and = and i32 %X, 240
  %tobool = icmp ne i32 %and, 224
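
As a quick sanity check of the rewrites described above, here is a minimal C sketch. The function names are hypothetical, the constants 12 and 13 are chosen only because they satisfy the preconditions of both transforms, and unsigned 32-bit arithmetic stands in for the IR operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The original sequence, one statement per IR instruction. */
static bool tobool_original(uint32_t x) {
    uint32_t shr = x >> 4;        /* lshr i32 %X, 4 */
    uint32_t masked = shr & 15u;  /* and i32 %shr, 15 */
    uint32_t add = masked - 14u;  /* add i32 %and, -14 (wraps modulo 2^32) */
    return add != 0u;             /* icmp ne i32 %add, 0 */
}

/* The folded sequence: mask in place and compare against 14 << 4. */
static bool tobool_folded(uint32_t x) {
    return (x & 240u) != 224u;
}

/* Range form: (x == c1 || x == c1 + 1) -> x - c1 <u 2. */
static bool eq_either_range(uint32_t x, uint32_t c1) {
    return x - c1 < 2u;
}

/* Mask form: (x == c1 || x == c2) -> (x & ~(c1 ^ c2)) == c1.
   Assumes c1 and c2 differ in exactly one bit and c1 has that bit clear. */
static bool eq_either_masked(uint32_t x, uint32_t c1, uint32_t c2) {
    uint32_t diff = c1 ^ c2;
    assert(diff != 0u && (diff & (diff - 1u)) == 0u && (c1 & diff) == 0u);
    return (x & ~diff) == c1;
}

/* Counts inputs in [0, limit) where any rewritten form disagrees
   with the plain comparison it replaces. */
static unsigned reorder_mismatches(uint32_t limit) {
    unsigned bad = 0;
    for (uint32_t x = 0; x < limit; ++x) {
        bool plain = (x == 12u || x == 13u);
        bad += tobool_original(x) != tobool_folded(x);
        bad += eq_either_range(x, 12u) != plain;
        bad += eq_either_masked(x, 12u, 13u) != plain;
    }
    return bad;
}
```

Calling `reorder_mismatches(1u << 16)` should return 0 if the rewrites are equivalent over that range. This only checks the arithmetic; it says nothing about the order in which InstCombine applies the transforms.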

llvm-svn: 179493

1fae1955

Miscellaneous cleanups for VecUtils.h · 7d62ea86
Benjamin Kramer authored Apr 14, 2013
```
llvm-svn: 179483
```
7d62ea86
SLP: Document the scalarization cost method. · 3403c115
Nadav Rotem authored Apr 14, 2013
```
llvm-svn: 179479
```
3403c115

SLPVectorizer: Add support for trees that don't start at binary operators, and add the cost of extracting values from the roots of the tree. · 54b413d1

Nadav Rotem authored Apr 14, 2013

llvm-svn: 179475

54b413d1

SLPVectorizer: add initial support for reduction variable vectorization. · 0b9cf856
Nadav Rotem authored Apr 14, 2013
```
llvm-svn: 179470
```
0b9cf856

Apr 13, 2013

GlobalDCE: Fix an oversight in my last commit that could lead to crashes. · adc1727c
Benjamin Kramer authored Apr 13, 2013
```
There is a Constant with non-constant operands: blockaddress.

llvm-svn: 179460
```
adc1727c

Fix a scalability issue with complex ConstantExprs. · 89ca4bc6

Benjamin Kramer authored Apr 13, 2013

This is basically the same fix in three different places. We use a set to avoid
walking the whole tree of a big ConstantExpr multiple times.

For example: (select cmp, (add big_expr 1), (add big_expr 2))
We don't want to visit big_expr twice here, as it may consist of thousands of
nodes.
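
To illustrate why the visited set matters, here is a minimal C sketch. The `Node` layout, the index-based operands, and the function names are assumptions made for this example and are not LLVM's data structures. Each node uses the previous node as both operands, so a naive walk doubles its work at every level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 64

/* Hypothetical expression node: up to two operands, stored as indices
   into a node array; -1 means "no operand". */
struct Node { int lhs; int rhs; };

/* Builds a chain where node i uses node i-1 as both operands,
   so every subexpression is shared. */
static void build_shared_chain(struct Node *nodes, int n) {
    assert(n > 0 && n <= MAX_NODES);
    nodes[0].lhs = nodes[0].rhs = -1;
    for (int i = 1; i < n; ++i)
        nodes[i].lhs = nodes[i].rhs = i - 1;
}

/* Returns how many node visits the walk performs. When `visited` is
   non-NULL, each node is expanded at most once. */
static size_t walk(const struct Node *nodes, int id, bool *visited) {
    if (id < 0)
        return 0;
    if (visited) {
        if (visited[id])
            return 0;
        visited[id] = true;
    }
    return 1 + walk(nodes, nodes[id].lhs, visited)
             + walk(nodes, nodes[id].rhs, visited);
}
```

With a 20-node chain, the walk without a set performs 2^20 - 1 visits, while the walk with a set performs 20.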

The testcase exercises this by creating an insanely large ConstantExpr out of
a loop. It's questionable whether the optimizer should ever create those, but this
can be triggered with real C code. Fixes PR15714.

llvm-svn: 179458

89ca4bc6

Apr 12, 2013

InstCombine: Check the operand types before merging fcmp ord & fcmp ord. · e89c7050
Benjamin Kramer authored Apr 12, 2013
```
Fixes PR15737.

llvm-svn: 179417
```
e89c7050

SLPVectorizer: add support for vectorization of diamond shaped trees. · 8543ba3e

Nadav Rotem authored Apr 12, 2013

We now perform a preliminary traversal of the graph to collect values with multiple users and check where the users came from.

llvm-svn: 179414

8543ba3e

Add debug prints. · 4da0ab1d
Nadav Rotem authored Apr 12, 2013
```
llvm-svn: 179412
```
4da0ab1d

Simplify (A & ~B) in icmp if A is a power of 2 · 1a08accb

David Majnemer authored Apr 12, 2013

The transform will execute like so:
(A & ~B) == 0 --> (A & B) != 0
(A & ~B) != 0 --> (A & B) == 0
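
The equivalence relies on A having exactly one bit set. A minimal C sketch (hypothetical function names, 32-bit unsigned values assumed) that checks it and makes the precondition explicit:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool is_power_of_two(uint32_t a) {
    return a != 0u && (a & (a - 1u)) == 0u;
}

/* Left-hand side of the first rewrite: (A & ~B) == 0. */
static bool andnot_is_zero(uint32_t a, uint32_t b) {
    return (a & ~b) == 0u;
}

/* Right-hand side: (A & B) != 0. Only equivalent to the left-hand side
   when A has exactly one bit set, hence the assert. */
static bool and_is_nonzero(uint32_t a, uint32_t b) {
    assert(is_power_of_two(a));
    return (a & b) != 0u;
}

/* Tries every single-bit A against a spread of B values and counts
   disagreements between the two forms. */
static unsigned pow2_mismatches(void) {
    unsigned bad = 0;
    for (unsigned bit = 0; bit < 32; ++bit) {
        uint32_t a = UINT32_C(1) << bit;
        for (uint32_t b = 0; b < 4096u; ++b) {
            uint32_t probe = b * UINT32_C(0x9E3779B9); /* scatter bits across the word */
            bad += andnot_is_zero(a, probe) != and_is_nonzero(a, probe);
        }
    }
    return bad;
}
```

For a value that is not a power of 2 the rewrite breaks: with A = 3 and B = 1, (A & ~B) is 2, so the left-hand side is false, yet (A & B) is nonzero.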

llvm-svn: 179386

1a08accb

LoopVectorizer: integer division is not a reduction operation · f9cea17f

Arnold Schwaighofer authored Apr 12, 2013

Don't classify sdiv/udiv as a reduction operation. Integer division is lossy.
For example: (1 / 2) * 4 != 4 / 2.

Example:

int a[] = { 2, 5, 2, 2 };
int x = 80;

for (int i = 0; i < 4; ++i)
  x /= a[i];

Scalar:
  x /= 2 // = 40
  x /= 5 // = 8
  x /= 2 // = 4
  x /= 2 // = 2

Vectorized:

 <80, 1> / <2,5> //= <40,0>
 <40, 0> / <2,2> //= <20,0>

 20*0 = 0
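
The same arithmetic can be reproduced with a small C sketch. The two-lane function is only a model of what a vectorized reduction would compute (lane 1 seeded with the identity 1, lanes multiplied together at the end); it is not the code the vectorizer emits, and the function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reduction: divide x by each element in order. */
static int divide_scalar(int x, const int *a, size_t n) {
    for (size_t i = 0; i < n; ++i)
        x /= a[i];
    return x;
}

/* Two-lane model of the invalid vectorized reduction: lane 0 starts at x,
   lane 1 at the identity 1, and the lanes are combined by multiplication. */
static int divide_two_lanes(int x, const int *a, size_t n) {
    assert(n % 2 == 0);
    int lane0 = x;
    int lane1 = 1;
    for (size_t i = 0; i < n; i += 2) {
        lane0 /= a[i];
        lane1 /= a[i + 1];
    }
    return lane0 * lane1;
}
```

With `a = {2, 5, 2, 2}` and `x = 80`, the scalar loop yields 2 while the two-lane model yields 0, matching the figures above.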

radar://13640654

llvm-svn: 179381

f9cea17f

Apr 11, 2013

Optimize icmp involving addition better · b81cd63c

David Majnemer authored Apr 11, 2013

Allows LLVM to optimize sequences like the following:

%add = add nsw i32 %x, 1
%cmp = icmp sgt i32 %add, %y

into:

%cmp = icmp sge i32 %x, %y

as well as:

%add1 = add nsw i32 %x, 20
%add2 = add nsw i32 %y, 57
%cmp = icmp sge i32 %add1, %add2

into:

%add = add nsw i32 %y, 37
%cmp = icmp sle i32 %add, %x
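
A brute-force check of both rewrites, as a minimal C sketch with hypothetical function names. The asserts stand in for the `nsw` flag: the comparisons are only claimed to be equivalent when the additions do not overflow.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* First pair: (x + 1) > y versus x >= y. */
static bool cmp_add1_sgt(int x, int y) {
    assert(x < INT_MAX);            /* models nsw on the add */
    return x + 1 > y;
}

static bool cmp_sge(int x, int y) {
    return x >= y;
}

/* Second pair: (x + 20) >= (y + 57) versus (y + 37) <= x. */
static bool cmp_two_adds(int x, int y) {
    assert(x <= INT_MAX - 20 && y <= INT_MAX - 57);
    return x + 20 >= y + 57;
}

static bool cmp_one_add(int x, int y) {
    assert(y <= INT_MAX - 37);
    return y + 37 <= x;
}

/* Counts (x, y) pairs in [lo, hi] x [lo, hi] where a rewrite disagrees. */
static unsigned icmp_add_mismatches(int lo, int hi) {
    unsigned bad = 0;
    for (int x = lo; x <= hi; ++x) {
        for (int y = lo; y <= hi; ++y) {
            bad += cmp_add1_sgt(x, y) != cmp_sge(x, y);
            bad += cmp_two_adds(x, y) != cmp_one_add(x, y);
        }
    }
    return bad;
}
```

`icmp_add_mismatches(-100, 100)` should return 0. Without the no-overflow assumption the first rewrite would not hold: a wrapping `x + 1` at the maximum value becomes the minimum value, so the original comparison is false for any y while `x >= y` is true.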

llvm-svn: 179316

b81cd63c

Fix for wrong instcombine on vector insert/extract · a95f8749

Benjamin Kramer authored Apr 11, 2013

When trying to collapse sequences of insertelement/extractelement
instructions into single shuffle instructions, there is one specific
case where the Instruction Combiner wrongly updates the resulting
Mask of shuffle indexes.

The problem is in the function CollectShuffleElements.

If we have a sequence of insert/extract element instructions
like the one below:

  %tmp1 = extractelement <4 x float> %LHS, i32 0
  %tmp2 = insertelement <4 x float> %RHS, float %tmp1, i32 1
  %tmp3 = extractelement <4 x float> %RHS, i32 2
  %tmp4 = insertelement <4 x float> %tmp2, float %tmp3, i32 3

Where:
  . %RHS will have a mask of [4,5,6,7]
  . %LHS will have a mask of [0,1,2,3]

The Mask of shuffle indexes is wrongly computed to [4,1,6,7]
instead of [4,0,6,7].
When analyzing %tmp2 in order to compute the Mask for the
resulting shuffle instruction, the algorithm forgets to update
the mask index at position 1 with the index associated to the
element extracted from %LHS by instruction %tmp1.
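
To make the mask arithmetic concrete, here is a minimal C model of the sequence. The function names are hypothetical, and arrays of four floats stand in for `<4 x float>`. Mask indices 0-3 select from %LHS and 4-7 from %RHS, as stated above. Note that [4,0,6,7] describes %tmp2; under this model the full sequence through %tmp4 corresponds to [4,0,6,6], because %tmp3 reads lane 2 of %RHS.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* shufflevector model: indices 0-3 pick from lhs, 4-7 pick from rhs. */
static void shuffle4(const float lhs[4], const float rhs[4],
                     const int mask[4], float out[4]) {
    for (int i = 0; i < 4; ++i) {
        assert(mask[i] >= 0 && mask[i] < 8);
        out[i] = mask[i] < 4 ? lhs[mask[i]] : rhs[mask[i] - 4];
    }
}

/* %tmp1 and %tmp2: insert lane 0 of lhs into lane 1 of rhs. */
static void model_tmp2(const float lhs[4], const float rhs[4], float out[4]) {
    float tmp1 = lhs[0];
    memcpy(out, rhs, 4 * sizeof(float));
    out[1] = tmp1;
}

/* %tmp3 and %tmp4: additionally insert lane 2 of rhs into lane 3. */
static void model_tmp4(const float lhs[4], const float rhs[4], float out[4]) {
    model_tmp2(lhs, rhs, out);
    float tmp3 = rhs[2];
    out[3] = tmp3;
}

/* Bitwise comparison is enough here because lanes are only copied. */
static bool same4(const float a[4], const float b[4]) {
    return memcmp(a, b, 4 * sizeof(float)) == 0;
}
```

With distinct lane values, shuffling with [4,0,6,7] reproduces %tmp2, while the wrongly computed [4,1,6,7] differs in lane 1.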

Patch by Andrea DiBiagio!

llvm-svn: 179291

a95f8749

[ASan] Allow disabling init-order checks for globals by source file name. · a28f36c2
Alexey Samsonov authored Apr 11, 2013
```
llvm-svn: 179280
```
a28f36c2
Rename the C function to create a SLPVectorizerPass to something sane and expose it in the header file. · c86fdf12
Benjamin Kramer authored Apr 11, 2013
```
llvm-svn: 179272
```
c86fdf12

Apr 10, 2013
- Make the SLP store-merger less paranoid about function calls. · 73dffa41
  Nadav Rotem authored Apr 10, 2013
```
We check for function calls when we check if it is safe to sink instructions.

llvm-svn: 179207
```
  73dffa41
- We require DataLayout for analyzing the size of stores. · 88dd5f7a
  Nadav Rotem authored Apr 10, 2013
```
llvm-svn: 179206
```
  88dd5f7a
- Change CloneFunctionInto to always clone Argument attributes individually, · 81259294
  Joey Gouly authored Apr 10, 2013
```
rather than checking if the source and destination have the same number of
arguments and copying the attributes over directly.

llvm-svn: 179169
```
  81259294
- Fix some comment typos. · 798a7709
  Bob Wilson authored Apr 09, 2013
```
llvm-svn: 179132
```
  798a7709
Apr 09, 2013

Add support for bottom-up SLP vectorization infrastructure. · 2d9dec32

Nadav Rotem authored Apr 09, 2013

This commit adds the infrastructure for performing bottom-up SLP vectorization (and other optimizations) on parallel computations.
The infrastructure has three potential users:

  1. The loop vectorizer needs to be able to vectorize AOS data structures such as (sum += A[i] + A[i+1]).

  2. The BB-vectorizer needs this infrastructure for bottom-up SLP vectorization, because bottom-up vectorization is faster to compute.

  3. A loop-roller needs to be able to analyze consecutive chains and roll them into a loop, in order to reduce code size. A loop roller does not need to create vector instructions, and this infrastructure separates the chain analysis from the vectorization.

This patch also includes a simple (100 LOC) bottom up SLP vectorizer that uses the infrastructure, and can vectorize this code:

void SAXPY(int *x, int *y, int a, int i) {
  x[i]   = a * x[i]   + y[i];
  x[i+1] = a * x[i+1] + y[i+1];
  x[i+2] = a * x[i+2] + y[i+2];
  x[i+3] = a * x[i+3] + y[i+3];
}
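
To show what treating the four statements as one unit means, here is a minimal C sketch. It is not the vectorizer's output; `saxpy_lanes` is a hypothetical model in which fixed-width arrays play the role of vector registers and each short loop plays the role of one vector instruction.

```c
#include <assert.h>

enum { LANES = 4 };

/* The scalar function from the commit message, unchanged in behavior. */
static void saxpy_scalar(int *x, const int *y, int a, int i) {
    x[i]     = a * x[i]     + y[i];
    x[i + 1] = a * x[i + 1] + y[i + 1];
    x[i + 2] = a * x[i + 2] + y[i + 2];
    x[i + 3] = a * x[i + 3] + y[i + 3];
}

/* Lane-wise model: the four isomorphic statements become whole-vector
   load, multiply-add, and store steps over consecutive elements. */
static void saxpy_lanes(int *x, const int *y, int a, int i) {
    assert(i >= 0);
    int vx[LANES];
    int vy[LANES];
    for (int l = 0; l < LANES; ++l) {   /* two consecutive vector loads */
        vx[l] = x[i + l];
        vy[l] = y[i + l];
    }
    for (int l = 0; l < LANES; ++l)     /* multiply by splatted a, then add */
        vx[l] = a * vx[l] + vy[l];
    for (int l = 0; l < LANES; ++l)     /* one consecutive vector store */
        x[i + l] = vx[l];
}
```

Both functions should leave `x` in the same state for any in-bounds `i`; the stores to consecutive addresses are what give a bottom-up vectorizer a starting point for the chain.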

llvm-svn: 179117

2d9dec32

Redo the fix Benjamin Kramer committed in r178793 about iterator invalidation in Reassociate. · 331f01dc

Shuxin Yang authored Apr 08, 2013

I brazenly think this change is slightly simpler than r178793 because: 
  - no "state" in functor
  - "OpndPtrs[i]" looks simpler than "&Opnds[OpndIndices[i]]" 

  While I can reproduce the problem in Valgrind, it is rather difficult to come up
with a standalone test case. The reason is that when an iterator is invalidated,
the stale elements are not yet clobbered by nonsense data, so the
optimizer can still proceed successfully.

  Thanks to Benjamin for fixing this bug and generously providing the test case.

llvm-svn: 179062

331f01dc

Apr 07, 2013

Fix PR15674 (and PR15603): a SROA think-o. · 0e8a52d1

Chandler Carruth authored Apr 07, 2013

The fix for PR14972 in r177055 introduced a real think-o in the *store*
side, likely because I was much more focused on the load side. While we
can arbitrarily widen (or narrow) a loaded value, we can't arbitrarily
widen a value to be stored, as that changes the width of memory access!
Lock down the code path in the store rewriting which would do this to
only handle the intended circumstance.
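
The asymmetry between loads and stores can be seen in a minimal C sketch. The function names and the 2-byte to 4-byte widening are assumptions chosen for illustration, not SROA's code: reading extra bytes and discarding them is harmless, while writing extra bytes overwrites neighboring memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Narrow store: writes exactly the two bytes of `value`. */
static void store_u16(unsigned char *buf, size_t offset, uint16_t value) {
    memcpy(buf + offset, &value, sizeof value);
}

/* "Widened" store: the same value padded with zero bytes to four bytes.
   The two extra bytes clobber whatever followed the original slot. */
static void store_u16_widened(unsigned char *buf, size_t offset, size_t len,
                              uint16_t value) {
    unsigned char wide[4] = { 0, 0, 0, 0 };
    assert(offset + sizeof wide <= len);
    memcpy(wide, &value, sizeof value);
    memcpy(buf + offset, wide, sizeof wide);
}

/* Widened load: reads four bytes but keeps only the two that were asked
   for, so the observable result matches a narrow load. */
static uint16_t load_u16_widened(const unsigned char *buf, size_t offset,
                                 size_t len) {
    unsigned char wide[4];
    uint16_t value;
    assert(offset + sizeof wide <= len);
    memcpy(wide, buf + offset, sizeof wide);
    memcpy(&value, wide, sizeof value);
    return value;
}
```

After a narrow store into a buffer pre-filled with a marker byte, the bytes following the slot keep the marker; after the widened store they do not.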

All of the existing tests continue to pass, and I've added a test from
the PR.

llvm-svn: 178974

0e8a52d1

Apr 06, 2013

Removed trailing whitespace. · 7924997c
Michael Gottesman authored Apr 05, 2013
```
llvm-svn: 178932
```
7924997c

An objc_retain can serve as a use for a different pointer. · 31ba23aa

Michael Gottesman authored Apr 05, 2013

This is the counterpart to commit r160637, except it performs the action
in the bottom-up portion of the data flow analysis.

llvm-svn: 178922

31ba23aa

Properly model precise lifetime when given an incomplete dataflow sequence. · 1d8d2577

Michael Gottesman authored Apr 05, 2013

The normal dataflow sequence in the ARC optimizer consists of the following
states:

    Retain -> CanRelease -> Use -> Release

The optimizer before this patch stored the uses that determine the lifetime of
the retainable object pointer when it bottom up hits a retain or when top down
it hits a release. This is correct for an imprecise lifetime scenario since what
we are trying to do is remove retains/releases while making sure that no
``CanRelease'' (which is usually a call) deallocates the given pointer before we
get to the ``Use'' (since that would cause a segfault).

If we are considering the precise lifetime scenario though, this is not
correct. In such a situation, we *DO* care about the previous sequence, but
additionally, we wish to track the uses resulting from the following incomplete
sequences:

  Retain -> CanRelease -> Release   (TopDown)
  Retain <- Use <- Release          (BottomUp)

*NOTE* This patch looks large, but most of it consists of updating
test cases. Additionally, this fix exposed another bug. I removed
the test case that expressed said bug and will recommit it with the fix
in a little bit.

llvm-svn: 178921

1d8d2577

Apr 05, 2013

Tidy up a bit. No functional change. · bdbd7346
Jim Grosbach authored Apr 05, 2013
```
llvm-svn: 178915
```
bdbd7346

Disable the optimization about promoting vector-element-access with symbolic index. · 95adf525

Shuxin Yang authored Apr 05, 2013

This optimization is unstable at the moment:
  1) it blocks us on a very important application
  2) PR15200
  3) test6 and test7 in test/Transforms/ScalarRepl/dynamic-vector-gep.ll
     (the CHECK lines compare the output against the wrong result)

   I personally believe this optimization should not have any impact on
autovectorized code, as the auto-vectorizer is supposed to place gathers/scatters
in the "right" way. Although in theory downstream optimizers might reveal
some gather/scatter optimization opportunities, the chance is quite slim.

   For hand-crafted vectorized code, in terms of redundancy elimination,
load-CSE, copy-propagation and DSE can collectively achieve the same result,
but in a much simpler way. Moreover, these optimizers are able to
improve the code incrementally; in contrast, SROA is an all-or-nothing
approach. However, SROA might win slightly on stack size, as it tries to figure
out a stretch of memory that tightly covers the area accessed by the dynamic index.

 rdar://13174884
 PR15200

llvm-svn: 178912

95adf525

Added two debug logging messages to VisitInstructionsTopDown to match VisitInstructionsBottomUp. · bab49e97
Michael Gottesman authored Apr 05, 2013
```
llvm-svn: 178895
```
bab49e97
Cleaned up whitespace and made debug logging less verbose. · 89279f83
Michael Gottesman authored Apr 05, 2013
```
llvm-svn: 178893
```
89279f83

LoopVectorizer: Pass OperandValueKind information to the cost model · df6f67ed

Arnold Schwaighofer authored Apr 04, 2013

Pass down the fact that an operand is going to be a vector of constants.

This should bring the performance of MultiSource/Benchmarks/PAQ8p/paq8p on x86
back. It had degraded to scalar performance due to my previous shift cost change
that made all shifts expensive on x86.

radar://13576547

llvm-svn: 178809

df6f67ed

Apr 04, 2013

Reassociate: Avoid iterator invalidation. · dd67654a

Benjamin Kramer authored Apr 04, 2013

OpndPtrs stored pointers into the Opnd vector that become invalid when the
vector grows. Store indices instead. Sadly, I only have a large testcase that
only triggers under valgrind, so I didn't include it.
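
The underlying hazard is easy to model in C. This is a minimal sketch with a hypothetical growable array standing in for the operand vector, not the Reassociate code: a remembered index stays valid across reallocation, whereas a remembered element pointer may dangle once the storage moves.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal growable int array (hypothetical stand-in for the operand vector). */
struct IntVec {
    int *data;
    size_t size;
    size_t cap;
};

static void vec_push(struct IntVec *v, int value) {
    if (v->size == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 4;
        int *grown = realloc(v->data, new_cap * sizeof *grown);
        assert(grown != NULL);  /* sketch only: no recovery path */
        v->data = grown;        /* pointers into the old block are now stale */
        v->cap = new_cap;
    }
    v->data[v->size++] = value;
}

/* Remembers an element by index, grows the vector, then reads it back. */
static int read_after_growth(size_t extra) {
    struct IntVec v = { NULL, 0, 0 };
    vec_push(&v, 42);
    size_t idx = 0;  /* safe: an index survives reallocation */
    /* int *ptr = &v.data[0];  would dangle as soon as realloc moves the block */
    for (size_t i = 0; i < extra; ++i)
        vec_push(&v, (int)i);
    int result = v.data[idx];
    free(v.data);
    return result;
}
```

Because stale memory often still holds the old contents, a dangling pointer can appear to work, which is consistent with the bug only showing up under valgrind.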

llvm-svn: 178793

dd67654a

Refactored out the helper method FindPredecessorAutoreleaseWithSafePath from ObjCARCOpt::OptimizeReturns. · 21a4ed32

Michael Gottesman authored Apr 03, 2013

Now ObjCARCOpt::OptimizeReturns is easy to read and reason about.

llvm-svn: 178715

21a4ed32

Refactored out the helper function FindPredecessorRetainWithSafePath from ObjCARCOpt::OptimizeReturns. · 6908db14
Michael Gottesman authored Apr 03, 2013
```
llvm-svn: 178714
```
6908db14

Small cleanups. · c2d5bf5c

Michael Gottesman authored Apr 03, 2013

Cleaned up trailing whitespace and added extra slashes in front of a
function-level comment so that it follows the convention of having 3
slashes.

llvm-svn: 178712

c2d5bf5c

Refactored out a part of ObjCARCOpt::OptimizeReturns into its own method HasSafePathToPredecessorCall. · 54dc7fde
Michael Gottesman authored Apr 03, 2013
```
llvm-svn: 178710
```
54dc7fde
Removed an old comment. · 0a1748bb
Michael Gottesman authored Apr 03, 2013
```
llvm-svn: 178709
```
0a1748bb

Clean up arc annotations by moving the top/bottom BB annotations into conditional macros that no-op in Release mode instead of #ifdef sections of the code. · 43e7e00a

Michael Gottesman authored Apr 03, 2013

This is to follow the example of the DEBUG macro.
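
The general pattern can be sketched in C as follows. `ANNOTATE_BB` and `process_block` are hypothetical names, and `NDEBUG` is assumed to be the switch that distinguishes Release builds; the actual macros in the ARC optimizer may look different.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical annotation macro: does real work in debug builds and
   compiles to a no-op when NDEBUG is defined. */
#ifndef NDEBUG
#define ANNOTATE_BB(counter, label)                          \
    do {                                                     \
        ++(counter);                                         \
        fprintf(stderr, "annotate: %s\n", (label));          \
    } while (0)
#else
#define ANNOTATE_BB(counter, label)                          \
    do {                                                     \
        (void)(counter);                                     \
        (void)(label);                                       \
    } while (0)
#endif

/* The call site needs no #ifdef; the macro decides what happens.
   Returns how many annotations were emitted. */
static int process_block(const char *name) {
    int annotations = 0;
    assert(name != NULL);
    ANNOTATE_BB(annotations, name);
    return annotations;
}
```

In a debug build `process_block("entry")` returns 1 and prints one line to stderr; with `NDEBUG` defined it returns 0 and prints nothing. Keeping the condition inside the macro definition is what keeps the call sites free of preprocessor blocks.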

llvm-svn: 178705

43e7e00a