- Apr 16, 2013
-
-
Bill Wendling authored
Two return types are not equivalent if one is a pointer and the other is an integral type, because we cannot bitcast a pointer to an integral value. PR15185 llvm-svn: 179569
-
Nadav Rotem authored
SLPVectorizer: Make it a function pass and add code for hoisting the vector-gather sequence out of loops. llvm-svn: 179562
-
- Apr 15, 2013
-
-
Eric Christopher authored
Reverting until I can fix the testcases here: http://lab.llvm.org:8011/builders/clang-native-arm-cortex-a9/builds/6952 This reverts commit r179512 due to testcases specifying triples that they didn't actually mean and causing failures on other platforms. llvm-svn: 179513
-
Eric Christopher authored
llvm-svn: 179512
-
Nadav Rotem authored
llvm-svn: 179504
-
Eric Christopher authored
This reverts commit r179497 and the accompanying commit, as they broke random platforms that aren't OS X. llvm-svn: 179499
-
Eric Christopher authored
llvm-svn: 179498
-
- Apr 14, 2013
-
-
David Majnemer authored
One transform performs: (X == 13 | X == 14) -> X-13 <u 2. The other: (A == C1 || A == C2) -> (A & ~(C1 ^ C2)) == C1. The problem is that certain values of C1 and C2 trigger both transforms, but the first one blocks out the second, which generates suboptimal code. Reordering the transforms should be better in every case and allows us to do interesting things like turn:
  %shr = lshr i32 %X, 4
  %and = and i32 %shr, 15
  %add = add i32 %and, -14
  %tobool = icmp ne i32 %add, 0
into:
  %and = and i32 %X, 240
  %tobool = icmp ne i32 %and, 224
llvm-svn: 179493
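A quick standalone sanity check of the second identity (not part of the commit); C1 = 8 and C2 = 9 are chosen only for illustration, since the fold requires C1 and C2 to differ in exactly one bit:

  #include <cassert>
  #include <cstdint>

  int main() {
    const uint8_t C1 = 8, C2 = 9;  // differ in exactly one bit
    for (unsigned a = 0; a <= 0xFF; ++a) {
      uint8_t A = static_cast<uint8_t>(a);
      bool original = (A == C1) || (A == C2);
      bool range = static_cast<uint8_t>(A - C1) < 2;               // the X-C1 <u 2 form
      bool masked = (A & static_cast<uint8_t>(~(C1 ^ C2))) == C1;  // the (A & ~(C1^C2)) == C1 form
      assert(original == range && original == masked);
    }
    return 0;
  }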
-
Nadav Rotem authored
llvm-svn: 179492
-
Nadav Rotem authored
llvm-svn: 179476
-
Nadav Rotem authored
SLPVectorizer: Add support for trees that don't start at binary operators, and add the cost of extracting values from the roots of the tree. llvm-svn: 179475
-
Nadav Rotem authored
llvm-svn: 179470
-
- Apr 13, 2013
-
-
Benjamin Kramer authored
There is a Constant with non-constant operands: blockaddress. llvm-svn: 179460
-
Benjamin Kramer authored
This is basically the same fix in three different places. We use a set to avoid walking the whole tree of big ConstantExprs multiple times. For example:
  (select cmp, (add big_expr 1), (add big_expr 2))
We don't want to visit big_expr twice here; it may consist of thousands of nodes. The testcase exercises this by creating an insanely large ConstantExpr in a loop. It's questionable if the optimizer should ever create those, but this can be triggered with real C code. Fixes PR15714. llvm-svn: 179458
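The idea is the usual visited-set walk over a DAG with shared subtrees; a minimal generic sketch (not the LLVM code, and Node is a hypothetical stand-in for ConstantExpr):

  #include <unordered_set>
  #include <vector>

  struct Node {                       // stand-in for ConstantExpr; purely illustrative
    std::vector<const Node *> operands;
  };

  // Walk the expression DAG once per node, even when subtrees are shared.
  void visit(const Node *N, std::unordered_set<const Node *> &Visited) {
    if (!Visited.insert(N).second)    // already seen: shared subtree, skip it
      return;
    for (const Node *Op : N->operands)
      visit(Op, Visited);
    // ... process N here ...
  }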
-
- Apr 12, 2013
-
-
Benjamin Kramer authored
Fixes PR15737. llvm-svn: 179417
-
Nadav Rotem authored
SLPVectorizer: add support for vectorization of diamond shaped trees. We now perform a preliminary traversal of the graph to collect values with multiple users and check where the users came from. llvm-svn: 179414
-
Nadav Rotem authored
CostModel: increase the default cost of supported floating point operations from one to two. Fixed a few tests that changed because now the cost of one insert plus a vector operation on two doubles is lower than two scalar operations on doubles. llvm-svn: 179413
-
David Majnemer authored
The transform will execute like so:
  (A & ~B) == 0 --> (A & B) != 0
  (A & ~B) != 0 --> (A & B) == 0
llvm-svn: 179386
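As written these are not unconditional equivalences; the precondition was presumably carried by the commit's summary line, e.g. that A is known to have a single bit set. Under that assumption the identities can be brute-forced over 8-bit values; a standalone sketch, not from the commit:

  #include <cassert>
  #include <cstdint>

  int main() {
    // Check every single-bit A against every B over 8-bit values.
    for (int bit = 0; bit < 8; ++bit) {
      uint8_t A = static_cast<uint8_t>(1u << bit);  // A is a power of two (assumption)
      for (unsigned b = 0; b <= 0xFF; ++b) {
        uint8_t B = static_cast<uint8_t>(b);
        assert(((A & ~B) == 0) == ((A & B) != 0));
        assert(((A & ~B) != 0) == ((A & B) == 0));
      }
    }
    return 0;
  }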
-
Arnold Schwaighofer authored
Don't classify idiv/udiv as a reduction operation. Integer division is lossy. For example: (1 / 2) * 4 != 4 / 2.
Example:
  int a[] = { 2, 5, 2, 2 };
  int x = 80;
  for () x /= a[i];
Scalar:
  x /= 2  // = 40
  x /= 5  // = 8
  x /= 2  // = 4
  x /= 2  // = 2
Vectorized:
  <80, 1> / <2, 5>  // = <40, 0>
  <40, 0> / <2, 2>  // = <20, 0>
  20 * 0 = 0
radar://13640654 llvm-svn: 179381
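A tiny standalone reproduction of the arithmetic above (not from the commit), showing why the pairwise order gives a different answer for integer division:

  #include <cstdio>

  int main() {
    int a[] = {2, 5, 2, 2};

    // Scalar reduction: (((80 / 2) / 5) / 2) / 2 = 2
    int scalar = 80;
    for (int v : a) scalar /= v;

    // "Vectorized" order: two lanes <80, 1> divided element-wise,
    // then combined at the end -- integer truncation loses information.
    int lane0 = 80, lane1 = 1;
    lane0 /= a[0]; lane1 /= a[1];   // <40, 0>
    lane0 /= a[2]; lane1 /= a[3];   // <20, 0>
    int vectorized = lane0 * lane1; // 20 * 0 = 0

    std::printf("scalar = %d, vectorized = %d\n", scalar, vectorized);
    return 0;
  }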
-
- Apr 11, 2013
-
-
David Majnemer authored
Allows LLVM to optimize sequences like the following:
  %add = add nsw i32 %x, 1
  %cmp = icmp sgt i32 %add, %y
into:
  %cmp = icmp sge i32 %x, %y
as well as:
  %add1 = add nsw i32 %x, 20
  %add2 = add nsw i32 %y, 57
  %cmp = icmp sge i32 %add1, %add2
into:
  %add = add nsw i32 %y, 37
  %cmp = icmp sle i32 %add, %x
llvm-svn: 179316
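A quick standalone check of the two folds (not part of the commit); the ranges are kept small so the additions cannot overflow, which stands in for the nsw guarantee:

  #include <cassert>

  int main() {
    // No overflow possible in this range, matching the nsw assumption.
    for (int x = -1000; x <= 1000; ++x)
      for (int y = -1000; y <= 1000; ++y) {
        assert(((x + 1) > y) == (x >= y));
        assert(((x + 20) >= (y + 57)) == ((y + 37) <= x));
      }
    return 0;
  }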
-
Benjamin Kramer authored
When trying to collapse sequences of insertelement/extractelement instructions into single shuffle instructions, there is one specific case where the Instruction Combiner wrongly updates the resulting mask of shuffle indexes. The problem is in function CollectShuffleElements. If we have a sequence of insert/extract element instructions like the one below:
  %tmp1 = extractelement <4 x float> %LHS, i32 0
  %tmp2 = insertelement <4 x float> %RHS, float %tmp1, i32 1
  %tmp3 = extractelement <4 x float> %RHS, i32 2
  %tmp4 = insertelement <4 x float> %tmp2, float %tmp3, i32 3
where:
  - %RHS will have a mask of [4,5,6,7]
  - %LHS will have a mask of [0,1,2,3]
the mask of shuffle indexes is wrongly computed as [4,1,6,7] instead of [4,0,6,7]. When analyzing %tmp2 in order to compute the mask for the resulting shuffle instruction, the algorithm forgets to update the mask index at position 1 with the index associated to the element extracted from %LHS by instruction %tmp1. Patch by Andrea DiBiagio! llvm-svn: 179291
-
Benjamin Kramer authored
llvm-svn: 179277
-
Benjamin Kramer authored
llvm-svn: 179276
-
- Apr 10, 2013
-
-
Nadav Rotem authored
Make the SLP store-merger less paranoid about function calls. We check for function calls when we check if it is safe to sink instructions. llvm-svn: 179207
-
- Apr 09, 2013
-
-
Nadav Rotem authored
This commit adds the infrastructure for performing bottom-up SLP vectorization (and other optimizations) on parallel computations. The infrastructure has three potential users:
  1. The loop vectorizer needs to be able to vectorize AOS data structures such as (sum += A[i] + A[i+1]).
  2. The BB-vectorizer needs this infrastructure for bottom-up SLP vectorization, because bottom-up vectorization is faster to compute.
  3. A loop-roller needs to be able to analyze consecutive chains and roll them into a loop, in order to reduce code size. A loop roller does not need to create vector instructions, and this infrastructure separates the chain analysis from the vectorization.
This patch also includes a simple (100 LOC) bottom up SLP vectorizer that uses the infrastructure, and can vectorize this code:
  void SAXPY(int *x, int *y, int a, int i) {
    x[i]   = a * x[i]   + y[i];
    x[i+1] = a * x[i+1] + y[i+1];
    x[i+2] = a * x[i+2] + y[i+2];
    x[i+3] = a * x[i+3] + y[i+3];
  }
llvm-svn: 179117
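For illustration only, roughly the 4-wide form such a vectorizer is aiming for, written by hand with the GCC/Clang vector extension; this is a sketch of the idea, not the pass's actual output:

  #include <cstring>

  typedef int v4si __attribute__((vector_size(16)));  // 4 x i32, GCC/Clang extension

  void SAXPY_vectorized(int *x, int *y, int a, int i) {
    v4si vx, vy;
    std::memcpy(&vx, &x[i], sizeof(vx));   // 4-wide load of x[i..i+3]
    std::memcpy(&vy, &y[i], sizeof(vy));   // 4-wide load of y[i..i+3]
    v4si va = {a, a, a, a};                // broadcast the scalar
    vx = va * vx + vy;                     // one element-wise multiply-add
    std::memcpy(&x[i], &vx, sizeof(vx));   // 4-wide store back to x
  }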
-
Nadav Rotem authored
llvm-svn: 179111
-
Michael Gottesman authored
llvm-svn: 179087
-
Nadav Rotem authored
llvm-svn: 179084
-
Nadav Rotem authored
Users may override new-operators and implement any function that they like. llvm-svn: 179071
-
- Apr 07, 2013
-
-
Chandler Carruth authored
The fix for PR14972 in r177055 introduced a real think-o in the *store* side, likely because I was much more focused on the load side. While we can arbitrarily widen (or narrow) a loaded value, we can't arbitrarily widen a value to be stored, as that changes the width of memory access! Lock down the code path in the store rewriting which would do this to only handle the intended circumstance. All of the existing tests continue to pass, and I've added a test from the PR. llvm-svn: 178974
-
- Apr 06, 2013
-
-
Michael Gottesman authored
This is the counterpart to commit r160637, except it performs the action in the bottom-up portion of the data flow analysis. llvm-svn: 178922
-
Michael Gottesman authored
The normal dataflow sequence in the ARC optimizer consists of the following states:
  Retain -> CanRelease -> Use -> Release
The optimizer before this patch stored the uses that determine the lifetime of the retainable object pointer when, bottom up, it hits a retain or when, top down, it hits a release. This is correct for an imprecise lifetime scenario since what we are trying to do is remove retains/releases while making sure that no "CanRelease" (which is usually a call) deallocates the given pointer before we get to the "Use" (since that would cause a segfault).
If we are considering the precise lifetime scenario though, this is not correct. In such a situation, we *DO* care about the previous sequence, but additionally, we wish to track the uses resulting from the following incomplete sequences:
  Retain -> CanRelease -> Release  (TopDown)
  Retain <- Use <- Release  (BottomUp)
*NOTE* This patch looks large, but most of it consists of updating test cases. Additionally this fix exposed an additional bug. I removed the test case that expressed said bug and will recommit it with the fix in a little bit. llvm-svn: 178921
-
- Apr 05, 2013
-
-
Shuxin Yang authored
This optimization is unstable at this moment; it 1) blocks us on a very important application, 2) PR15200, 3) test6 and test7 in test/Transforms/ScalarRepl/dynamic-vector-gep.ll (the CHECK commands compare the output against the wrong result).
I personally believe this optimization should not have any impact on the autovectorized code, as the auto-vectorizer is supposed to put gather/scatter in a "right" way. Although in theory downstream optimizers might reveal some gather/scatter optimization opportunities, the chance is quite slim.
For hand-crafted vectorizing code, in terms of redundancy elimination, load-CSE, copy-propagation and DSE can collectively achieve the same result, but in a much simpler way. On the other hand, these optimizers are able to improve the code in an incremental way; in contrast, SROA is sort of an all-or-none approach. However, SROA might slightly win in stack size, as it tries to figure out a stretch of memory that tightly covers the area accessed by the dynamic index.
rdar://13174884 PR15200 llvm-svn: 178912
-
Arnold Schwaighofer authored
Pass down the fact that an operand is going to be a vector of constants. This should bring the performance of MultiSource/Benchmarks/PAQ8p/paq8p on x86 back. It had degraded to scalar performance due to my previous shift cost change that made all shifts expensive on x86. radar://13576547 llvm-svn: 178809
-
- Apr 03, 2013
-
-
Michael Gottesman authored
Remove an optimization where we were changing an objc_autorelease into an objc_autoreleaseReturnValue. The semantics of ARC imply that a pointer passed into an objc_autorelease must live until some point (potentially down the stack) where an autorelease pool is popped. On the other hand, an objc_autoreleaseReturnValue just signifies that the object must live until the end of the given function at least. Thus objc_autorelease is stronger than objc_autoreleaseReturnValue in terms of the semantics of ARC*, so performing the given strength reduction without any knowledge of how this relates to the autorelease pool pop that is further up the stack violates the semantics of ARC.
*Even though objc_autoreleaseReturnValue is more computationally expensive, if you know that no RV optimization will occur. llvm-svn: 178612
-
- Apr 02, 2013
-
-
Bill Wendling authored
The iterator could be invalidated while we are recursively deleting a whole bunch of constant expressions in a constant initializer. Note: This was only reproducible if `opt' was run on a `.bc' file. If `opt' was run on a `.ll' file, it wouldn't crash. This is why the test first pushes the `.ll' file through `llvm-as' before feeding it to `opt'. PR15440 llvm-svn: 178531
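The bug class is ordinary iterator invalidation; a generic C++ illustration of the unsafe pattern and the standard fix (nothing to do with the actual LLVM data structures involved):

  #include <list>

  // Erasing elements out from under a live iterator invalidates it.
  // The safe pattern advances via the iterator that erase() returns.
  void prune(std::list<int> &Worklist) {
    for (auto It = Worklist.begin(); It != Worklist.end(); /* no ++It here */) {
      if (*It == 0)
        It = Worklist.erase(It);   // erase() invalidates It; use its return value
      else
        ++It;
    }
  }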
-
- Apr 01, 2013
-
-
Shuxin Yang authored
llvm-svn: 178484
-
Benjamin Kramer authored
llvm-svn: 178459
-
- Mar 30, 2013
-
-
Shuxin Yang authored
rule 1: (x | c1) ^ c2 => (x & ~c1) ^ (c1^c2), only useful when c1 == c2
rule 2: (x & c1) ^ (x & c2) = (x & (c1^c2))
rule 3: (x | c1) ^ (x | c2) = (x & c3) ^ c3, where c3 = c1 ^ c2
rule 4: (x | c1) ^ (x & c2) => (x & c3) ^ c1, where c3 = ~c1 ^ c2
It reduces an application's size (in terms of # of instructions) by 8.9%. Reviewed by Pete Cooper. Thanks a lot! rdar://13212115 llvm-svn: 178409
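The four identities can be machine-checked exhaustively over 8-bit values; a standalone brute-force check, not part of the commit:

  #include <cassert>
  #include <cstdint>

  int main() {
    for (unsigned xi = 0; xi <= 0xFF; ++xi)
      for (unsigned c1i = 0; c1i <= 0xFF; ++c1i)
        for (unsigned c2i = 0; c2i <= 0xFF; ++c2i) {
          uint8_t x = xi, c1 = c1i, c2 = c2i;
          uint8_t c3a = c1 ^ c2;   // rule 3's c3
          uint8_t c3b = ~c1 ^ c2;  // rule 4's c3
          assert((uint8_t)((x | c1) ^ c2) == (uint8_t)((x & ~c1) ^ (c1 ^ c2)));  // rule 1
          assert((uint8_t)((x & c1) ^ (x & c2)) == (uint8_t)(x & (c1 ^ c2)));    // rule 2
          assert((uint8_t)((x | c1) ^ (x | c2)) == (uint8_t)((x & c3a) ^ c3a));  // rule 3
          assert((uint8_t)((x | c1) ^ (x & c2)) == (uint8_t)((x & c3b) ^ c1));   // rule 4
        }
    return 0;
  }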
-
- Mar 29, 2013
-
-
Michael Gottesman authored
Updated test0 of retain-not-declared.ll to reflect the fact that objc-arc-expand runs before objc-arc/objc-arc-contract. Specifically, objc-arc-expand will make sure that the objc_retainAutoreleasedReturnValue, objc_autoreleaseReturnValue, and ret will all have %call as an argument. llvm-svn: 178382
-