Commits · c1c7a1309c039ec8539e3c851f332825d8855223 · Roger Ferrer / llvm-epi-0.8

Jul 14, 2013

Update Transforms tests to use CHECK-LABEL for easier debugging. No functionality change. · c1c7a130

Stephen Lin authored Jul 14, 2013

This update was done with the following bash script:

  find test/Transforms -name "*.ll" | \
  while read NAME; do
    echo "$NAME"
    if ! grep -q "^; *RUN: *llc" $NAME; then
      TEMP=`mktemp -t temp`
      cp $NAME $TEMP
      sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
      while read FUNC; do
        sed -i '' "s/;\(.*\)\([A-Za-z0-9_]*\):\( *\)@$FUNC\([( ]*\)\$/;\1\2-LABEL:\3@$FUNC(/g" $TEMP
      done
      mv $TEMP $NAME
    fi
  done

llvm-svn: 186268

c1c7a130
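
The two sed substitutions in the script above can be sketched in Python as follows. This is an illustrative translation under the assumption that each line is processed on its own; it is not the script that was actually run, and the helper names `function_names` and `relabel` are hypothetical.

```python
import re

# Mirrors: sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p"
DEFINE_RE = re.compile(r"^define [^@]*@([A-Za-z0-9_]*)\(.*$")


def function_names(lines):
    """Collect the names of functions introduced by 'define ... @name(' lines."""
    names = []
    for line in lines:
        match = DEFINE_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


def relabel(line, func):
    """Rewrite '; PREFIX: @func(' (or '; PREFIX: @func') to use PREFIX-LABEL."""
    pattern = r";(.*)([A-Za-z0-9_]*):( *)@" + re.escape(func) + r"([( ]*)$"

    def replacement(match):
        # Same shape as the sed replacement: ;\1\2-LABEL:\3@FUNC(
        return f";{match.group(1)}{match.group(2)}-LABEL:{match.group(3)}@{func}("

    return re.sub(pattern, replacement, line)
```

With these helpers, `relabel("; CHECK: @foo(", "foo")` yields `; CHECK-LABEL: @foo(`, while a line such as `; CHECK: @foobar(` is left untouched when processing `foo`, because only `(` or spaces may follow the function name.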

Modify two Transforms tests to explicitly check for full function names in... · 2e105ff8

Stephen Lin authored Jul 14, 2013

Modify two Transforms tests to explicitly check for full function names in some cases, rather than just a common prefix. No functionality change.

(This is to avoid confusing a scripted mass update of these tests to use CHECK-LABEL)

llvm-svn: 186267

2e105ff8

Add newlines at end of test files, no functionality change · 6dd347b3
Stephen Lin authored Jul 13, 2013
```
llvm-svn: 186263
```
6dd347b3

Jul 13, 2013

LoopVectorizer: Disallow reductions whose header phi is used outside the loop · a92eeebd

Arnold Schwaighofer authored Jul 13, 2013

If a user outside the loop of the reduction value uses the header phi node, we
cannot just reduce the vectorized phi value in the vector code epilog because
we would lose VF-1 reductions.

lp:
  p = phi (0, lv)
  lv = lv + 1
  ...
  brcond , lp, outside

outside:
  usr = add 0, p

(Say the loop iterates two times, the value of p coming out of the loop is one).

We cannot just transform this to:

vlp:
  p = phi (<0,0>, lv)
  lv = lv + <1,1>
  ...
  brcond , vlp, outside

outside:
  p_reduced = p[0] + p[1];
  usr = add 0, p_reduced

(Because the original loop iterated two times the vectorized loop would iterate
one time, but p_reduced ends up being zero instead of one).

We would have to execute VF-1 iterations in the scalar remainder loop in such
cases. For now, just disable vectorization.

PR16522

llvm-svn: 186256

a92eeebd
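
The example in the message above can be replayed with a small simulation. This is only a sketch of the arithmetic, assuming VF = 2 and a trip count that is a multiple of VF; the function names are hypothetical and nothing here reflects the vectorizer's actual code.

```python
def scalar_loop(trip_count):
    """Scalar loop: p is the header phi, lv the value computed in the body."""
    p, lv = 0, 0
    for _ in range(trip_count):
        p = lv        # header phi: 0 on entry, lv from the previous iteration
        lv = p + 1
    return p, lv


def naive_vectorized_loop(trip_count, vf=2):
    """Vector loop followed by a naive add-reduction of both p and lv."""
    p, lv = [0] * vf, [0] * vf
    for _ in range(trip_count // vf):
        p = list(lv)                 # vector header phi
        lv = [x + 1 for x in p]      # lv = lv + <1, 1>
    # Epilog: reduce each vector by summing its lanes.
    return sum(p), sum(lv)
```

For a trip count of two, `scalar_loop(2)` gives `p == 1` and `lv == 2`, while `naive_vectorized_loop(2)` gives a reduced `p` of 0 and a reduced `lv` of 2. Reducing `lv` is fine; reducing the header phi is what goes wrong.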

Make the new vectorizer test immune to TTI · 960dee38
Andrew Trick authored Jul 13, 2013
```
llvm-svn: 186242
```
960dee38

LoopVectorize fix: LoopInfo must be valid when invoking utils like SCEVExpander. · 0ae8c94f

Andrew Trick authored Jul 13, 2013

In general, one should always complete CFG modifications first, update
CFG-based analyses, like Dominators and LoopInfo, then generate
instruction sequences.

LoopVectorizer was creating a new loop, calling SCEVExpander to
generate checks, then updating LoopInfo. I just changed the order.

llvm-svn: 186241

0ae8c94f

Add a microoptimization for urem. · 7459be6d
Nick Lewycky authored Jul 13, 2013
```
llvm-svn: 186235
```
7459be6d
Fix logic error optimizing "icmp pred (urem X, Y), Y" where pred is signed. · 35aeea99
Nick Lewycky authored Jul 12, 2013
```
Fixes PR16605.

llvm-svn: 186229
```
35aeea99
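
To see why the predicate's signedness matters here, the following brute-force sketch checks `(urem X, Y) < Y` over all 8-bit values, once as an unsigned and once as a signed comparison. The 8-bit width and the helper names are assumptions made for illustration; this is not the InstCombine code.

```python
BITS = 8


def to_signed(value, bits=BITS):
    """Reinterpret an unsigned bit pattern as a two's complement integer."""
    return value - (1 << bits) if value >> (bits - 1) else value


def unsigned_always_holds():
    """urem X, Y is always unsigned-less-than Y (for non-zero Y)."""
    return all((x % y) < y
               for x in range(1 << BITS)
               for y in range(1, 1 << BITS))


def signed_counterexamples():
    """Pairs (X, Y) where the signed comparison (urem X, Y) <s Y is false."""
    return [(x, y)
            for x in range(1 << BITS)
            for y in range(1, 1 << BITS)
            if not to_signed(x % y) < to_signed(y)]
```

For instance, with X = 1 and Y = 128 the remainder is 1, but Y read as a signed 8-bit value is -128, so the signed comparison is false.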
Fix a crash in EvaluateInDifferentElementOrder where it would generate an · a3250f22
Joey Gouly authored Jul 12, 2013
```
undef vector of the wrong type.

LGTM'd by Nick Lewycky on IRC.

llvm-svn: 186224
```
a3250f22
LFTR improvement to avoid truncation. · a1e4118a
Andrew Trick authored Jul 12, 2013
```
This is a reimplementation of the patch originally in r186107.

llvm-svn: 186215
```
a1e4118a

Jul 12, 2013

X86 cost model: Add cost for vectorized gather/scatter · 6042a261
Arnold Schwaighofer authored Jul 12, 2013
```
radar://14351991

llvm-svn: 186189
```
6042a261

ARM cost model: Add cost for gather/scatter · da2b3118

Arnold Schwaighofer authored Jul 12, 2013

Fixes a 35% degradation compared to unvectorized code in
MiBench/automotive-susan and an equally serious regression on a private
image processing benchmark.

radar://14351991

llvm-svn: 186188

da2b3118

Start using CHECK-LABEL in some tests. · 764d8d3d
Stephen Lin authored Jul 12, 2013
```
llvm-svn: 186163
```
764d8d3d

Revert "indvars: Improve LFTR by eliminating truncation when comparing · cf3715ca

Chandler Carruth authored Jul 12, 2013

against a constant."

This reverts commit r186107. It didn't handle wrapping arithmetic in the
loop correctly and thus caused the following C program to count from
0 to UINT64_MAX instead of from 0 to 255 as intended:

  #include <stdio.h>
  int main() {
    unsigned char first = 0, last = 255;
    do { printf("%d\n", first); } while (first++ != last);
  }

Full test case and instructions to reproduce with just the -indvars pass
sent to the original review thread rather than to r186107's commit.

llvm-svn: 186152

cf3715ca
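
The intended behavior of the C program above can be modelled in a few lines. This sketch only simulates the 8-bit wrapping of `first`; it does not model the indvars pass, and the function name is hypothetical.

```python
def count_with_uint8(first=0, last=255):
    """Simulate: do { print(first); } while (first++ != last); with unsigned char."""
    printed = []
    while True:
        printed.append(first)
        old = first
        first = (first + 1) & 0xFF   # unsigned char arithmetic wraps modulo 256
        if old == last:              # the comparison uses the pre-increment value
            break
    return printed, first
```

The loop prints 256 values (0 through 255) and leaves `first` wrapped back to 0, which is why the exit test cannot simply be rewritten against a wider, non-wrapping counter.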

SLPVectorizer: Sink and enable CSE for ExtractElements. · 89c41bf0
Nadav Rotem authored Jul 12, 2013
```
llvm-svn: 186145
```
89c41bf0

SLPVectorize: Replace the code that checks for vectorization candidates in... · fa3c2db2

Nadav Rotem authored Jul 12, 2013

SLPVectorize: Replace the code that checks for vectorization candidates in successor blocks with code that scans PHINodes.
Before we could vectorize PHINodes, scanning successors was a good way of finding candidates. Now we can vectorize the PHINodes directly, which is simpler.

llvm-svn: 186139

fa3c2db2

Jul 11, 2013

indvars: Improve LFTR by eliminating truncation when comparing against a constant. · 3095993d

Andrew Trick authored Jul 11, 2013

Patch by Michele Scandale!

Adds special handling for the case where, during loop exit condition
rewriting, the exit value is a constant whose bit width is lower than
that of the induction variable's type: instead of introducing a trunc
operation to make the operand types match, the constant is converted
to an equivalent constant, depending on the initial value of the
induction variable and the trip count, so that the comparison between
the induction variable and the new constant is equivalent to the
original one.

llvm-svn: 186107

3095993d

LoopVectorize: Vectorize all accesses in address space zero with unit stride · e97c71b8

Arnold Schwaighofer authored Jul 11, 2013

We can vectorize them because in the case where we wrap in the address space the
unvectorized code would have had to access a pointer value of zero which is
undefined behavior in address space zero according to the LLVM IR semantics.
(Thank you Duncan, for pointing this out to me).

Fixes PR16592.

llvm-svn: 186088

e97c71b8

TryToSimplifyUncondBranchFromEmptyBlock was checking that any common · e773c080

Duncan Sands authored Jul 11, 2013

predecessors of the two blocks it is attempting to merge supply the
same incoming values to any phi in the successor block.  This change
allows merging in the case where there is one or more incoming values
that are undef.  The undef values are rewritten to match the non-undef
value that flows from the other edge.  Patch by Mark Lacey.

llvm-svn: 186069

e773c080

Consolidate more lit tests. · 108ef760
Nadav Rotem authored Jul 11, 2013
```
llvm-svn: 186063
```
108ef760
Consolidate some of the lit tests. · e0a49499
Nadav Rotem authored Jul 11, 2013
```
llvm-svn: 186062
```
e0a49499
Consolidate some of the lit tests. · c6b5e249
Nadav Rotem authored Jul 11, 2013
```
llvm-svn: 186060
```
c6b5e249

Teach TailRecursionElimination to handle certain cases of nocapture escaping allocas. · b40db26e

Michael Gottesman authored Jul 11, 2013

Without the changes introduced into this patch, if TRE saw any allocas at all,
TRE would not perform TRE *or* mark callsites with the tail marker.

Because TRE runs after mem2reg, this inadequacy is not a death sentence. But
given a callsite A without escaping alloca argument, A may not be able to have
the tail marker placed on it due to a separate callsite B having a write-back
parameter passed in via an argument with the nocapture attribute.

Assume that B is the only other callsite besides A and B only has nocapture
escaping alloca arguments (*NOTE* B may have other arguments that are not passed
allocas). In this case not marking A with the tail marker is unnecessarily
conservative since:

  1. By assumption A has no escaping alloca arguments itself so it can not
     access the caller's stack via its arguments.

  2. Since all of B's escaping alloca arguments are passed as parameters with
     the nocapture attribute, we know that B does not stash said escaping
     allocas in a manner that outlives B itself and thus could be accessed
     indirectly by A.

With the changes introduced by this patch:

  1. If we see any escaping allocas passed as a capturing argument, we do
     nothing and bail early.

  2. If we do not see any escaping allocas passed as captured arguments but we
     do see escaping allocas passed as nocapture arguments:

       i. We do not perform TRE to avoid PR962 since the code generator produces
          significantly worse code for the dynamic allocas that would be created
          by the TRE algorithm.

       ii. If we do not return twice, mark call sites without escaping allocas
           with the tail marker. *NOTE* This excludes functions with escaping
           nocapture allocas.

  3. If we do not see any escaping allocas at all (whether captured or not):

       i. If we do not have usage of setjmp, mark all callsites with the tail
          marker.

       ii. If there are no dynamic/variable sized allocas in the function,
           attempt to perform TRE on all callsites in the function.

Based off of a patch by Nick Lewycky.

rdar://14324281.

llvm-svn: 186057

b40db26e
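
The three cases listed in the message above can be summarized as a small decision function. This is a sketch of the described policy only, not the pass itself; the `Plan` type, the function name and the boolean parameters are hypothetical.

```python
from typing import NamedTuple


class Plan(NamedTuple):
    tail_marks: str     # "none", "non_escaping_callsites", or "all"
    perform_tre: bool   # whether TRE itself is attempted


def tre_plan(captured_escape, nocapture_escape, returns_twice, dynamic_allocas):
    """Map the facts gathered about a function to the actions from the message."""
    # Case 1: an escaping alloca is passed as a capturing argument: bail early.
    if captured_escape:
        return Plan("none", False)
    # Case 2: escaping allocas are only passed as nocapture arguments.
    if nocapture_escape:
        marks = "none" if returns_twice else "non_escaping_callsites"
        return Plan(marks, False)   # no TRE, to avoid PR962
    # Case 3: no escaping allocas at all.
    marks = "none" if returns_twice else "all"
    return Plan(marks, not dynamic_allocas)
```

For example, `tre_plan(False, True, False, False)` marks only the call sites without escaping allocas and skips TRE, matching case 2.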

Jul 10, 2013
- InstSimplify: X >> X -> 0 · a80fed7e
  David Majnemer authored Jul 09, 2013
```
llvm-svn: 185973
```
  a80fed7e
Jul 09, 2013

Fix PR16571, which is a bug in the code that checks that all of the types in... · d7b574e5
Nadav Rotem authored Jul 09, 2013
```
Fix PR16571, which is a bug in the code that checks that all of the types in the bundle are uniform.

llvm-svn: 185970
```
d7b574e5

ValueTracking: Fix bugs in isKnownToBeAPowerOfTwo · a92b3c91

David Majnemer authored Jul 09, 2013

(add nsw x, (and x, y)) isn't a power of two if x is zero, it's zero
(add nsw x, (xor x, y)) isn't a power of two if y has bits set that aren't set in x

llvm-svn: 185954

a92b3c91
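
Both counterexamples from the message above can be checked directly. The sketch below uses small values so that signed overflow (the `nsw` flag) plays no role; the helper names are hypothetical.

```python
def is_power_of_two(value):
    """True for 1, 2, 4, 8, ... and False for zero and everything else."""
    return value > 0 and (value & (value - 1)) == 0


def add_and(x, y):
    """Models: add nsw x, (and x, y)"""
    return x + (x & y)


def add_xor(x, y):
    """Models: add nsw x, (xor x, y)"""
    return x + (x ^ y)
```

With x = 0 the first expression is 0, and with x = 4, y = 1 (y has a bit that is not set in x) the second expression is 9; neither result is a power of two.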

InstCombine: variations on 0xffffffff - x >= 4 · 72d76275

David Majnemer authored Jul 09, 2013

The following transforms are valid if -C is a power of 2:
(icmp ugt (xor X, C), ~C) -> (icmp ult X, C)
(icmp ult (xor X, C), -C) -> (icmp uge X, C)

These are nice, they get rid of the xor.

llvm-svn: 185915

72d76275
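
The two folds above can be checked exhaustively at a small bit width. The following sketch assumes 8-bit integers and only considers constants C for which -C is a power of two, as the message requires; the function names are hypothetical.

```python
BITS = 8
MASK = (1 << BITS) - 1


def is_power_of_two(value):
    return value > 0 and (value & (value - 1)) == 0


def check_xor_folds():
    """Verify both folds for every qualifying C and every X; return #constants."""
    checked = 0
    for c in range(1 << BITS):
        neg_c = (-c) & MASK
        if not is_power_of_two(neg_c):
            continue
        not_c = ~c & MASK
        for x in range(1 << BITS):
            # (icmp ugt (xor X, C), ~C) -> (icmp ult X, C)
            assert ((x ^ c) > not_c) == (x < c)
            # (icmp ult (xor X, C), -C) -> (icmp uge X, C)
            assert ((x ^ c) < neg_c) == (x >= c)
        checked += 1
    return checked
```

At 8 bits there are eight qualifying constants (those whose negation is 1, 2, 4, ..., 128), and the check passes for all of them.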

InstCombine: X & -C != -C -> X <= u ~C · 414d4e58
David Majnemer authored Jul 09, 2013
```
Tests were added in r185910 somehow.

llvm-svn: 185912
```
414d4e58
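
The fold named in the title above can also be brute-forced. The title does not state a side condition; this sketch assumes that C is a power of two and uses 8-bit integers, and the function name is hypothetical.

```python
BITS = 8
MASK = (1 << BITS) - 1


def check_and_fold():
    """Verify (X & -C) != -C  <=>  X <=u ~C for every power-of-two C."""
    checked = 0
    for k in range(BITS):
        c = 1 << k                  # assumption: C is a power of two
        neg_c = (-c) & MASK
        not_c = ~c & MASK
        for x in range(1 << BITS):
            assert ((x & neg_c) != neg_c) == (x <= not_c)
        checked += 1
    return checked
```

Under that assumption the equivalence holds for all eight power-of-two constants and all 256 values of X.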
Commit r185909 was a misapplied patch, fix it · bafa537e
David Majnemer authored Jul 09, 2013
```
llvm-svn: 185910
```
bafa537e

InstCombine: add more transforms · f2a9a513

David Majnemer authored Jul 09, 2013

C1-X <u C2 -> (X|(C2-1)) == C1
C1-X >u C2 -> (X|C2) != C1
X-C1 <u C2 -> (X & -C2) == C1
X-C1 >u C2 -> (X & ~C2) != C1

llvm-svn: 185909

f2a9a513
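
The two `<u` folds above can be checked exhaustively at 8 bits. The message does not state the side conditions, so the ones used below are assumptions: C2 is a power of two, and the low bits of C1 selected by C2-1 are all ones (first fold) or all zeros (third fold). The function name is hypothetical.

```python
BITS = 8
MASK = (1 << BITS) - 1


def check_ult_folds():
    """Brute-force both folds; returns the number of (C1, C2) pairs checked."""
    pairs = 0
    for k in range(BITS):
        c2 = 1 << k                 # assumption: C2 is a power of two
        low = c2 - 1
        for c1 in range(1 << BITS):
            if c1 & low == low:     # assumption for: C1-X <u C2
                for x in range(1 << BITS):
                    lhs = ((c1 - x) & MASK) < c2
                    assert lhs == ((x | low) == c1)
                pairs += 1
            if c1 & low == 0:       # assumption for: X-C1 <u C2
                for x in range(1 << BITS):
                    lhs = ((x - c1) & MASK) < c2
                    assert lhs == ((x & (-c2 & MASK)) == c1)
                pairs += 1
    return pairs
```

The function raises an `AssertionError` if either fold fails for some input; under the stated assumptions it completes without one.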

Jul 08, 2013

InstCombine: Fold X-C1 <u 2 -> (X & -2) == C1 · fa90a0b3

David Majnemer authored Jul 08, 2013

Back in r179493 we determined that two transforms collided with each
other.  The fix back then was to reorder the transforms so that the
preferred transform would be tried first and the secondary transform
only afterwards.  However, it was noted that the best approach would be
to canonicalize one transform into the other, removing the collision and
allowing us to optimize IR given to us in that form.

llvm-svn: 185808

fa90a0b3

[objc-arc] Committed test for r185770 as per dblaikie's suggestion. · 8c96263e
Michael Gottesman authored Jul 08, 2013
```
llvm-svn: 185782
```
8c96263e

Jul 07, 2013

Eliminate trivial redundant loads across nocapture+readonly calls to uncaptured · c0514629
Nick Lewycky authored Jul 07, 2013
```
pointer arguments.

llvm-svn: 185776
```
c0514629

SLPVectorizer: Implement DCE as part of vectorization. · 2041b742

Nadav Rotem authored Jul 07, 2013

This is a complete rewrite of the bottom-up vectorization class.
Before this commit we scanned the instruction tree three times: first in search of merge points for the trees, second for estimating the cost, and finally for vectorization.
There was a lot of code duplication, and adding DCE exposed bugs. The new design is simpler, and DCE is part of the design.
In this implementation we build the tree once. After that we estimate the cost by scanning the entries in the constructed tree (in any order). The vectorization phase also works on the built tree.

llvm-svn: 185774

2041b742

[objc-arc] Remove the alias analysis part of r185764. · 618df456
Michael Gottesman authored Jul 07, 2013
```
Upon further reflection, the alias analysis part of r185764 is not a safe
change.

llvm-svn: 185770
```
618df456

[objc-arc] Teach the ARC optimizer that objc_sync_enter/objc_sync_exit do not... · a72630d4

Michael Gottesman authored Jul 07, 2013

[objc-arc] Teach the ARC optimizer that objc_sync_enter/objc_sync_exit do not modify the ref count of an objc object and additionally are inert for modref purposes.

llvm-svn: 185769

a72630d4

Jul 06, 2013
- InstCombine: typo in or_icmp_eq_B_0_icmp_ult_A_B test · 69430609
  David Majnemer authored Jul 06, 2013
```
llvm-svn: 185737
```
  69430609
- Extend 'readonly' and 'readnone' to work on function arguments as well as · c2ec0725
  Nick Lewycky authored Jul 06, 2013
```
functions. Make the function attributes pass add them to known library
functions and wherever it can deduce them.

llvm-svn: 185735
```
  c2ec0725
- [TRE] Combined another test into basic.ll · 275b22e3
  Michael Gottesman authored Jul 05, 2013
```
llvm-svn: 185729
```
  275b22e3
Jul 05, 2013
- [TRE] Merged several tests into the test basic.ll. · e283e195
  Michael Gottesman authored Jul 05, 2013
```
llvm-svn: 185723
```
  e283e195