  May 06, 2013
    • InstCombine: (X ^ signbit) + C -> X + (signbit ^ C) · 70f286d9
      David Majnemer authored
      llvm-svn: 181249
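      Why this fold is sound (an illustrative check, not part of the commit):
      in wraparound unsigned arithmetic, flipping the sign bit is congruent to
      adding it, so the XOR can migrate onto the constant:

      #include <cassert>
      #include <cstdint>

      int main() {
        const uint32_t signbit = 0x80000000u;
        const uint32_t vals[] = {0u, 1u, 0x7fffffffu, 0x80000000u, 0xdeadbeefu};
        for (uint32_t x : vals)
          for (uint32_t c : vals)
            // x ^ signbit == x + signbit (mod 2^32), and likewise for c,
            // so both sides equal x + c + signbit (mod 2^32).
            assert((x ^ signbit) + c == x + (signbit ^ c));
        return 0;
      }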
    • Rotate multi-exit loops even if the latch was simplified. · 9c72b071
      Andrew Trick authored
      Test case by Michele Scandale!
      
      Fixes PR10293: Load not hoisted out of loop with multiple exits.
      
      There are a few regressions with this patch, now tracked by
      rdar:13817079, and a roughly equal number of improvements. The
      regressions are almost certainly bad luck because LoopRotate has very
      little idea of whether rotation is profitable. Doing better requires a
      more comprehensive solution.
      
      This checkin is a quick fix that lacks generality (PR10293 has
      a counter-example). But it trivially fixes the case in PR10293 without
      interfering with other cases, and it does satisfy the criterion that
      LoopRotate is a loop canonicalization pass that should avoid
      heuristics and special cases.
      
      I can think of two approaches that would probably be better in
      the long run. Ultimately they may both make sense.
      
      (1) LoopRotate should check that the current header would make a good
      loop guard, and that the loop does not already have a sufficient
      guard. The artificial SimplifiedLoopLatch check would be unnecessary,
      and the design would be more general and canonical. Two difficulties:
      
      - We need a strong guarantee that we won't endlessly rotate, so the
        analysis would need to be precise in order to avoid the
        SimplifiedLoopLatch precondition.
      
      - Analyses like this are usually based on SCEV, which we don't want to
        rely on.
      
      (2) Rotate on-demand in late loop passes. This could even be done by
      shoving the loop back on the queue after the optimization that needs
      it. This could work well when we find LICM opportunities in
      multi-branch loops. This requires some work, and it doesn't really
      solve the problem of SCEV wanting a loop guard before the analysis.
      
      llvm-svn: 181230
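      For context, a minimal sketch of what rotation buys LICM (illustrative
      C++ only; this is not the PR10293 test case):

      // Before rotation: a top-tested loop. *p cannot be hoisted because
      // the body may never execute, so the load would be speculative.
      void before(int *a, const int *p, int n) {
        int i = 0;
        while (i < n)
          a[i++] += *p;
      }

      // After rotation: a guard plus a bottom-tested loop. The body is
      // known to execute at least once, so *p can be loaded once up front.
      void after(int *a, const int *p, int n) {
        int i = 0;
        if (i < n) {
          const int v = *p; // hoistable by LICM
          do {
            a[i++] += v;
          } while (i < n);
        }
      }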
    • Provide InstCombines for the following 3 cases: · 3e4fc3ef
      Jean-Luc Duprat authored
      A * (1 - (uitofp i1 C)) -> select C, 0, A
      B * (uitofp i1 C) -> select C, B, 0
      (select C, 0, A) + (select C, B, 0) -> select C, B, A
      
      These come up in code that has been hand-optimized from a select to a linear blend, 
      on platforms where that may have mattered. We want to undo such changes 
      with the following transform:
      A*(1 - uitofp i1 C) + B*(uitofp i1 C) -> select C, B, A
      
      llvm-svn: 181216
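      The source pattern being undone looks like this in C++ (an illustrative
      sketch; the function names are invented, and the equivalence ignores
      NaN/Inf corner cases of the multiply form):

      // Hand-optimized linear blend: multiply by 0.0/1.0 instead of branching.
      float blend_mul(float a, float b, bool c) {
        float fc = (float)c;              // uitofp i1 C
        return a * (1.0f - fc) + b * fc;  // A*(1 - fc) + B*fc
      }

      // What the combined transforms recover: a plain select.
      float blend_select(float a, float b, bool c) {
        return c ? b : a;                 // select C, B, A
      }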
    • Update the comment to mention that we use TTI. · 632b25b7
      Nadav Rotem authored
      llvm-svn: 181178
    • Revert r164763 because it introduces new shuffles. · c70ef4e9
      Nadav Rotem authored
      Thanks Nick Lewycky for pointing this out.
      
      llvm-svn: 181177
    • Fix const merging when an alias of a const is llvm.used. · c229a4ff
      Rafael Espindola authored
      We used to disable constant merging not only if a constant is llvm.used, but
      also if an alias of a constant is llvm.used. This change fixes that.
      
      llvm-svn: 181175
  May 03, 2013
    • Decompose GVN::processNonLocalLoad() (about 400 LOC) into smaller helper... · 637b9beb
      Shuxin Yang authored
      Decompose GVN::processNonLocalLoad() (about 400 LOC) into smaller helper functions. No functional change.
      
      This function consists of the following steps:
         1. Collect dependent memory accesses.
         2. Analyze availability.
         3. Perform full redundancy elimination, or
         4. Perform PRE, depending on the availability.

      Steps 2, 3 and 4 are now moved to three helper routines.
      
      llvm-svn: 181047
    • LoopVectorizer: Add support for if-conversion of PHINodes with 3+ incoming values. · 4ce060b3
      Nadav Rotem authored
      By supporting the vectorization of PHINodes with more than two incoming values, we can now if-convert more complex nested if statements.
      
      We can now vectorize this loop:
      
      int foo(int *A, int *B, int n) {
        for (int i=0; i < n; i++) {
          int x = 9;
          if (A[i] > B[i]) {
            if (A[i] > 19) {
              x = 3;
            } else if (B[i] < 4) {
              x = 4;
            } else {
              x = 5;
            }
          }
          A[i] = x;
        }
      }
      
      llvm-svn: 181037
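      Conceptually, if-conversion flattens that control flow into selects. A
      scalar sketch of the result (illustrative only, not the IR the
      vectorizer emits):

      void foo_ifconverted(int *A, int *B, int n) {
        for (int i = 0; i < n; i++) {
          // The inner if/else-if/else collapses into a select chain...
          int t = (A[i] > 19) ? 3 : ((B[i] < 4) ? 4 : 5);
          // ...and the PHI with 3+ incoming values becomes one more select,
          // which vectorizes as a vector select.
          A[i] = (A[i] > B[i]) ? t : 9;
        }
      }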
  May 02, 2013
    • [GV] Remove dead code which is really difficult to decipher. · af2c3ddf
      Shuxin Yang authored
      It actually took me a couple of hours to make sense of this code, only
      to find it is dead.  I guess the original author used "allSingleSucc"
      to indicate whether any critical edges emanate from some blocks, and
      tried to perform code motion (actually speculation) in the presence of
      these critical edges, but later changed his/her mind and decided to
      perform edge-splitting first.
      
      llvm-svn: 180951
  May 01, 2013
    • This patch breaks up Wrap.h so that it does not have to include all of · dec20e43
      Filip Pizlo authored
      the things, and renames it to CBindingWrapping.h.  I also moved 
      CBindingWrapping.h into Support/.
      
      This new file just contains the macros for defining different wrap/unwrap 
      methods.
      
      The calls to those macros, as well as any custom wrap/unwrap definitions
      (like for arrays of Values, for example), are put into the corresponding
      C++ headers.
      
      Doing this required some #include surgery, since some .cpp files relied 
      on the fact that including Wrap.h implicitly caused the inclusion of a 
      bunch of other things.
      
      This also now means that the C++ headers will include their corresponding 
      C API headers; for example Value.h must include llvm-c/Core.h.  I think 
      this is harmless, since the C API headers contain just external function 
      declarations and some C types, so I don't believe there should be any 
      nasty dependency issues here.
      
      llvm-svn: 180881
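      In spirit, the wrap/unwrap macros are reinterpret_cast shims between a
      C++ type and its opaque C reference type. A simplified, self-contained
      sketch (the type names here are stand-ins, not the real LLVM ones; see
      CBindingWrapping.h for the actual macros):

      typedef struct OpaqueHandle *CValueRef; // stand-in for e.g. LLVMValueRef
      class CxxValue {};                      // stand-in for e.g. llvm::Value

      #define DEFINE_SIMPLE_CONVERSION_FUNCTIONS(ty, ref)    \
        inline ty *unwrap(ref P) {                           \
          return reinterpret_cast<ty *>(P);                  \
        }                                                    \
        inline ref wrap(const ty *P) {                       \
          return reinterpret_cast<ref>(const_cast<ty *>(P)); \
        }

      DEFINE_SIMPLE_CONVERSION_FUNCTIONS(CxxValue, CValueRef)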
    • SROA: Generate selects instead of shuffles when blending values because this... · 1e211913
      Nadav Rotem authored
      SROA: Generate selects instead of shuffles when blending values because this is the canonical form.
      Shuffles are more difficult to lower and we usually don't touch them, while we do optimize selects more often.
      
      llvm-svn: 180875
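      Either form expresses a lane-wise blend under a constant mask; a
      scalarized sketch of the shared semantics (illustrative C++, not IR):

      #include <array>

      // Blending two 4-lane values with a constant mask: this is what both a
      // shufflevector (with a constant mask) and a vector select express.
      std::array<int, 4> blend(const std::array<int, 4> &a,
                               const std::array<int, 4> &b) {
        constexpr bool mask[4] = {true, false, true, false}; // constant mask
        std::array<int, 4> r{};
        for (int i = 0; i < 4; ++i)
          r[i] = mask[i] ? a[i] : b[i]; // the "select" form of the blend
        return r;
      }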
    • Revert "InstCombine: Fold more shuffles of shuffles." · d11584a7
      Jim Grosbach authored
      This reverts commit r180802
      
      There's ongoing discussion about whether this is the right place to make
      this transformation. Reverting for now while we figure it out.
      
      llvm-svn: 180834
    • Fix a use after free. RI is freed before the call to getDebugLoc(). To · 624c2ebc
      Richard Trieu authored
      prevent this, capture the location before RI is freed.
      
      llvm-svn: 180824
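      The shape of the fix, as a self-contained sketch (Inst and getDebugLoc
      here are illustrative stand-ins, not the actual LLVM classes):

      #include <iostream>
      #include <memory>
      #include <string>

      struct Inst {
        std::string Loc;
        const std::string &getDebugLoc() const { return Loc; }
      };

      int main() {
        auto RI = std::make_unique<Inst>(Inst{"file.c:42"});
        std::string Loc = RI->getDebugLoc(); // capture while RI is alive
        RI.reset();                          // RI is freed here
        std::cout << Loc << "\n";            // safe: uses the copy, not RI
        return 0;
      }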
  Apr 29, 2013
    • SimplifyCFG: If convert single conditional stores · 474df6d3
      Arnold Schwaighofer authored
      This resurrects r179957, but adds code that makes sure we don't touch
      atomic/volatile stores:
      
      This transformation will transform a conditional store with a preceding
      unconditional store to the same location:
      
       a[i] = X
       may-alias with a[i] load
       if (cond)
         a[i] = Y
      
      into an unconditional store.
      
       a[i] = X
       may-alias with a[i] load
       tmp = cond ? Y : X;
       a[i] = tmp
      
      We assume that on average the cost of a mispredicted branch is going to be
      higher than the cost of a second store to the same location, and that the
      secondary benefits of creating a bigger basic block for other optimizations
      to work on outweigh the potential case where the branch would be correctly
      predicted and the cost of executing the second store would be noticeably
      reflected in performance.
      
      hmmer's execution time improves by 30% on an imac12,2 on ref data sets.
      With this change we are on par with gcc's performance (gcc also performs
      this transformation). There was a 1.2% performance improvement on an ARM
      Swift chip. Other tests in the test-suite+external seem to be mostly
      unaffected in my experiments: this optimization was triggered on 41 tests
      such that the executable was different before/after the patch. Only 1 of
      those tests (dealII) was reproducibly below 100% (by about .4%). Given
      that hmmer benefits so much, I believe this to be a fair trade-off.
      
      llvm-svn: 180731
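      The effect of the rewrite, sketched in C++ (the shape of the
      transformation only, not the SimplifyCFG implementation):

      // Before: the second store is conditional, so there is a branch
      // that can mispredict.
      void store_before(int *a, int i, bool cond, int X, int Y) {
        a[i] = X;
        if (cond)
          a[i] = Y;
      }

      // After: one unconditional store of a selected value; no branch.
      void store_after(int *a, int i, bool cond, int X, int Y) {
        a[i] = X;
        int tmp = cond ? Y : X;
        a[i] = tmp;
      }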