- May 11, 2013
-
-
Nadav Rotem authored
For example: bar() { int a = A[i]; int b = A[i+1]; B[i] = a; B[i+1] = b; foo(a); <--- a is used outside the vectorized expression. } llvm-svn: 181648
-
Nadav Rotem authored
llvm-svn: 181647
-
- May 10, 2013
-
-
Benjamin Kramer authored
The shift amount may be larger than the type leading to undefined behavior. Limit the transform to constant shift amounts. While there update the bits to clear in the result which may enable additional optimizations. PR15959. llvm-svn: 181604
-
Benjamin Kramer authored
PR15952. llvm-svn: 181586
-
- May 09, 2013
-
-
Dmitri Gribenko authored
llvm-svn: 181551
-
Shuxin Yang authored
iteration. This on step toward non-iterative GVN. My local hack suggests that getting rid of iteration will speedup GVN by 30%+ on a medium sized input (2k LOC, C++). I cannot explain why not 2x or more at this moment. llvm-svn: 181532
-
Rafael Espindola authored
When we replace an internal alias with its target, be careful not to replace the entry in llvm.used (and llvm.compiler_used). llvm-svn: 181524
-
Benjamin Kramer authored
That's obviously wrong. Conservatively restrict it to the sign bit, which matches the original intention of this analysis. Fixes PR15940. llvm-svn: 181518
-
Arnold Schwaighofer authored
A computable loop exit count does not imply the presence of an induction variable. Scalar evolution can return a value for an infinite loop. Fixes PR15926. llvm-svn: 181495
-
- May 08, 2013
-
-
Daniel Malea authored
- requires existing debug information to be present - fixes up file name and line number information in metadata - emits a "<orig_filename>-debug.ll" succinct IR file (without !dbg metadata or debug intrinsics) that can be read by a debugger - initialize pass in opt tool to enable the "-debug-ir" flag - lit tests to follow llvm-svn: 181467
-
Nick Lewycky authored
by switching to a ValueMap. Patch by Andrea DiBiagio! llvm-svn: 181397
-
- May 07, 2013
-
-
Arnold Schwaighofer authored
The two nested loops were confusing and also conservative in identifying reduction variables. This patch replaces them by a worklist based approach. llvm-svn: 181369
-
Arnold Schwaighofer authored
We were passing an i32 to ConstantInt::get where an i64 was needed and we must also pass the sign if we pass negatives numbers. The start index passed to getConsecutiveVector must also be signed. Should fix PR15882. llvm-svn: 181286
-
- May 06, 2013
-
-
David Majnemer authored
llvm-svn: 181249
-
Andrew Trick authored
Test case by Michele Scandale! Fixes PR10293: Load not hoisted out of loop with multiple exits. There are few regressions with this patch, now tracked by rdar:13817079, and a roughly equal number of improvements. The regressions are almost certainly back luck because LoopRotate has very little idea of whether rotation is profitable. Doing better requires a more comprehensive solution. This checkin is a quick fix that lacks generality (PR10293 has a counter-example). But it trivially fixes the case in PR10293 without interfering with other cases, and it does satify the criteria that LoopRotate is a loop canonicalization pass that should avoid heuristics and special cases. I can think of two approaches that would probably be better in the long run. Ultimately they may both make sense. (1) LoopRotate should check that the current header would make a good loop guard, and that the loop does not already has a sufficient guard. The artifical SimplifiedLoopLatch check would be unnecessary, and the design would be more general and canonical. Two difficulties: - We need a strong guarantee that we won't endlessly rotate, so the analysis would need to be precise in order to avoid the SimplifiedLoopLatch precondition. - Analysis like this are usually based on SCEV, which we don't want to rely on. (2) Rotate on-demand in late loop passes. This could even be done by shoving the loop back on the queue after the optimization that needs it. This could work well when we find LICM opportunities in multi-branch loops. This requires some work, and it doesn't really solve the problem of SCEV wanting a loop guard before the analysis. llvm-svn: 181230
-
Jean-Luc Duprat authored
A * (1 - (uitofp i1 C)) -> select C, 0, A B * (uitofp i1 C) -> select C, B, 0 select C, 0, A + select C, B, 0 -> select C, B, A These come up in code that has been hand-optimized from a select to a linear blend, on platforms where that may have mattered. We want to undo such changes with the following transform: A*(1 - uitofp i1 C) + B*(uitofp i1 C) -> select C, A, B llvm-svn: 181216
-
Nadav Rotem authored
llvm-svn: 181178
-
Nadav Rotem authored
Thanks Nick Lewycky for pointing this out. llvm-svn: 181177
-
Rafael Espindola authored
We used to disable constant merging not only if a constant is llvm.used, but also if an alias of a constant is llvm.used. This change fixes that. llvm-svn: 181175
-
- May 05, 2013
-
-
Benjamin Kramer authored
llvm-svn: 181157
-
Arnold Schwaighofer authored
Add support for min/max reductions when "no-nans-float-math" is enabled. This allows us to assume we have ordered floating point math and treat ordered and unordered predicates equally. radar://13723044 llvm-svn: 181144
-
Arnold Schwaighofer authored
No need for setting the operands. The pointers are going to be bound by the matcher. radar://13723044 llvm-svn: 181142
-
Arnold Schwaighofer authored
We can just use the initial element that feeds the reduction. max(max(x, y), z) == max(max(x,y), max(x,z)) radar://13723044 llvm-svn: 181141
-
Dmitri Gribenko authored
Patch by Robert Wilhelm. llvm-svn: 181138
-
- May 04, 2013
-
-
Nick Lewycky authored
llvm-svn: 181082
-
- May 03, 2013
-
-
Shuxin Yang authored
Decompose GVN::processNonLocalLoad() (about 400 LOC) into smaller helper functions. No function change. This function consists of following steps: 1. Collect dependent memory accesses. 2. Analyze availability. 3. Perform fully redundancy elimination, or 4. Perform PRE, depending on the availability Step 2, 3 and 4 are now moved to three helper routines. llvm-svn: 181047
-
Nadav Rotem authored
By supporting the vectorization of PHINodes with more than two incoming values we can increase the complexity of nested if statements. We can now vectorize this loop: int foo(int *A, int *B, int n) { for (int i=0; i < n; i++) { int x = 9; if (A[i] > B[i]) { if (A[i] > 19) { x = 3; } else if (B[i] < 4 ) { x = 4; } else { x = 5; } } A[i] = x; } } llvm-svn: 181037
-
- May 02, 2013
-
-
Shuxin Yang authored
Actually it took me couple of hours trying to make sense of them and only to find they are dead code. I guess the original author used "allSingleSucc" to indicate if there are any critial edge emanating from some blocks, and tried to perform code motion (actually speculation) in the presence of these critical edges; but later on he/she changed mind and decided to perform edge-splitting first. llvm-svn: 180951
-
- May 01, 2013
-
-
Filip Pizlo authored
the things, and renames it to CBindingWrapping.h. I also moved CBindingWrapping.h into Support/. This new file just contains the macros for defining different wrap/unwrap methods. The calls to those macros, as well as any custom wrap/unwrap definitions (like for array of Values for example), are put into corresponding C++ headers. Doing this required some #include surgery, since some .cpp files relied on the fact that including Wrap.h implicitly caused the inclusion of a bunch of other things. This also now means that the C++ headers will include their corresponding C API headers; for example Value.h must include llvm-c/Core.h. I think this is harmless, since the C API headers contain just external function declarations and some C types, so I don't believe there should be any nasty dependency issues here. llvm-svn: 180881
-
Nadav Rotem authored
SROA: Generate selects instead of shuffles when blending values because this is the cannonical form. Shuffles are more difficult to lower and we usually don't touch them, while we do optimize selects more often. llvm-svn: 180875
-
Jim Grosbach authored
This reverts commit r180802 There's ongoing discussion about whether this is the right place to make this transformation. Reverting for now while we figure it out. llvm-svn: 180834
-
Richard Trieu authored
prevent this, capture the location before RI is freed. llvm-svn: 180824
-
- Apr 30, 2013
-
-
Nadav Rotem authored
llvm-svn: 180806
-
Jim Grosbach authored
Always fold a shuffle-of-shuffle into a single shuffle when there's only one input vector in the first place. Continue to be more conservative when there's multiple inputs. rdar://13402653 PR15866 llvm-svn: 180802
-
Adrian Prantl authored
llvm-svn: 180794
-
Adrian Prantl authored
the inlined function has multiple returns. rdar://problem/12415623 llvm-svn: 180793
-
David Majnemer authored
Differences in bitwidth between X and Y could exist even if C1 and C2 have the same Log2 representation. llvm-svn: 180779
-
David Majnemer authored
This fixes the optimization introduced in r179748 and reverted in r179750. While the optimization was sound, it did not properly respect differences in bit-width. llvm-svn: 180777
-
- Apr 29, 2013
-
-
Arnold Schwaighofer authored
This resurrects r179957, but adds code that makes sure we don't touch atomic/volatile stores: This transformation will transform a conditional store with a preceeding uncondtional store to the same location: a[i] = may-alias with a[i] load if (cond) a[i] = Y into an unconditional store. a[i] = X may-alias with a[i] load tmp = cond ? Y : X; a[i] = tmp We assume that on average the cost of a mispredicted branch is going to be higher than the cost of a second store to the same location, and that the secondary benefits of creating a bigger basic block for other optimizations to work on outway the potential case where the branch would be correctly predicted and the cost of the executing the second store would be noticably reflected in performance. hmmer's execution time improves by 30% on an imac12,2 on ref data sets. With this change we are on par with gcc's performance (gcc also performs this transformation). There was a 1.2 % performance improvement on a ARM swift chip. Other tests in the test-suite+external seem to be mostly uninfluenced in my experiments: This optimization was triggered on 41 tests such that the executable was different before/after the patch. Only 1 out of the 40 tests (dealII) was reproducable below 100% (by about .4%). Given that hmmer benefits so much I believe this to be a fair trade off. llvm-svn: 180731
-
Michael Gottesman authored
llvm-svn: 180700
-