- Jul 16, 2013
-
-
Craig Topper authored
llvm-svn: 186371
-
Manman Ren authored
We can have a FrameSetup in one basic block and the matching FrameDestroy in a different basic block when we have struct byval. In that case, SPAdj is not zero at beginning of the basic block. Modify PEI to correctly set SPAdj at beginning of each basic block using DFS traversal. We used to assume SPAdj is 0 at beginning of each basic block. PEI had an assert SPAdjCount || SPAdj == 0. If we have a Destroy <n> followed by a Setup <m>, PEI will assert failure. We can add an extra condition to make sure the pairs are matched: The pairs start with a FrameSetup. But since we are doing a much better job in the verifier, this patch removes the check in PEI. PR16393 llvm-svn: 186364
-
Nadav Rotem authored
Compares return i1 but they compare different types. llvm-svn: 186359
-
Hal Finkel authored
This change mirrors the changes that were made to the X86 and ARM targets to support subtarget feature changing. As indicated in r182899, the mechanism is still undergoing revision, and so as with the X86 and ARM targets, there is no test case yet (there is no effective functionality change). llvm-svn: 186357
-
- Jul 15, 2013
-
-
Manman Ren authored
1> on every path through the CFG, a FrameSetup <n> is always followed by a FrameDestroy <n> and a FrameDestroy is always followed by a FrameSetup. 2> stack adjustments are identical on all CFG edges to a merge point. 3> frame is destroyed at end of a return block. PR16393 llvm-svn: 186350
-
Rafael Espindola authored
I checked that opening a directory on windows does fail, so this saves a "stat". llvm-svn: 186345
-
Hal Finkel authored
PPCInstrInfo::insertSelect and PPCInstrInfo::canInsertSelect were computing the common subclass of the true and false inputs, and then selecting either the 32-bit or the 64-bit isel variant based on the result of calling PPC::GPRCRegClass.hasSubClassEq(RC) and PPC::G8RCRegClass.hasSubClassEq(RC) (where RC is the common subclass). Unfortunately, this is not quite right: if we have something like this: %vreg8<def> = SELECT_CC_I8 %vreg4<kill>, %vreg7<kill>, %vreg6<kill>, 76; G8RC_and_G8RC_NOX0:%vreg8 CRRC:%vreg4 G8RC_NOX0:%vreg7,%vreg6 then the common subclass of G8RC_and_G8RC_NOX0 and G8RC_NOX0 is G8RC_NOX0, and G8RC_NOX0 is not a subclass of G8RC (because it also contains the ZERO8 pseudo-register). As a result, we also need to check the common subclass against GPRC_NOR0 and G8RC_NOX0 explicitly. This had not been a problem for clients of insertSelect that called canInsertSelect first (because it had a compensating mistake), but insertSelect is also used by the PPC pseudo-instruction expander, and this error was causing a problem in that context. This problem was found by csmith. llvm-svn: 186343
-
Reid Kleckner authored
This is consistent with the ELF object writer. Add some COFF tests that relocate against an alias. Reviewers: espindola Differential Revision: http://llvm-reviews.chandlerc.com/D1079 llvm-svn: 186341
-
Tom Stellard authored
https://bugs.freedesktop.org/show_bug.cgi?id=65873 llvm-svn: 186339
-
Hal Finkel authored
There is a comment at the top of DAGTypeLegalizer::PerformExpensiveChecks which, in part, says: // Note that these invariants may not hold momentarily when processing a node: // the node being processed may be put in a map before being marked Processed. Unfortunately, this assert would be valid only if the above-mentioned invariant held unconditionally. This was causing llc to assert when, in fact, everything was fine. Thanks to Richard Sandiford for investigating this issue! Fixes PR16562. llvm-svn: 186338
-
Stephen Lin authored
llvm-svn: 186333
-
Chandler Carruth authored
a bot. This reverts the commit which introduced a new implementation of the fancy SROA pass designed to reduce its overhead. I'll skip the huge commit log here, refer to r186316 if you're looking for how this all works and why it works that way. llvm-svn: 186332
-
Reid Kleckner authored
This broke clang's crash-report.c test, and I haven't been able to figure it out yet. This reverts commit r186319. llvm-svn: 186329
-
Job Noorman authored
llvm-svn: 186321
-
Reid Kleckner authored
No functionality change. This is preparing to move response file parsing into lib/Option so it can be shared between clang and lld. This change isn't just a micro-optimization. Clang's driver uses a std::set<std::string> to unique arguments while parsing response files, so this matches that. llvm-svn: 186319
-
Chandler Carruth authored
different core implementation strategy. Previously, SROA would build a relatively elaborate partitioning of an alloca, associate uses with each partition, and then rewrite the uses of each partition in an attempt to break apart the alloca into chunks that could be promoted. This was very wasteful in terms of memory and compile time because regardless of how complex the alloca or how much we're able to do in breaking it up, all of the datastructure work to analyze the partitioning was done up front. The new implementation attempts to form partitions of the alloca lazily and on the fly, rewriting the uses that make up that partition as it goes. This has a few significant effects: 1) Much simpler data structures are used throughout. 2) No more double walk of the recursive use graph of the alloca, only walk it once. 3) No more complex algorithms for associating a particular use with a particular partition. 4) PHI and Select speculation is simplified and happens lazily. 5) More precise information is available about a specific use of the alloca, removing the need for some side datastructures. Ultimately, I think this is a much better implementation. It removes about 300 lines of code, but arguably removes more like 500 considering that some code grew in the process of being factored apart and cleaned up for this all to work. I've re-used as much of the old implementation as possible, which includes the lion's share of code in the form of the rewriting logic. The interesting new logic centers around how the uses of a partition are sorted, and split into actual partitions. Each instruction using a pointer derived from the alloca gets a 'Partition' entry. This name is totally wrong, but I'll do a rename in a follow-up commit as there is already enough churn here. The entry describes the offset range accessed and the nature of the access. Once we have all of these entries we sort them in a very specific way: increasing order of begin offset, followed by whether they are splittable uses (memcpy, etc), followed by the end offset or whatever. Sorting by splittability is important as it simplifies the collection of uses into a partition. Once we have these uses sorted, we walk from the beginning to the end building up a range of uses that form a partition of the alloca. Overlapping unsplittable uses are merged into a single partition while splittable uses are broken apart and carried from one partition to the next. A partition is also introduced to bridge splittable uses between the unsplittable regions when necessary. I've looked at the performance PRs fairly closely. PR15471 no longer will even load (the module is invalid). Not sure what is up there. PR15412 improves by between 5% and 10%, however it is nearly impossible to know what is holding it up as SROA (the entire pass) takes less time than reading the IR for that test case. The analysis takes the same time as running mem2reg on the final allocas. I suspect (without much evidence) that the new implementation will scale much better however, and it is just the small nature of the test cases that makes the changes small and noisy. Either way, it is still simpler and cleaner I think. llvm-svn: 186316
-
Alexey Samsonov authored
DebugInfo: Factor out parsing compile unit DIEs to a separate function. Improve code style and comments. No functionality change. llvm-svn: 186315
-
Craig Topper authored
llvm-svn: 186312
-
Craig Topper authored
llvm-svn: 186311
-
Craig Topper authored
llvm-svn: 186310
-
Craig Topper authored
llvm-svn: 186309
-
Craig Topper authored
llvm-svn: 186308
-
Craig Topper authored
llvm-svn: 186307
-
Craig Topper authored
llvm-svn: 186303
-
Craig Topper authored
llvm-svn: 186301
-
Eric Christopher authored
llvm-svn: 186297
-
Eric Christopher authored
llvm-svn: 186296
-
- Jul 14, 2013
-
-
Eric Christopher authored
llvm-svn: 186295
-
Anton Korobeynikov authored
Patch by Job! llvm-svn: 186291
-
Anton Korobeynikov authored
llvm-svn: 186283
-
Nadav Rotem authored
SLPVectorizer: change the order in which we search for vectorization candidates. Do stores first and PHIs second. llvm-svn: 186277
-
Tobias Grosser authored
llvm-svn: 186276
-
Craig Topper authored
llvm-svn: 186274
-
Andrew Trick authored
The great thing about the SCEVAddRec No-Wrap flag (unlike nsw/nuw) is that is can be preserved while normalizing (reassociating and factoring). The bad thing is that is can't be tranfered back to IR, which is one of the reasons I don't like the concept of SCEVExpander. Sorry, I can't think of a direct way to test this, which is why these were FIXMEs for so long. I just think it's a good time to finally clean it up. llvm-svn: 186273
-
Andrew Trick authored
Fixes PR16600. llvm-svn: 186272
-
- Jul 13, 2013
-
-
Arnold Schwaighofer authored
If an outside loop user of the reduction value uses the header phi node we cannot just reduce the vectorized phi value in the vector code epilog because we would loose VF-1 reductions. lp: p = phi (0, lv) lv = lv + 1 ... brcond , lp, outside outside: usr = add 0, p (Say the loop iterates two times, the value of p coming out of the loop is one). We cannot just transform this to: vlp: p = phi (<0,0>, lv) lv = lv + <1,1> .. brcond , lp, outside outside: p_reduced = p[0] + [1]; usr = add 0, p_reduced (Because the original loop iterated two times the vectorized loop would iterate one time, but p_reduced ends up being zero instead of one). We would have to execute VF-1 iterations in the scalar remainder loop in such cases. For now, just disable vectorization. PR16522 llvm-svn: 186256
-
Joerg Sonnenberger authored
between ELF (Linux, FreeBSD, NetBSD) and OSX as platform for the assembler dialect. llvm-svn: 186252
-
Craig Topper authored
llvm-svn: 186243
-
Andrew Trick authored
In general, one should always complete CFG modifications first, update CFG-based analyses, like Dominatores and LoopInfo, then generate instruction sequences. LoopVectorizer was creating a new loop, calling SCEVExpander to generate checks, then updating LoopInfo. I just changed the order. llvm-svn: 186241
-
Nick Lewycky authored
llvm-svn: 186235
-