- Apr 26, 2014
-
-
Dan Liew authored
llvm-svn: 207328
-
Craig Topper authored
llvm-svn: 207327
-
Craig Topper authored
Remove an unused version of getMemIntrinsicNode and getNode. Additionally, these were calling makeVTList with the pointers passed in, which were unlikely to belong to SelectionDAG and would likely have just been stack pointers. llvm-svn: 207326
-
David Blaikie authored
llvm-svn: 207324
-
David Blaikie authored
Since there's no way to ensure the type unit in the .dwo and the type unit skeleton in the .o are correlated, this cannot work. This implementation is a bit inefficient for a few reasons, called out in comments. llvm-svn: 207323
-
Benjamin Kramer authored
llvm-svn: 207322
-
David Blaikie authored
Sinking addition of the declaration attribute down to where the signature is added, so that if the signature is not added, neither is the declaration attribute (this will come in handy when aborting type unit construction to instead emit the type into the CU directly in some cases). Pull out type unit identifier hashing just to simplify the function a little; it'll be getting longer. llvm-svn: 207321
-
Benjamin Kramer authored
Turn vectorization back on. llvm-svn: 207320
-
Benjamin Kramer authored
llvm-svn: 207318
-
Benjamin Kramer authored
This gets us pretty code for divs of i16 vectors. Turn the existing intrinsics into the corresponding nodes. llvm-svn: 207317
-
Benjamin Kramer authored
Rip out X86-specific vector SDIV lowering, make the corresponding DAGCombiner transform work on vectors. llvm-svn: 207316
-
Benjamin Kramer authored
Otherwise the legalizer would just scalarize everything. Support for mulhi in the targets isn't that great yet so on most targets we get exactly the same scalarized output. Add a test for x86 vector udiv. I had to disable the mulhi nodes on ARM because there aren't any patterns for it. As far as I know ARM has instructions for getting the high part of a multiply so this should be fixed. llvm-svn: 207315
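As a rough illustration of why a multiply-high (mulhi) node matters for vector division, the sketch below divides unsigned 16-bit lanes by the constant 3 using only a high multiply and a shift. It is not the DAGCombiner code: the names `mulhu16`, `udiv3`, `udiv3Lanes`, and `udiv3MatchesDivision` are hypothetical, and the divisor 3 is chosen only because its magic constant (0xAAAB, with a post-shift of 1) is easy to check by hand.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// High 16 bits of the 32-bit product of two unsigned 16-bit values.
// This models what an unsigned "mulhi" node computes for one i16 lane.
inline std::uint16_t mulhu16(std::uint16_t a, std::uint16_t b) {
  std::uint32_t product = static_cast<std::uint32_t>(a) * static_cast<std::uint32_t>(b);
  return static_cast<std::uint16_t>(product >> 16);
}

// x / 3 for an unsigned 16-bit x: 0xAAAB * 3 == 2^17 + 1, so
// (x * 0xAAAB) >> 17 equals floor(x / 3) for every 16-bit x.
inline std::uint16_t udiv3(std::uint16_t x) {
  return static_cast<std::uint16_t>(mulhu16(x, 0xAAAB) >> 1);
}

// Lane-wise version over an 8 x i16 vector (a stand-in for a v8i16 value).
inline std::array<std::uint16_t, 8> udiv3Lanes(const std::array<std::uint16_t, 8> &v) {
  std::array<std::uint16_t, 8> result{};
  for (std::size_t i = 0; i < v.size(); ++i)
    result[i] = udiv3(v[i]);
  return result;
}

// Exhaustive check of the identity over all 65536 inputs.
inline bool udiv3MatchesDivision() {
  for (std::uint32_t x = 0; x <= 0xFFFF; ++x)
    if (udiv3(static_cast<std::uint16_t>(x)) != x / 3)
      return false;
  return true;
}
```

Without a vector high multiply, each lane would have to be extracted and divided on its own, which is the scalarized output the entry describes.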
-
Benjamin Kramer authored
Test will follow soon. llvm-svn: 207314
-
Michael Zolotukhin authored
llvm-svn: 207313
-
Chandler Carruth authored
them, just skip over any DFS-numbered nodes when finding the next root of a DFS. This allows the entry set to just be a vector as we populate it from a uniqued source. It also removes the possibility for a linear scan of the entry set to actually do the removal which can make things go quadratic if we get unlucky. llvm-svn: 207312
-
Chandler Carruth authored
the DFS stack for leaves in the call graph. As mentioned in my previous commit, this is particularly interesting for graphs which have high fan out but low connectivity resulting in many leaves. For such graphs, this can remove a large % of the DFS stack traffic even though it doesn't make the stack much smaller. It's a bit easier to formulate this for the full algorithm because that one stops completely for each SCC. For example, I was able to directly eliminate the "Recurse" boolean used to continue an outer loop from the inner loop. llvm-svn: 207311
-
Chandler Carruth authored
makes working through the worklist much cleaner, and makes it possible to avoid the 'bool-to-continue-the-outer-loop' hack. Not a huge difference, but I think this is approaching as polished as I can make it. llvm-svn: 207310
-
Gerolf Hoflehner authored
more than 1 instruction. The caller needs to be aware of this and adjust instruction iterators accordingly. rdar://16679376 Repaired r207302. llvm-svn: 207309
-
Gerolf Hoflehner authored
overwritten by previous backout of r207303 llvm-svn: 207308
-
Chandler Carruth authored
processed in the DFS out of the stack completely. Keep it exclusively in a variable. Re-shuffle some code structure to make this easier. This can have a very dramatic effect in some cases because call graphs tend to look like a high fan-out spanning tree. As a consequence, there are a large number of leaf nodes in the graph, and this technique causes leaf nodes to never even go into the stack. While this only reduces the max depth by 1, it may cause the total number of round trips through the stack to drop by a lot. Now, most of this isn't really relevant for the incremental version. =] But I wanted to prototype it first here as this variant is in some ways more complex. As long as I can get the code factored well here, I'll next make the primary walk look the same. There are several refactorings this exposes I think. llvm-svn: 207306
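The idea of keeping the node being processed in a variable rather than on the stack can be sketched on a plain adjacency-list graph. This is only an illustration of the technique, not the LazyCallGraph code: `Graph`, `postOrder`, and the `pushes` counter are hypothetical names, and a simple post-order walk stands in for the SCC algorithm.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using Graph = std::vector<std::vector<int>>;

// Iterative post-order DFS from `root`. The node currently being processed
// lives only in the local variables `node` and `next`; a frame is pushed
// only when we descend into a child that itself has successors. Leaves are
// emitted directly and never enter the stack. `pushes` reports how many
// frames were pushed in total.
inline std::vector<int> postOrder(const Graph &g, int root, std::size_t &pushes) {
  std::vector<bool> seen(g.size(), false);
  std::vector<std::pair<int, std::size_t>> stack; // suspended (node, next child index)
  std::vector<int> order;
  pushes = 0;

  int node = root;
  std::size_t next = 0;
  seen[root] = true;
  for (;;) {
    if (next < g[node].size()) {
      int child = g[node][next++];
      if (seen[child])
        continue;
      seen[child] = true;
      if (g[child].empty()) { // leaf: finish it immediately, no stack traffic
        order.push_back(child);
        continue;
      }
      stack.emplace_back(node, next); // suspend the parent, descend
      ++pushes;
      node = child;
      next = 0;
      continue;
    }
    order.push_back(node); // all successors done
    if (stack.empty())
      break;
    node = stack.back().first;
    next = stack.back().second;
    stack.pop_back();
  }
  return order;
}
```

On a graph with high fan-out and many leaves, most visited nodes take the leaf path, so the number of pushes drops sharply even though the maximum depth shrinks by only one.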
-
Chandler Carruth authored
graph in any way because we don't track edges in the SCC graph, just nodes. This also lets us add a nice assert about the invariant that we're working on at least a certain number of nodes within the SCC. llvm-svn: 207305
-
Juergen Ributzka authored
The included test case would return incorrect results, because the expansion of a shift with a constant shift amount of 0 would generate undefined behavior. This is because ExpandShiftByConstant assumes that all shifts by constants with a value of 0 have already been optimized away. This doesn't happen for opaque constants, and usually this isn't a problem, because opaque constants are not supposed to take this code path. When the opaque constant has to be expanded by the legalizer, however, the legalizer would drop the opaque flag. In this case we hit the limitations of ExpandShiftByConstant and create incorrect code. This commit fixes the legalizer by not dropping the opaque flag when expanding opaque constants, and adds an assertion to ExpandShiftByConstant to catch this unsupported case in the future. This fixes <rdar://problem/16718472> llvm-svn: 207304
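To see why a shift amount of 0 is a special case when a wide shift is expanded into narrower pieces, consider a 64-bit left shift built from two 32-bit halves. The sketch below is a stand-alone model, not the legalizer's ExpandShiftByConstant: `Halves`, `expandShlByConstant`, and `shl64ViaHalves` are hypothetical names. The general path computes `lo >> (32 - amt)`, which would be a shift by the full width (undefined in C++) if `amt` were 0, so the zero case has to be handled or ruled out before reaching it.

```cpp
#include <cassert>
#include <cstdint>

// A 64-bit value split into two 32-bit halves.
struct Halves {
  std::uint32_t lo;
  std::uint32_t hi;
};

// Expand a 64-bit left shift by a constant amount into 32-bit operations.
inline Halves expandShlByConstant(Halves in, unsigned amt) {
  assert(amt < 64 && "shift amount out of range");
  if (amt == 0)
    return in; // must not reach the general path: 32 - 0 is a full-width shift
  if (amt >= 32)
    return {0u, in.lo << (amt - 32)}; // low half is all zero bits
  // General path, valid only for 0 < amt < 32.
  return {in.lo << amt, (in.hi << amt) | (in.lo >> (32 - amt))};
}

// Convenience wrapper: split, shift via halves, and recombine.
inline std::uint64_t shl64ViaHalves(std::uint64_t x, unsigned amt) {
  Halves in = {static_cast<std::uint32_t>(x), static_cast<std::uint32_t>(x >> 32)};
  Halves out = expandShlByConstant(in, amt);
  return (static_cast<std::uint64_t>(out.hi) << 32) | out.lo;
}
```

The sketch returns early for a zero amount; the commit instead keeps the opaque flag so such constants never reach the expansion, and adds an assertion for the unsupported case.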
-
Gerolf Hoflehner authored
have been reported. llvm-svn: 207303
-
Gerolf Hoflehner authored
more than 1 instruction. The caller needs to be aware of this and adjust instruction iterators accordingly. rdar://16679376 llvm-svn: 207302
-
Quentin Colombet authored
Scaling factors are not free on X86 because every "complex" addressing mode breaks the related instruction into 2 allocations instead of 1. <rdar://problem/16730541> llvm-svn: 207301
-
Chandler Carruth authored
a helper function. Also factor the other two places where we did the same thing into the helper function. =] Much cleaner this way. NFC. llvm-svn: 207300
-
Andrea Di Biagio authored
right intrinsics. A packed logical shift right with a shift count bigger than or equal to the element size always produces a zero vector. In all other cases, it can be safely replaced by a 'lshr' instruction. llvm-svn: 207299
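The rule stated in this entry can be modelled on a four-lane vector of 32-bit elements. This is a sketch of the semantics only, assuming a 32-bit element size; `V4` and `packedLogicalShiftRight` are hypothetical names and not part of LLVM.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using V4 = std::array<std::uint32_t, 4>;

// Model of a packed logical shift right on 4 x 32-bit lanes.
// A count of 32 or more yields the zero vector; any smaller count behaves
// like an ordinary per-lane logical shift right (the 'lshr' case).
inline V4 packedLogicalShiftRight(const V4 &v, std::uint64_t count) {
  V4 result{}; // value-initialized to all zeros
  if (count >= 32)
    return result;
  for (std::size_t i = 0; i < v.size(); ++i)
    result[i] = v[i] >> count;
  return result;
}
```

The explicit range check is what makes the two cases distinct: a plain per-lane shift by the element width or more has no defined result, while the packed intrinsic defines it as zero.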
-
Richard Smith authored
llvm-svn: 207298
-
Filipe Cabecinhas authored
llvm-svn: 207295
-
Filipe Cabecinhas authored
Summary: If we're doing a v4f32/v4i32 shuffle on x86 with SSE4.1, we can lower certain shufflevectors to an insertps instruction: when most of the shufflevector result's elements come from one vector (and keep their index), and one element comes from another vector or a memory operand. Added tests for insertps optimizations on shufflevector. Added support and tests for v4i32 vector optimization. Reviewers: nadav Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3475 llvm-svn: 207291
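The shuffle shape described here (three lanes kept in place from one vector, one lane taken from the other) can be recognized from the shuffle mask alone. The sketch below is a simplified stand-alone matcher, not the X86 lowering code: `InsertMatch` and `matchSingleInsert` are hypothetical names, mask values 0 to 3 are assumed to select lanes of the first vector and 4 to 7 lanes of the second, and undef lanes and memory operands are not handled.

```cpp
#include <array>

// Result of a successful match: insert lane `srcLane` of vector `srcVector`
// into lane `destLane` of vector `baseVector`. Vectors are numbered 0 and 1.
struct InsertMatch {
  int baseVector;
  int destLane;
  int srcVector;
  int srcLane;
};

// Returns true if exactly one lane of the 4-element mask differs from
// "lane i of the base vector" and that lane comes from the other vector.
inline bool matchSingleInsert(const std::array<int, 4> &mask, InsertMatch &out) {
  for (int base = 0; base < 2; ++base) {
    int oddLane = -1;
    bool ok = true;
    for (int i = 0; i < 4; ++i) {
      if (mask[i] == base * 4 + i)
        continue; // lane keeps its index within the base vector
      if (oddLane != -1) {
        ok = false; // more than one lane is out of place
        break;
      }
      oddLane = i;
    }
    if (!ok || oddLane == -1)
      continue;
    int src = mask[oddLane];
    if (src / 4 == base)
      continue; // the moved element must come from the other vector
    out = InsertMatch{base, oddLane, src / 4, src % 4};
    return true;
  }
  return false;
}
```

For example, the mask {0, 1, 6, 3} matches with the first vector as base and lane 2 of the second vector inserted at lane 2, while {0, 5, 6, 3} does not match because two lanes come from the second vector.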
-
Duncan P. N. Exon Smith authored
This reverts commit r207286. It causes an ICE on the cmake-llvm-x86_64-linux buildbot [1]: llvm/lib/Analysis/BlockFrequencyInfo.cpp: In lambda function: llvm/lib/Analysis/BlockFrequencyInfo.cpp:182:1: internal compiler error: in get_expr_operands, at tree-ssa-operands.c:1035 [1]: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/12093/steps/build_llvm/logs/stdio llvm-svn: 207287
-
Duncan P. N. Exon Smith authored
Previously, irreducible backedges were ignored. With this commit, irreducible SCCs are discovered on the fly, and modelled as loops with multiple headers. This approximation specifies the headers of irreducible sub-SCCs as its entry blocks and all nodes that are targets of a backedge within it (excluding backedges within true sub-loops). Block frequency calculations act as if we insert a new block that intercepts all the edges to the headers. All backedges and entries to the irreducible SCC point to this imaginary block. This imaginary block has an edge (with even probability) to each header block. The result is now reasonable enough that I've added a number of testcases for irreducible control flow. I've outlined in `BlockFrequencyInfoImpl.h` ways to improve the approximation. <rdar://problem/14292693> llvm-svn: 207286
-
Adrian Prantl authored
llvm-svn: 207284
-
Eric Christopher authored
low_pc similar to location lists. Fixes PR19563 llvm-svn: 207283
-
Matt Arsenault authored
v2: Check both ExternalSymbol and GlobalAddress Patch by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 207282
-
David Blaikie authored
This also avoids the need for subtly side-effecting calls to manifest strings in the string table at the point where items are added to the accelerator tables. llvm-svn: 207281
-
- Apr 25, 2014
-
-
Tom Roeder authored
This adds support for an -mattr option to the gold plugin and to llvm-lto. This allows the caller to specify details of the subtarget architecture, like +aes, or +ssse3 on x86. Note that this requires a change to the include/llvm-c/lto.h interface: it adds a function lto_codegen_set_attr and it increments the version of the interface. llvm-svn: 207279
-
Alexey Samsonov authored
llvm-svn: 207278
-
David Blaikie authored
Pulls out some more code from some of the rather monolithic DWARF classes. Unlike the address table, the string table won't move up into DwarfDebug - each DWARF file has its own string table (but there can be only one address table). llvm-svn: 207277
-
Alexey Samsonov authored
No functionality change. llvm-svn: 207276
-