- Nov 27, 2011
-
-
Chandler Carruth authored
fallthrough) in cases where we might fail to rotate an exit to an outer loop onto the end of the loop chain. Having *some* rotation, but not performing this rotation, is the primary fix for the performance regression with -enable-block-placement for Olden/em3d (a whopping 30% regression). Still working on reducing the test case that actually exercises this and the new rotation strategy out of this code, but I want to check if this regresses other test cases first as that may indicate it isn't the correct fix. llvm-svn: 145195
-
Rafael Espindola authored
  * Enabling sse enables mmx.
  * Disabling mmx (-mno-mmx) doesn't disable sse (we got this right already).
  * The order is not important: -msse -mno-mmx is the same as -mno-mmx -msse.
llvm-svn: 145194
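The three rules in this message can be modelled in a few lines. The sketch below is only an illustration of the stated behaviour, not the driver code touched by the commit; the function name `resolve_x86_features` is hypothetical, and target default features are ignored.

```python
def resolve_x86_features(args):
    """Return (sse_enabled, mmx_enabled) for a list of -m flags.

    Hypothetical model of the rules in the commit message, not clang's code.
    """
    sse = False
    mmx_explicit = None  # None: neither -mmmx nor -mno-mmx was given
    for arg in args:
        if arg == "-msse":
            sse = True
        elif arg == "-mno-sse":
            sse = False
        elif arg == "-mmmx":
            mmx_explicit = True
        elif arg == "-mno-mmx":
            mmx_explicit = False
    # An explicit MMX flag wins wherever it appears; otherwise MMX follows SSE.
    mmx = sse if mmx_explicit is None else mmx_explicit
    return sse, mmx
```

Because the explicit MMX choice is recorded separately from the SSE choice and only combined at the end, the relative order of `-msse` and `-mno-mmx` does not change the result in this model.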
-
Chris Lattner authored
rewrite the known problems section. Including a short list of individual bugs per target isn't particularly useful. Link to the target features matrix. llvm-svn: 145193
-
Chris Lattner authored
blog'izing it. llvm-svn: 145192
-
Chris Lattner authored
llvm-svn: 145191
-
Chris Lattner authored
llvm-svn: 145190
-
Rafael Espindola authored
llvm-svn: 145189
-
Chris Lattner authored
fails on ppc and arm hosts. llvm-svn: 145188
-
Sebastian Redl authored
This supports single-element initializer lists for references according to DR1288, as well as creating temporaries and binding to them for other initializer lists. llvm-svn: 145186
-
Rafael Espindola authored
llvm-svn: 145185
-
Rafael Espindola authored
llvm-svn: 145184
-
Chandler Carruth authored
was centered around the premise of laying out a loop in a chain, and then rotating that chain. This is good for preserving contiguous layout, but bad for actually making sane rotations. In order to keep it safe, I had to essentially make it impossible to rotate deeply nested loops. The information needed to correctly reason about a deeply nested loop is actually available -- *before* we lay out the loop. We know the inner loops are already fused into chains, etc. We lose information the moment we actually lay out the loop. The solution was the other alternative for this algorithm I discussed with Benjamin and some others: rather than rotating the loop after-the-fact, try to pick a profitable starting block for the loop's layout, and then use our existing layout logic. I was worried about the complexity of this "pick" step, but it turns out such complexity is needed to handle all the important cases I keep teasing out of benchmarks. This is, I'm afraid, a bit of a work-in-progress. It is still misbehaving on some likely important cases I'm investigating in Olden. It also isn't really tested. I'm going to try to craft some interesting nested-loop test cases, but it's likely to be extremely time-consuming and I don't want to go there until I'm sure I'm testing the correct behavior. Sadly I can't come up with a way of getting simple, fine-grained test cases for this logic. We need complex loop structures to even trigger much of it. llvm-svn: 145183
-
Chandler Carruth authored
Original commit message: Fixed ObjectFile functions:
  - getSymbolOffset() renamed as getSymbolFileOffset()
  - getSymbolFileOffset(), getSymbolAddress(), getRelocationAddress() return the same result for ELFObjectFile, MachOObjectFile and COFFObjectFile.
  - added getRelocationOffset()
  - fixed MachOObjectFile::getSymbolSize()
  - fixed MachOObjectFile::getSymbolSection()
  - fixed MachOObjectFile::getSymbolOffset() for symbols without section data.
llvm-svn: 145182
-
Chandler Carruth authored
llvm-svn: 145181
-
Danil Malyshev authored
  - getSymbolOffset() renamed as getSymbolFileOffset()
  - getSymbolFileOffset(), getSymbolAddress(), getRelocationAddress() return the same result for ELFObjectFile, MachOObjectFile and COFFObjectFile.
  - added getRelocationOffset()
  - fixed MachOObjectFile::getSymbolSize()
  - fixed MachOObjectFile::getSymbolSection()
  - fixed MachOObjectFile::getSymbolOffset() for symbols without section data.
llvm-svn: 145180
-
Chandler Carruth authored
heavily on AnalyzeBranch. That routine doesn't behave as we want given that rotation occurs mid-way through re-ordering the function. Instead merely check that there are not unanalyzable branching constructs present, and then reason about the CFG via successor lists. This actually simplifies my mental model for all of this as well. The concrete result is that we now will rotate more loop chains. I've added a test case from Olden highlighting the effect. There is still a bit more to do here though in order to regain all of the performance in Olden. llvm-svn: 145179
-
Chris Lattner authored
that mainline needs no autoupgrade logic for intrinsics yet, woohoo! llvm-svn: 145178
-
Chris Lattner authored
I'll work on turning this into something intelligible tomorrow. llvm-svn: 145177
-
Chris Lattner authored
autoupgrade logic for 2.9 and before. llvm-svn: 145176
-
Chris Lattner authored
trampoline forms. Both of these were correct in LLVM 3.0, and we don't need to support LLVM 2.9 and earlier in mainline. llvm-svn: 145174
-
Chris Lattner authored
llvm-svn: 145173
-
Chris Lattner authored
remove asmparsing and documentation support for "volatile load", which was only produced by LLVM 2.9 and earlier. LLVM 3.0 and later prefers "load volatile". llvm-svn: 145172
-
Chris Lattner authored
Upgrade syntax of tests using volatile instructions to use 'load volatile' instead of 'volatile load', which is archaic. llvm-svn: 145171
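As an illustration of the syntax change, the rewrite from the old keyword order to the new one can be expressed as a single regular-expression substitution. This is a minimal sketch and not the tool actually used for the commit; the helper name `upgrade_volatile_syntax` is hypothetical, and it assumes the old form always appears as `volatile load` or `volatile store` on one line.

```python
import re

# Matches the archaic order: "volatile load" / "volatile store".
_OLD_VOLATILE = re.compile(r"\bvolatile\s+(load|store)\b")


def upgrade_volatile_syntax(line):
    """Rewrite 'volatile load'/'volatile store' as 'load volatile'/'store volatile'."""
    return _OLD_VOLATILE.sub(r"\1 volatile", line)
```

For example, `%x = volatile load i32* %p` becomes `%x = load volatile i32* %p`, while lines already in the newer form are left unchanged.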
-
Chris Lattner authored
llvm-svn: 145170
-
Chris Lattner authored
I think this is the last of autoupgrade that can be removed in 3.1. Can the atomic upgrade stuff also go? llvm-svn: 145169
-
-
Chris Lattner authored
llvm-svn: 145167
-
Chris Lattner authored
LLVM 3.0 and later. llvm-svn: 145165
-
Chris Lattner authored
llvm-svn: 145164
-
Chris Lattner authored
llvm-svn: 145163
-
Bob Wilson authored
Besides cleaning up the repetition in the installhdrs target, the point of this change is to provide a separate do-installhdrs target that can be used directly from clang's runtime/libcxx makefile to install a copy of the headers along with clang. <rdar://problem/10397739> llvm-svn: 145162
-
Wesley Peck authored
These instructions are not generated by the backend yet; this will come in a later commit. llvm-svn: 145161
-
Bob Wilson authored
Removing that buildbot would be a better solution, but this is at least a temporary workaround. llvm-svn: 145160
-
Wesley Peck authored
Fix a couple of 80-column violations. llvm-svn: 145159
-
Chandler Carruth authored
pass. This is designed to achieve one of the important optimizations that the old code placement pass did, but more simply. This is a somewhat rough and *very* conservative version of the transform. We could get a lot fancier here if there are profitable cases to do so. In particular, this only looks for a single pattern, it insists that the loop backedge being rotated away is the last backedge in the chain, and it doesn't provide any means of doing better in-loop placement due to the rotation. However, it appears that it will handle the important loops I am finding in the LLVM test suite. llvm-svn: 145158
-
-
Benjamin Kramer authored
llvm-svn: 145154
-
- Nov 26, 2011
-
-
Craig Topper authored
Merge 128-bit and 256-bit X86ISD node types for VPERMILPS and VPERMILPD. Simplify some shuffle lowering code since V1 can never be UNDEF due to the canonicalization that occurs when shuffle nodes are created. llvm-svn: 145153
-
Wesley Peck authored
llvm-svn: 145152
-
Rafael Espindola authored
and on clang, which seems to handle "=b" correctly even when ebx is the PIC register. llvm-svn: 145149
-