- May 05, 2012
-
-
Daniel Dunbar authored
llvm-svn: 156236
-
Benjamin Kramer authored
We might just use symlinks here, but I'm afraid of possible portability issues. llvm-svn: 156235
-
Benjamin Kramer authored
This came up when a change in block placement formed a cmov and slowed down a hot loop by 50%:

    ucomisd (%rdi), %xmm0
    cmovbel %edx, %esi

cmov is a really bad choice in this context because it doesn't get branch prediction. If we emit it as a branch, an out-of-order CPU can do a better job (if the branch is predicted right) and avoid waiting for the slow load+compare instruction to finish. Of course it won't help if the branch is unpredictable, but those are really rare in practice.

This patch uses a dumb, conservative heuristic: it turns all cmovs that have one use and a direct memory operand into branches. cmovs usually save some code size, so we disable the transform in -Os mode. In-order architectures are unlikely to benefit as well; those are covered by the "predictableSelectIsExpensive" flag.

It would be better to reuse branch probability info here, but BPI doesn't support select instructions currently. It would make sense to use the same heuristics as the if-converter pass, which does the opposite direction of this transform.

The test suite shows a small improvement here and there on corei7-level machines, but the actual results depend a lot on the microarchitecture used. The transformation is currently disabled by default and available by passing the -enable-cgp-select2branch flag to the code generator.

Thanks to Chandler for the initial test case and to Evan Cheng for providing comments and test-suite numbers that were more stable than mine :) llvm-svn: 156234
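A minimal sketch of the kind of check the heuristic above describes, written against the LLVM C++ API purely for illustration. It is not the actual CodeGenPrepare code: the function name is made up, and the -Os and predictable-select conditions are folded into a plain boolean here instead of querying the real flags.

```cpp
#include "llvm/IR/Instructions.h"

// Hypothetical helper, not the real CodeGenPrepare logic.
static bool shouldFormBranchFromSelect(const llvm::SelectInst *SI,
                                       bool OptForSize) {
  if (OptForSize)
    return false;   // cmov usually saves code size, so keep it under -Os
  if (!SI->hasOneUse())
    return false;   // stay conservative: single-use selects only
  // "Direct memory operand": the compare feeding the select consumes a load,
  // so a correctly predicted branch avoids waiting on the load+compare.
  const auto *Cmp = llvm::dyn_cast<llvm::CmpInst>(SI->getCondition());
  if (!Cmp)
    return false;
  for (const llvm::Value *Op : Cmp->operands())
    if (llvm::isa<llvm::LoadInst>(Op))
      return true;
  return false;
}
```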
-
Benjamin Kramer authored
This will be used to determine whether it's profitable to turn a select into a branch when the branch is likely to be predicted. Currently enabled for everything but Atom on X86 and Cortex-A9 devices on ARM. I'm not entirely happy with the name of this flag, suggestions welcome ;) llvm-svn: 156233
-
Benjamin Kramer authored
llvm-svn: 156232
-
Stepan Dyatkovskiy authored
Small fix in InstCombineCasts.cpp. Restored the "alloca + bitcast" reduction for the case when the alloca's size is calculated within an "add/sub/... nsw". Also added a fix to the 2011-06-13-nsw-alloca.ll test. llvm-svn: 156231
-
Eric Christopher authored
llvm-svn: 156226
-
Jakob Stoklund Olesen authored
This is still a topological ordering such that every register class gets a smaller enum value than its sub-classes. Placing the smaller spill sizes first makes a difference for the super-register class bit masks. When looking for a super-register class, we usually want the smallest possible kind of super-register. That is now available as the first bit set in the bit mask. llvm-svn: 156222
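A self-contained illustration (not the generated TableGen tables) of why this ordering helps: if register classes with smaller spill sizes get smaller enum values, the smallest suitable super-register class is simply the first set bit in a class bit mask. The 32-bit-word mask layout follows the usual LLVM convention; the names here are made up.

```cpp
#include <cstdint>

// Return the ID of the "smallest" super-register class in Mask, or -1 if the
// mask is empty. Lower IDs correspond to smaller spill sizes by construction.
static int firstSuperRegClass(const uint32_t *Mask, unsigned NumRegClasses) {
  for (unsigned ID = 0; ID < NumRegClasses; ++ID)
    if (Mask[ID / 32] & (1u << (ID % 32)))
      return static_cast<int>(ID);   // first set bit == smallest spill size
  return -1;                          // no super-register class available
}
```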
-
Jakob Stoklund Olesen authored
We want the representative register class to contain the largest super-registers available. This makes the function less sensitive to the register class numbering. llvm-svn: 156220
-
Jakob Stoklund Olesen authored
llvm-svn: 156219
-
David Blaikie authored
This fixes a couple of Clang warnings in release builds of LLVM:
* Missing return in ISelLowering
* Unused variable in NVPTXutil.cpp
llvm-svn: 156216
-
Kevin Enderby authored
SignExtend32<22>(Val<<1) also needs to change to SignExtend32<21>(Val). llvm-svn: 156213
-
Kevin Enderby authored
where the symbolic operand's displacement was incorrectly shifted left by 1. rdar://11387046 llvm-svn: 156212
-
- May 04, 2012
-
-
Chandler Carruth authored
In file included from ../lib/Target/NVPTX/VectorElementize.cpp:53:
../lib/Target/NVPTX/NVPTX.h:44:3: warning: default label in switch which covers all enumeration values [-Wcovered-switch-default]
  default: assert(0 && "Unknown condition code");
  ^
1 warning generated.

The prevailing pattern in LLVM is to not use a default label, and instead to use llvm_unreachable to denote that the switch in fact covers all return paths from the function. llvm-svn: 156209
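For reference, a small self-contained example of that pattern, with a made-up enum and function: every enumerator is handled in the switch and the fall-through is marked with llvm_unreachable instead of a default label, so -Wcovered-switch-default stays quiet while a newly added enumerator still triggers a missing-case warning.

```cpp
#include "llvm/Support/ErrorHandling.h"

enum CondCode { EQ, NE, LT };   // illustrative enum, not from the tree

static const char *condCodeName(CondCode CC) {
  switch (CC) {                 // no default: all enumerators are covered
  case EQ: return "eq";
  case NE: return "ne";
  case LT: return "lt";
  }
  llvm_unreachable("Unknown condition code");
}
```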
-
Chandler Carruth authored
RegionInfo's RegionNode. This mirrors the logic for automating the extraction from a Loop. llvm-svn: 156208
-
Chandler Carruth authored
add a new Region::block_iterator which actually iterates over the basic blocks of the region. The old iterator, now called 'block_node_iterator', iterates over RegionNodes which contain a single basic block. This works well with the GraphTraits-based iterator design; however, most users actually want an iterator over the BasicBlocks inside these RegionNodes. Now the 'block_iterator' is a wrapper which exposes exactly this interface. Internally it uses the block_node_iterator to walk all nodes which are single basic blocks, but transparently unwraps the basic block to make user code simpler.

While this patch is a bit of a wash, most of the updates are to internal users, not external users of the RegionInfo. I have an accompanying patch to Polly that is a strict simplification of every user of this interface, and I'm working on a pass that also wants the same simplified interface. This patch alone should have no functional impact. llvm-svn: 156202
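A hedged usage sketch of the new interface, assuming the spellings block_begin()/block_end() implied by the description above and that dereferencing yields a BasicBlock pointer; the per-block callback is hypothetical.

```cpp
#include "llvm/Analysis/RegionInfo.h"

// Hypothetical per-block callback, standing in for whatever a client does.
static void processBlock(llvm::BasicBlock *BB) { (void)BB; }

// With the new iterator, client code no longer unwraps RegionNodes by hand.
static void walkRegionBlocks(llvm::Region &R) {
  for (llvm::Region::block_iterator BI = R.block_begin(), BE = R.block_end();
       BI != BE; ++BI)
    processBlock(*BI);   // *BI is the BasicBlock itself, not a RegionNode
}
```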
-
Justin Holewinski authored
This patch adds a new NVPTX back-end to LLVM which supports code generation for NVIDIA PTX 3.0. This back-end will (eventually) replace the current PTX back-end, while maintaining compatibility with it.

The new target machines are:
nvptx (old ptx32) => 32-bit PTX
nvptx64 (old ptx64) => 64-bit PTX

The sources are based on the internal NVIDIA NVPTX back-end, and contain more functionality than the current PTX back-end provides. NV_CONTRIB llvm-svn: 156196
-
Sebastian Pop authored
Added missing CMN case in the Thumb2SizeReduction pass so that LLVM emits the 16-bit encoding of CMN instructions. llvm-svn: 156195
-
Preston Gurd authored
llvm-svn: 156194
-
Matt Beaumont-Gay authored
llvm-svn: 156189
-
Chandler Carruth authored
of the CodeExtractor utility. This allows speculatively computing input and output sets to measure the likely size impact of the code extraction. Sadly, these sets cannot be reused -- we mutate the function prior to forming the final sets used by the actual extraction.

The interface has been revamped slightly to make it easier to use correctly: it is now const, and the computation of the number of exit blocks has been sunk into the full extraction function, away from the rest of this logic, which just computed two output parameters. llvm-svn: 156168
-
Chandler Carruth authored
blocks, assert that this doesn't happen. We don't want to bother trying to support this call pattern as it isn't necessary. llvm-svn: 156167
-
Chandler Carruth authored
detect an ineligible block rather than just breaking out of the loop. llvm-svn: 156166
-
Chandler Carruth authored
of the extractor itself. llvm-svn: 156164
-
Chandler Carruth authored
and expose it as a utility class rather than as free function wrappers. The simple free-function interface works well for the bugpoint-specific pass's uses of code extraction, but in an upcoming patch for more advanced code extraction, they simply don't expose a rich enough interface. I need to expose various stages of the process of doing the code extraction and query information to decide whether or not to actually complete the extraction or give up.

Rather than build up a new predicate model and pass that into these functions, just take the class that was actually implementing the functions and lift it up into a proper interface that can be used to perform code extraction. The interface is cleaned up and re-documented to work better in a header. It is also now set up to accept the blocks to be extracted in the constructor rather than in a method.

In passing this essentially reverts my previous commit here exposing a block-level query for eligibility of extraction. That is no longer necessary with the richer interface, as clients can query the extraction object for eligibility directly. This will reduce the number of walks of the input basic block sequence by quite a bit, which is useful if this enters the normal optimization pipeline. llvm-svn: 156163
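A hedged sketch of how the class-based interface described above might be used: blocks go in via the constructor, eligibility is queried on the object, and the extraction is then performed. The method names isEligible() and extractCodeRegion() are my reading of the new header and may not match it exactly.

```cpp
#include "llvm/Transforms/Utils/CodeExtractor.h"

// Illustrative only: outline a candidate block sequence if it is eligible.
static llvm::Function *tryExtract(llvm::ArrayRef<llvm::BasicBlock *> Blocks,
                                  llvm::DominatorTree *DT) {
  llvm::CodeExtractor CE(Blocks, DT);   // blocks are given to the constructor
  if (!CE.isEligible())                 // replaces the old free-function query
    return nullptr;
  return CE.extractCodeRegion();        // performs the actual outlining
}
```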
-
Hans Wennborg authored
This moves the logic for selecting a TLS model to a single place, instead of the previous three (ARM, Mips, and X86 which already uses this function). llvm-svn: 156162
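Illustrative only: the usual mapping from code model and symbol visibility to a TLS model that such a shared helper centralizes. The enum is LLVM's TLSModel from llvm/Support/CodeGen.h; the function name and the two boolean inputs are made up, and the real logic has to consider more than this.

```cpp
#include "llvm/Support/CodeGen.h"

// Hypothetical helper, not the actual shared function.
static llvm::TLSModel::Model pickTLSModel(bool IsPIC, bool IsLocalToUnit) {
  if (IsPIC)   // position-independent code uses the dynamic TLS models
    return IsLocalToUnit ? llvm::TLSModel::LocalDynamic
                         : llvm::TLSModel::GeneralDynamic;
  // static code can use the cheaper exec models
  return IsLocalToUnit ? llvm::TLSModel::LocalExec
                       : llvm::TLSModel::InitialExec;
}
```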
-
Craig Topper authored
llvm-svn: 156159
-
Craig Topper authored
llvm-svn: 156158
-
Craig Topper authored
llvm-svn: 156157
-
Craig Topper authored
llvm-svn: 156156
-
Bill Wendling authored
Also combine the code in the 'assert' statement. llvm-svn: 156155
-
Craig Topper authored
llvm-svn: 156154
-
Jakob Stoklund Olesen authored
This information is now computed by TableGen. llvm-svn: 156152
-
Jakob Stoklund Olesen authored
This manually enumerated list of super-register classes has been superseded by the automatically computed super-register class masks available through SuperRegClassIterator. llvm-svn: 156151
-
Rafael Espindola authored
using cmake+ninja, since ninja buffers the compiler output. llvm-svn: 156150
-
Jakob Stoklund Olesen authored
The masks returned by SuperRegClassIterator are computed automatically by TableGen. This is better than depending on the manually specified SuperRegClasses. llvm-svn: 156147
-
Jakob Stoklund Olesen authored
The TargetLowering construction needs to use a valid TargetRegisterInfo instance. llvm-svn: 156146
-
Jakob Stoklund Olesen authored
This iterator class provides a more abstract interface to the (Idx, Mask) lists of super-registers for a register class. The layout of the tables shouldn't be exposed to clients. llvm-svn: 156144
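A hedged usage sketch of the iterator: the point is that clients see (sub-register index, class mask) pairs and never the raw table layout. The accessor names isValid(), getSubReg(), and getMask() are my guesses at the interface, and the include path follows the 2012-era tree.

```cpp
#include "llvm/Target/TargetRegisterInfo.h"

// Illustrative only: walk the (sub-register index, class mask) pairs for RC.
static void visitSuperRegClasses(const llvm::TargetRegisterClass *RC,
                                 const llvm::TargetRegisterInfo *TRI) {
  for (llvm::SuperRegClassIterator SRI(RC, TRI); SRI.isValid(); ++SRI) {
    unsigned SubIdx = SRI.getSubReg();     // sub-register index for this entry
    const uint32_t *Mask = SRI.getMask();  // bit mask of super-register classes
    (void)SubIdx;
    (void)Mask;
  }
}
```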
-
Chandler Carruth authored
minor behavior changes with this, but nothing I have seen evidence of in the wild or expect to be meaningful. The real goal is unifying our logic and simplifying the interfaces. A summary of the changes follows:

- Make 'callIsSmall' actually accept a callsite so it can handle intrinsics, and simplify callers appropriately.
- Nuke a completely bogus declaration of 'callIsSmall' that was still lurking in InlineCost.h... No idea how this got missed.
- Teach 'isInstructionFree' about the various more intelligent 'free' heuristics that got added to the inline cost analysis during review and testing. This mostly surrounds int->ptr and ptr->int casts.
- Switch most of the interesting parts of the inline cost analysis that were essentially computing 'is this instruction free?' to use the code metrics routine instead. This way we won't keep duplicating logic.

All of this is motivated by the desire to allow other passes to compute a roughly equivalent 'cost' metric for a particular basic block as the inline cost analysis. Sadly, re-using the same analysis for both is really messy because only the actual inline cost analysis is ever going to go to the contortions required for simplification, SROA analysis, etc. llvm-svn: 156140
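A rough, self-written illustration of the "is this instruction free?" flavor of check mentioned above, in particular the int->ptr and ptr->int cast case. It is not the actual CodeMetrics or inline-cost code, and the real logic covers many more cases.

```cpp
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"

// Illustrative only; not the real isInstructionFree().
static bool looksFree(const llvm::Instruction *I, const llvm::DataLayout &DL) {
  // Debug intrinsics cost nothing at run time.
  if (llvm::isa<llvm::DbgInfoIntrinsic>(I))
    return true;
  // int<->ptr casts that preserve the bit width usually lower to nothing.
  if (const auto *Cast = llvm::dyn_cast<llvm::CastInst>(I))
    if (llvm::isa<llvm::IntToPtrInst>(Cast) ||
        llvm::isa<llvm::PtrToIntInst>(Cast))
      return DL.getTypeSizeInBits(Cast->getSrcTy()) ==
             DL.getTypeSizeInBits(Cast->getDestTy());
  return false;
}
```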
-
Chandler Carruth authored
but using a FoldingSet underneath and with a largely compatible interface to that of FoldingSet. This can be used anywhere a FoldingSet would be natural but where iteration order is significant. The initial intended use case is Clang's template specialization lists, where iteration needs to preserve instantiation order. llvm-svn: 156131
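The commit does not spell out the class, so here is only the underlying idea in plain, self-contained C++: keep a uniquing set for membership and a side vector for stable, insertion-ordered iteration. This is not LLVM's implementation (which sits on top of FoldingSet); the container name and layout are made up for illustration.

```cpp
#include <unordered_set>
#include <vector>

// Sketch of a set that iterates in insertion order.
template <typename T> class OrderedUniqueVector {
  std::unordered_set<T> Seen;   // uniquing, playing the FoldingSet role
  std::vector<T> Order;         // iteration in insertion order
public:
  bool insert(const T &V) {
    if (!Seen.insert(V).second)
      return false;             // already present; order unchanged
    Order.push_back(V);
    return true;
  }
  typename std::vector<T>::const_iterator begin() const { return Order.begin(); }
  typename std::vector<T>::const_iterator end() const { return Order.end(); }
};
```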
-