- May 05, 2012
-
-
Daniel Dunbar authored
llvm-svn: 156236
-
Benjamin Kramer authored
We might just use symlinks here, but I'm afraid of possible portability issues. llvm-svn: 156235
-
Benjamin Kramer authored
This came up when a change in block placement formed a cmov and slowed down a hot loop by 50%:

    ucomisd (%rdi), %xmm0
    cmovbel %edx, %esi

cmov is a really bad choice in this context because it doesn't get branch prediction. If we emit it as a branch, an out-of-order CPU can do a better job (if the branch is predicted right) and avoid waiting for the slow load+compare instruction to finish. Of course it won't help if the branch is unpredictable, but those are really rare in practice.

This patch uses a dumb, conservative heuristic: it turns all cmovs that have one use and a direct memory operand into branches. cmovs usually save some code size, so we disable the transform in -Os mode. In-order architectures are unlikely to benefit as well; those are covered by the "predictableSelectIsExpensive" flag.

It would be better to reuse branch probability info here, but BPI doesn't support select instructions currently. It would make sense to use the same heuristics as the if-converter pass, which does the opposite direction of this transform.

The test suite shows a small improvement here and there on corei7-level machines, but the actual results depend a lot on the microarchitecture used. The transformation is currently disabled by default and available by passing the -enable-cgp-select2branch flag to the code generator.

Thanks to Chandler for the initial test case and to Evan Cheng for providing comments and test-suite numbers that were more stable than mine :) llvm-svn: 156234
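A minimal sketch of the kind of check the heuristic above describes, written against the LLVM C++ API purely for illustration. It is not the actual CodeGenPrepare code: the function name is made up, and the -Os and predictable-select conditions are folded into a plain boolean here instead of querying the real flags.

```cpp
#include "llvm/IR/Instructions.h"

// Hypothetical helper, not the real CodeGenPrepare logic.
static bool shouldFormBranchFromSelect(const llvm::SelectInst *SI,
                                       bool OptForSize) {
  if (OptForSize)
    return false;   // cmov usually saves code size, so keep it under -Os
  if (!SI->hasOneUse())
    return false;   // stay conservative: single-use selects only
  // "Direct memory operand": the compare feeding the select consumes a load,
  // so a correctly predicted branch avoids waiting on the load+compare.
  const auto *Cmp = llvm::dyn_cast<llvm::CmpInst>(SI->getCondition());
  if (!Cmp)
    return false;
  for (const llvm::Value *Op : Cmp->operands())
    if (llvm::isa<llvm::LoadInst>(Op))
      return true;
  return false;
}
```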
-
Benjamin Kramer authored
This will be used to determine whether it's profitable to turn a select into a branch when the branch is likely to be predicted. Currently enabled for everything but Atom on X86 and Cortex-A9 devices on ARM. I'm not entirely happy with the name of this flag, suggestions welcome ;) llvm-svn: 156233
-
Benjamin Kramer authored
llvm-svn: 156232
-
Stepan Dyatkovskiy authored
Small fix in InstCombineCasts.cpp. Restored the "alloca + bitcast" reduction for the case when the alloca's size is calculated within an "add/sub/... nsw". Also added a fix to the 2011-06-13-nsw-alloca.ll test. llvm-svn: 156231
-
Eric Christopher authored
llvm-svn: 156226
-
Jakob Stoklund Olesen authored
This is still a topological ordering such that every register class gets a smaller enum value than its sub-classes. Placing the smaller spill sizes first makes a difference for the super-register class bit masks. When looking for a super-register class, we usually want the smallest possible kind of super-register. That is now available as the first bit set in the bit mask. llvm-svn: 156222
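A self-contained illustration (not the generated TableGen tables) of why this ordering helps: if register classes with smaller spill sizes get smaller enum values, the smallest suitable super-register class is simply the first set bit in a class bit mask. The 32-bit-word mask layout follows the usual LLVM convention; the names here are made up.

```cpp
#include <cstdint>

// Return the ID of the "smallest" super-register class in Mask, or -1 if the
// mask is empty. Lower IDs correspond to smaller spill sizes by construction.
static int firstSuperRegClass(const uint32_t *Mask, unsigned NumRegClasses) {
  for (unsigned ID = 0; ID < NumRegClasses; ++ID)
    if (Mask[ID / 32] & (1u << (ID % 32)))
      return static_cast<int>(ID);   // first set bit == smallest spill size
  return -1;                          // no super-register class available
}
```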
-
Jakob Stoklund Olesen authored
We want the representative register class to contain the largest super-registers available. This makes the function less sensitive to the register class numbering. llvm-svn: 156220
-
Jakob Stoklund Olesen authored
llvm-svn: 156219
-
David Blaikie authored
This fixes a couple of Clang warnings in release builds of LLVM:
* Missing return in ISelLowering
* Unused variable in NVPTXutil.cpp
llvm-svn: 156216
-
Kevin Enderby authored
SignExtend32<22>(Val<<1) also needs to change to SignExtend32<21>(Val). llvm-svn: 156213
-
Kevin Enderby authored
where the symbolic operand's displacement was incorrectly shifted left by 1. rdar://11387046 llvm-svn: 156212
-
- May 04, 2012
-
-
Chandler Carruth authored
In file included from ../lib/Target/NVPTX/VectorElementize.cpp:53:
../lib/Target/NVPTX/NVPTX.h:44:3: warning: default label in switch which covers all enumeration values [-Wcovered-switch-default]
  default: assert(0 && "Unknown condition code");
  ^
1 warning generated.

The prevailing pattern in LLVM is to not use a default label, and instead to use llvm_unreachable to denote that the switch in fact covers all return paths from the function. llvm-svn: 156209
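For reference, a small self-contained example of that pattern, with a made-up enum and function: every enumerator is handled in the switch and the fall-through is marked with llvm_unreachable instead of a default label, so -Wcovered-switch-default stays quiet while a newly added enumerator still triggers a missing-case warning.

```cpp
#include "llvm/Support/ErrorHandling.h"

enum CondCode { EQ, NE, LT };   // illustrative enum, not from the tree

static const char *condCodeName(CondCode CC) {
  switch (CC) {                 // no default: all enumerators are covered
  case EQ: return "eq";
  case NE: return "ne";
  case LT: return "lt";
  }
  llvm_unreachable("Unknown condition code");
}
```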
-
Chandler Carruth authored
RegionInfo's RegionNode. This mirrors the logic for automating the extraction from a Loop. llvm-svn: 156208
-
Chandler Carruth authored
add a new Region::block_iterator which actually iterates over the basic blocks of the region. The old iterator, now called 'block_node_iterator', iterates over RegionNodes which contain a single basic block. This works well with the GraphTraits-based iterator design; however, most users actually want an iterator over the BasicBlocks inside these RegionNodes. Now the 'block_iterator' is a wrapper which exposes exactly this interface. Internally it uses the block_node_iterator to walk all nodes which are single basic blocks, but transparently unwraps the basic block to make user code simpler.

While this patch is a bit of a wash, most of the updates are to internal users, not external users of the RegionInfo. I have an accompanying patch to Polly that is a strict simplification of every user of this interface, and I'm working on a pass that also wants the same simplified interface. This patch alone should have no functional impact. llvm-svn: 156202
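A hedged usage sketch of the new interface, assuming the spellings block_begin()/block_end() implied by the description above and that dereferencing yields a BasicBlock pointer; the per-block callback is hypothetical.

```cpp
#include "llvm/Analysis/RegionInfo.h"

// Hypothetical per-block callback, standing in for whatever a client does.
static void processBlock(llvm::BasicBlock *BB) { (void)BB; }

// With the new iterator, client code no longer unwraps RegionNodes by hand.
static void walkRegionBlocks(llvm::Region &R) {
  for (llvm::Region::block_iterator BI = R.block_begin(), BE = R.block_end();
       BI != BE; ++BI)
    processBlock(*BI);   // *BI is the BasicBlock itself, not a RegionNode
}
```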
-
Justin Holewinski authored
This patch adds a new NVPTX back-end to LLVM which supports code generation for NVIDIA PTX 3.0. This back-end will (eventually) replace the current PTX back-end, while maintaining compatibility with it.

The new target machines are:
nvptx (old ptx32) => 32-bit PTX
nvptx64 (old ptx64) => 64-bit PTX

The sources are based on the internal NVIDIA NVPTX back-end, and contain more functionality than the current PTX back-end provides. NV_CONTRIB llvm-svn: 156196
-
Sebastian Pop authored
Added missing CMN case in the Thumb2SizeReduction pass so that LLVM emits the 16-bit encoding of CMN instructions. llvm-svn: 156195
-
Preston Gurd authored
llvm-svn: 156194
-
Matt Beaumont-Gay authored
llvm-svn: 156189
-
Chandler Carruth authored
of the CodeExtractor utility. This allows speculatively computing input and output sets to measure the likely size impact of the code extraction. Sadly, these sets cannot be reused -- we mutate the function prior to forming the final sets used by the actual extraction.

The interface has been revamped slightly to make it easier to use correctly: it is now const, and the computation of the number of exit blocks has been sunk into the full extraction function, away from the rest of this logic, which just computed two output parameters. llvm-svn: 156168
-
Chandler Carruth authored
blocks, assert that this doesn't happen. We don't want to bother trying to support this call pattern as it isn't necessary. llvm-svn: 156167
-
Chandler Carruth authored
detect an ineligible block rather than just breaking out of the loop. llvm-svn: 156166
-
Chandler Carruth authored
of the extractor itself. llvm-svn: 156164
-
Chandler Carruth authored
and expose it as a utility class rather than as free function wrappers. The simple free-function interface works well for the bugpoint-specific pass's uses of code extraction, but in an upcoming patch for more advanced code extraction, they simply don't expose a rich enough interface. I need to expose various stages of the process of doing the code extraction and query information to decide whether or not to actually complete the extraction or give up.

Rather than build up a new predicate model and pass that into these functions, just take the class that was actually implementing the functions and lift it up into a proper interface that can be used to perform code extraction. The interface is cleaned up and re-documented to work better in a header. It is also now set up to accept the blocks to be extracted in the constructor rather than in a method.

In passing this essentially reverts my previous commit here exposing a block-level query for eligibility of extraction. That is no longer necessary with the richer interface, as clients can query the extraction object for eligibility directly. This will reduce the number of walks of the input basic block sequence by quite a bit, which is useful if this enters the normal optimization pipeline. llvm-svn: 156163
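A hedged sketch of how the class-based interface described above might be used: blocks go in via the constructor, eligibility is queried on the object, and the extraction is then performed. The method names isEligible() and extractCodeRegion() are my reading of the new header and may not match it exactly.

```cpp
#include "llvm/Transforms/Utils/CodeExtractor.h"

// Illustrative only: outline a candidate block sequence if it is eligible.
static llvm::Function *tryExtract(llvm::ArrayRef<llvm::BasicBlock *> Blocks,
                                  llvm::DominatorTree *DT) {
  llvm::CodeExtractor CE(Blocks, DT);   // blocks are given to the constructor
  if (!CE.isEligible())                 // replaces the old free-function query
    return nullptr;
  return CE.extractCodeRegion();        // performs the actual outlining
}
```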
-
Hans Wennborg authored
This moves the logic for selecting a TLS model to a single place, instead of the previous three (ARM, Mips, and X86 which already uses this function). llvm-svn: 156162
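Illustrative only: the usual mapping from code model and symbol visibility to a TLS model that such a shared helper centralizes. The enum is LLVM's TLSModel from llvm/Support/CodeGen.h; the function name and the two boolean inputs are made up, and the real logic has to consider more than this.

```cpp
#include "llvm/Support/CodeGen.h"

// Hypothetical helper, not the actual shared function.
static llvm::TLSModel::Model pickTLSModel(bool IsPIC, bool IsLocalToUnit) {
  if (IsPIC)   // position-independent code uses the dynamic TLS models
    return IsLocalToUnit ? llvm::TLSModel::LocalDynamic
                         : llvm::TLSModel::GeneralDynamic;
  // static code can use the cheaper exec models
  return IsLocalToUnit ? llvm::TLSModel::LocalExec
                       : llvm::TLSModel::InitialExec;
}
```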
-
Craig Topper authored
llvm-svn: 156159
-
Craig Topper authored
llvm-svn: 156158
-
Craig Topper authored
llvm-svn: 156157
-
Craig Topper authored
llvm-svn: 156156
-
Bill Wendling authored
Also combine the code in the 'assert' statement. llvm-svn: 156155
-
Craig Topper authored
llvm-svn: 156154
-
Jakob Stoklund Olesen authored
This information is now computed by TableGen. llvm-svn: 156152
-
Jakob Stoklund Olesen authored
This manually enumerated list of super-register classes has been superseded by the automatically computed super-register class masks available through SuperRegClassIterator. llvm-svn: 156151
-
Rafael Espindola authored
using cmake+ninja, since ninja buffers the compiler output. llvm-svn: 156150
-
Jakob Stoklund Olesen authored
The masks returned by SuperRegClassIterator are computed automatically by TableGen. This is better than depending on the manually specified SuperRegClasses. llvm-svn: 156147
-
Jakob Stoklund Olesen authored
The TargetLowering construction needs to use a valid TargetRegisterInfo instance. llvm-svn: 156146
-
Jakob Stoklund Olesen authored
This iterator class provides a more abstract interface to the (Idx, Mask) lists of super-registers for a register class. The layout of the tables shouldn't be exposed to clients. llvm-svn: 156144
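A hedged usage sketch of the iterator: the point is that clients see (sub-register index, class mask) pairs and never the raw table layout. The accessor names isValid(), getSubReg(), and getMask() are my guesses at the interface, and the include path follows the 2012-era tree.

```cpp
#include "llvm/Target/TargetRegisterInfo.h"

// Illustrative only: walk the (sub-register index, class mask) pairs for RC.
static void visitSuperRegClasses(const llvm::TargetRegisterClass *RC,
                                 const llvm::TargetRegisterInfo *TRI) {
  for (llvm::SuperRegClassIterator SRI(RC, TRI); SRI.isValid(); ++SRI) {
    unsigned SubIdx = SRI.getSubReg();     // sub-register index for this entry
    const uint32_t *Mask = SRI.getMask();  // bit mask of super-register classes
    (void)SubIdx;
    (void)Mask;
  }
}
```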
-
Chandler Carruth authored
minor behavior changes with this, but nothing I have seen evidence of in the wild or expect to be meaningful. The real goal is unifying our logic and simplifying the interfaces. A summary of the changes follows:

- Make 'callIsSmall' actually accept a callsite so it can handle intrinsics, and simplify callers appropriately.
- Nuke a completely bogus declaration of 'callIsSmall' that was still lurking in InlineCost.h... No idea how this got missed.
- Teach 'isInstructionFree' about the various more intelligent 'free' heuristics that got added to the inline cost analysis during review and testing. This mostly surrounds int->ptr and ptr->int casts.
- Switch most of the interesting parts of the inline cost analysis that were essentially computing 'is this instruction free?' to use the code metrics routine instead. This way we won't keep duplicating logic.

All of this is motivated by the desire to allow other passes to compute a roughly equivalent 'cost' metric for a particular basic block as the inline cost analysis. Sadly, re-using the same analysis for both is really messy because only the actual inline cost analysis is ever going to go to the contortions required for simplification, SROA analysis, etc. llvm-svn: 156140
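A rough, self-written illustration of the "is this instruction free?" flavor of check mentioned above, in particular the int->ptr and ptr->int cast case. It is not the actual CodeMetrics or inline-cost code, and the real logic covers many more cases.

```cpp
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"

// Illustrative only; not the real isInstructionFree().
static bool looksFree(const llvm::Instruction *I, const llvm::DataLayout &DL) {
  // Debug intrinsics cost nothing at run time.
  if (llvm::isa<llvm::DbgInfoIntrinsic>(I))
    return true;
  // int<->ptr casts that preserve the bit width usually lower to nothing.
  if (const auto *Cast = llvm::dyn_cast<llvm::CastInst>(I))
    if (llvm::isa<llvm::IntToPtrInst>(Cast) ||
        llvm::isa<llvm::PtrToIntInst>(Cast))
      return DL.getTypeSizeInBits(Cast->getSrcTy()) ==
             DL.getTypeSizeInBits(Cast->getDestTy());
  return false;
}
```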
-
Chandler Carruth authored
but using a FoldingSet underneath and with a largely compatible interface to that of FoldingSet. This can be used anywhere a FoldingSet would be natural but where iteration order is significant. The initial intended use case is Clang's template specialization lists, where iteration needs to preserve instantiation order. llvm-svn: 156131
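The commit does not spell out the class, so here is only the underlying idea in plain, self-contained C++: keep a uniquing set for membership and a side vector for stable, insertion-ordered iteration. This is not LLVM's implementation (which sits on top of FoldingSet); the container name and layout are made up for illustration.

```cpp
#include <unordered_set>
#include <vector>

// Sketch of a set that iterates in insertion order.
template <typename T> class OrderedUniqueVector {
  std::unordered_set<T> Seen;   // uniquing, playing the FoldingSet role
  std::vector<T> Order;         // iteration in insertion order
public:
  bool insert(const T &V) {
    if (!Seen.insert(V).second)
      return false;             // already present; order unchanged
    Order.push_back(V);
    return true;
  }
  typename std::vector<T>::const_iterator begin() const { return Order.begin(); }
  typename std::vector<T>::const_iterator end() const { return Order.end(); }
};
```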
-