Commits · ede4a8aa2be346d4f175e61cf4a18a023ea3198d · Roger Ferrer / llvm-epi-0.8

Apr 08, 2012

Teach LLVM about a PIE option which, when enabled on top of PIC, makes · ede4a8aa

Chandler Carruth authored Apr 08, 2012

optimizations which are valid for position independent code being linked
into a single executable, but not for such code being linked into
a shared library.

I discussed the design of this with Eric Christopher, and the decision
was to support an optional bit rather than a completely separate
relocation model. Fundamentally, this is still PIC relocation, it's just
that certain optimizations are only valid under a PIC relocation model
when the resulting code won't be in a shared library. The simplest path
to here is to expose a single bit option in the TargetOptions. If folks
have different/better designs, I'm all ears. =]

I've included the first optimization based upon this: changing TLS
models to the *Exec models when PIE is enabled. This is the LLVM
component of PR12380 and is all of the hard work.

llvm-svn: 154294

ede4a8aa
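
As a rough illustration of the TLS change described in this commit, the sketch below picks a TLS model from the relocation model plus a single PIE bit. It is not the LLVM implementation: `RelocModel`, `TLSModel`, `GlobalInfo`, and `selectTLSModel` are hypothetical stand-ins, and the per-variable rules (local vs. general, local-exec vs. initial-exec) are a simplified assumption.

```cpp
#include <cassert>

// Hypothetical stand-ins for the relocation and TLS model enums.
enum class RelocModel { Static, PIC };
enum class TLSModel { GeneralDynamic, LocalDynamic, InitialExec, LocalExec };

// Hypothetical summary of a thread-local global variable.
struct GlobalInfo {
  bool isLocal;       // local linkage or hidden visibility (assumption)
  bool isDefinition;  // defined in this module, not merely declared
};

// Choose a TLS model. The dynamic models are only needed when the code may
// end up in a shared library, i.e. PIC without the PIE bit set.
TLSModel selectTLSModel(RelocModel reloc, bool isPIE, const GlobalInfo &g) {
  // The PIE bit is an option on top of PIC, not a separate model.
  assert((!isPIE || reloc == RelocModel::PIC) && "PIE requires PIC");

  bool mayBeSharedLibrary = (reloc == RelocModel::PIC) && !isPIE;
  if (mayBeSharedLibrary)
    return g.isLocal ? TLSModel::LocalDynamic : TLSModel::GeneralDynamic;

  // Code linked into a single executable can use the *Exec models.
  return g.isDefinition ? TLSModel::LocalExec : TLSModel::InitialExec;
}
```

With these assumptions, `selectTLSModel(RelocModel::PIC, true, g)` never returns a dynamic model, which is the effect the commit message describes for PIE builds.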

Move the TLSModel information into the TargetMachine rather than hiding · 16f0ebcb

Chandler Carruth authored Apr 08, 2012

in TargetLowering. There was already a FIXME about this location being
odd. The interface is simplified as a consequence. This will also make
it easier to change TLS models when compiling with PIE.

llvm-svn: 154292

16f0ebcb

EngineBuilder::create is expected to take ownership of the TargetMachine... · 25a3d816

Benjamin Kramer authored Apr 08, 2012

EngineBuilder::create is expected to take ownership of the TargetMachine passed to it. Delete it on error or when we create an interpreter that doesn't need it.

llvm-svn: 154288

25a3d816
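
The ownership rule in this commit can be sketched in isolation. The code below is a hypothetical factory, not `EngineBuilder::create` itself: `Machine`, `Engine`, `Kind`, `createEngine`, and the `simulateError` flag are all invented for illustration. It shows one way to make sure the passed-in object is released on the error path and on the interpreter path.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for a target machine; counts live instances so
// leaks are observable.
struct Machine {
  static int live;
  Machine() { ++live; }
  ~Machine() { --live; }
};
int Machine::live = 0;

// Hypothetical engine; an interpreter keeps no machine.
struct Engine {
  std::unique_ptr<Machine> machine;
};

enum class Kind { JIT, Interpreter };

// Takes ownership of `tm` on every path, including failure.
std::unique_ptr<Engine> createEngine(Machine *tm, Kind kind, bool simulateError) {
  assert(tm && "this sketch expects a machine to be passed in");
  std::unique_ptr<Machine> owned(tm);  // ownership is taken immediately

  if (simulateError)
    return nullptr;  // `owned` deletes the machine here

  std::unique_ptr<Engine> engine(new Engine());
  if (kind == Kind::JIT)
    engine->machine = std::move(owned);
  return engine;  // for an interpreter, `owned` deletes the machine here
}
```

Wrapping the raw pointer in a `std::unique_ptr` at the top is a design choice of this sketch; the commit message only states that the machine is deleted on those paths.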

Remove an overzealous assert. The assert was trying to catch places · bed1abf9

Chandler Carruth authored Apr 08, 2012

where a chain outside of the loop block-set ended up in the worklist for
scheduling as part of the contiguous loop. However, asserting the first
block in the chain is in the loop-set isn't a valid check -- we may be
forced to drag a chain into the worklist due to one block in the chain
being part of the loop even though the first block is *not* in the loop.
This occurs when we have been forced to form a chain early due to
un-analyzable branches.

No test case here as I have no idea how to even begin reducing one, and
it will be hopelessly fragile. We have to somehow end up with a loop
header of an inner loop which is a successor of a basic block with an
unanalyzable pair of branch instructions. Ow. Self-host triggers it so
it is unlikely it will regress.

This at least gets block placement back to passing selfhost and the test
suite. There are still a lot of slowdowns that I don't like coming out of
block placement, although there are now also a lot of speedups. =[ I'm
seeing swings in both directions up to 10%. I'm going to try to find
time to dig into this and see if we can turn this on for 3.1 as it does
a really good job of cleaning up after some loops that degraded with the
inliner changes.

llvm-svn: 154287

bed1abf9

Add a debug-only 'dump' method to the BlockChain structure to ease · 49158908

Chandler Carruth authored Apr 08, 2012

debugging.

llvm-svn: 154286

49158908

Teach InstCombine to nuke a common alloca pattern -- an alloca which has · f82b0e2d

Chandler Carruth authored Apr 08, 2012

GEPs, bit casts, and stores reaching it but no other instructions. These
often show up during the iterative processing of the inliner, SROA, and
DCE. Once we hit this point, we can completely remove the alloca. These
were actually showing up in the final, fully optimized code in a bunch
of inliner tests I've been working on, and notably they show up after
LLVM finishes optimizing away all function calls involved in
hash_combine(a, b).

llvm-svn: 154285

f82b0e2d
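
The pattern check described in this commit can be sketched on a toy IR. Everything here is hypothetical (`Opcode`, `Inst`, `isWriteOnlyAlloca`) and much simpler than InstCombine: in particular, a `Store` user is assumed to store into the pointer rather than store the pointer itself somewhere else.

```cpp
#include <cassert>
#include <vector>

// Toy instruction kinds; not LLVM's opcode set.
enum class Opcode { Alloca, GEP, BitCast, Store, Load, Call };

// Toy instruction: an opcode plus the instructions that use its result.
struct Inst {
  Opcode op;
  std::vector<const Inst *> users;
};

// Returns true if everything reachable from the alloca through GEPs and
// bit casts is itself a GEP, a bit cast, or a store. Such an alloca is
// never read, so it and all of these users could be deleted.
bool isWriteOnlyAlloca(const Inst &root) {
  assert(root.op == Opcode::Alloca && "expected an alloca");
  std::vector<const Inst *> worklist{&root};
  while (!worklist.empty()) {
    const Inst *cur = worklist.back();
    worklist.pop_back();
    for (const Inst *user : cur->users) {
      switch (user->op) {
      case Opcode::GEP:
      case Opcode::BitCast:
        worklist.push_back(user);  // a derived pointer: check its users too
        break;
      case Opcode::Store:
        break;  // assumed to write into the alloca
      default:
        return false;  // a load, call, or anything else keeps it alive
      }
    }
  }
  return true;
}
```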

AVX2: Build splat vectors by broadcasting a scalar from the constant pool. · 82609df6

Nadav Rotem authored Apr 08, 2012

Previously we used three instructions to broadcast an immediate value into a
vector register.
On Sandy Bridge we continue to load the broadcast value from the constant pool.

llvm-svn: 154284

82609df6

Remove old 'grep' lines. · 8c783d41

Bill Wendling authored Apr 08, 2012

llvm-svn: 154283

8c783d41

Formatting changes. Don't put spaces in front of some code, which only makes it look 'off'. · ccf11090

Bill Wendling authored Apr 08, 2012

llvm-svn: 154282

ccf11090

FileCheckize these testcases. · 57f8e5eb

Bill Wendling authored Apr 08, 2012

llvm-svn: 154281

57f8e5eb

Remove the 'Parent' pointer from the MDNodeOperand class. · 5c0068f8

Bill Wendling authored Apr 08, 2012

An MDNode has a list of MDNodeOperands allocated directly after it as part of
its allocation. Therefore, the Parent of the MDNodeOperands can be found by
walking back through the operands to the beginning of that list. Mark the first
operand's value pointer as being the 'first' operand so that we know where the
beginning of said list is.

This saves a *lot* of space during LTO with -O0 -g flags.

llvm-svn: 154280

5c0068f8
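
The walk-back idea in this commit can be sketched as follows. `Node`, `Operand`, `createNode`, `getParent`, and `destroyNode` are hypothetical names, and the sketch marks the first operand with a plain `bool` for readability. The commit instead marks the first operand's value pointer, which is where the space saving comes from. Like the layout the commit describes, the sketch relies on the operands being allocated directly after the node in one block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical operand: no Parent pointer, only a value and a marker.
struct Operand {
  void *value = nullptr;
  bool isFirst = false;  // simplification: a separate flag, not a pointer bit
};

// Hypothetical node whose operands live directly after it in memory.
struct Node {
  std::size_t numOperands;
  Operand *operands() { return reinterpret_cast<Operand *>(this + 1); }
};

// Allocate a node and its operand list as one block.
Node *createNode(std::size_t n) {
  assert(n > 0 && "sketch requires at least one operand");
  static_assert(sizeof(Node) % alignof(Operand) == 0,
                "operands must start at an aligned address");
  void *mem = std::malloc(sizeof(Node) + n * sizeof(Operand));
  assert(mem && "allocation failed");
  Node *node = new (mem) Node{n};
  for (std::size_t i = 0; i < n; ++i)
    new (node->operands() + i) Operand();
  node->operands()[0].isFirst = true;  // marks the beginning of the list
  return node;
}

// Recover the parent by walking back to the first operand; the node sits
// immediately before it.
Node *getParent(Operand *op) {
  while (!op->isFirst)
    --op;
  return reinterpret_cast<Node *>(op) - 1;
}

void destroyNode(Node *node) { std::free(node); }
```

The trade-off is that finding the parent becomes linear in the operand index instead of a single load, in exchange for one pointer less per operand.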

Allow subclasses of the ValueHandleBase to store information as part of the · 9b2503a0

Bill Wendling authored Apr 08, 2012

value pointer by making the value pointer into a pointer-int pair with 2 bits
available for flags.

llvm-svn: 154279

9b2503a0
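
A pointer-int pair with 2 flag bits can be sketched by hand as below. This is a minimal illustration, not LLVM's own class: `PtrWithFlags` is a hypothetical name, and it assumes the pointee is at least 4-byte aligned so the two low bits of the address are always zero.

```cpp
#include <cassert>
#include <cstdint>

// Packs a pointer and a 2-bit integer into one pointer-sized word.
template <typename T>
class PtrWithFlags {
  static constexpr std::uintptr_t FlagMask = 0x3;  // the two low bits
  std::uintptr_t bits = 0;

public:
  void setPointer(T *p) {
    std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(p);
    assert((raw & FlagMask) == 0 && "pointer must be 4-byte aligned");
    bits = raw | (bits & FlagMask);  // keep the existing flags
  }

  T *getPointer() const { return reinterpret_cast<T *>(bits & ~FlagMask); }

  void setFlags(unsigned flags) {
    assert(flags <= FlagMask && "only 2 bits are available");
    bits = (bits & ~FlagMask) | flags;  // keep the existing pointer
  }

  unsigned getFlags() const { return static_cast<unsigned>(bits & FlagMask); }
};
```

The object stays the size of a single pointer, so adding flags this way costs no extra memory per handle.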

Turn avx2 vinserti128 intrinsic calls into INSERT_SUBVECTOR DAG nodes and... · d024cef2

Craig Topper authored Apr 07, 2012

Turn avx2 vinserti128 intrinsic calls into INSERT_SUBVECTOR DAG nodes and remove patterns for selecting the intrinsic. The same was already done for avx1.

llvm-svn: 154272

d024cef2

Apr 07, 2012

Move vinsertf128 patterns near the instruction definitions. Add... · aa9aab5a

Craig Topper authored Apr 07, 2012

Move vinsertf128 patterns near the instruction definitions. Add AddedComplexity to AVX2 vextracti128 patterns to give them priority over the integer versions of vextractf128 patterns.

llvm-svn: 154268

aa9aab5a

Remove 'else' after 'if' that ends in return. · e09d1c5c

Craig Topper authored Apr 07, 2012

llvm-svn: 154267

e09d1c5c

1. Remove the part of r153848 which optimizes shuffle-of-shuffle into a new · 71d07ae5

Nadav Rotem authored Apr 07, 2012

   shuffle node because it could introduce new shuffle nodes that were not
   supported efficiently by the target.

2. Add a more restrictive shuffle-of-shuffle optimization for cases where the
   second shuffle reverses the transformation of the first shuffle.

llvm-svn: 154266

71d07ae5
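
One reading of item 2 is that the pair of shuffles can be dropped when the second mask undoes the first. The sketch below checks that condition for single-input shuffles. It is an assumption about the intent, not the DAG combiner code: `secondShuffleUndoesFirst` is a hypothetical helper, and a mask is modelled as `result[i] = input[mask[i]]`, with negative entries standing for undefined lanes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// inner = shuffle(x, first); outer = shuffle(inner, second).
// outer[i] == x[first[second[i]]], so outer == x exactly when
// first[second[i]] == i for every lane i.
bool secondShuffleUndoesFirst(const std::vector<int> &first,
                              const std::vector<int> &second) {
  assert(first.size() == second.size() && "masks must have the same width");
  for (std::size_t i = 0; i < second.size(); ++i) {
    int j = second[i];
    if (j < 0 || static_cast<std::size_t>(j) >= first.size())
      return false;  // undefined or out-of-range lane: be conservative
    if (first[static_cast<std::size_t>(j)] != static_cast<int>(i))
      return false;  // lane i does not come back to its original position
  }
  return true;
}
```

For example, a rotate-left-by-one mask `{1, 2, 3, 0}` followed by a rotate-right-by-one mask `{3, 0, 1, 2}` passes the check, while applying `{1, 2, 3, 0}` twice does not.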

Convert floating point division by a constant into multiplication by the · 5f8397a9

Duncan Sands authored Apr 07, 2012

reciprocal if converting to the reciprocal is exact.  Do it even if inexact
if -ffast-math.  This substantially speeds up ac.f90 from the polyhedron
benchmarks.

llvm-svn: 154265

5f8397a9
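
To illustrate when the reciprocal is exact: for binary floating point, `1/c` is exactly representable when `c` is a power of two whose reciprocal neither overflows nor lands in the denormal range. The sketch below tests that for `double`. `hasExactReciprocal` and `divideByConstant` are hypothetical helpers, and rejecting denormal reciprocals is a conservative assumption of this sketch, not something the commit message states.

```cpp
#include <cassert>
#include <cmath>

// True if 1.0 / c can be computed without rounding and is a normal number.
bool hasExactReciprocal(double c) {
  if (!std::isfinite(c) || c == 0.0)
    return false;
  int exponent = 0;
  // |c| == mantissa * 2^exponent with mantissa in [0.5, 1).
  double mantissa = std::frexp(std::fabs(c), &exponent);
  if (mantissa != 0.5)
    return false;  // not a power of two, so 1/c would be rounded
  double reciprocal = 1.0 / c;
  return std::isnormal(reciprocal);  // rejects overflow and denormals
}

// Replace x / c by x * (1 / c); only valid when the reciprocal is exact.
double divideByConstant(double x, double c) {
  assert(hasExactReciprocal(c) && "reciprocal would not be exact");
  return x * (1.0 / c);
}
```

Under this check, dividing by `4.0` or `0.5` can be rewritten as a multiplication with identical results, while dividing by `3.0` cannot without fast-math style permission to change rounding.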

Perform partial SROA on the helper hashing structure. I really wish the · 75a1cf32

Chandler Carruth authored Apr 07, 2012

optimizers could do this for us, but expecting partial SROA of classes
with template methods through cloning is probably expecting too many
heroics. With this change, the begin/end pointer pairs which indicate
the status of each loop iteration are actually passed directly into each
layer of the combine_data calls, and the inliner has a chance to see
when most of the combine_data function could be deleted by inlining.
Similarly for 'length'.

We have to be careful to limit the places where in/out reference
parameters are used as those will also defeat the inliner / optimizers
from properly propagating constants.

With this change, LLVM is able to fully inline and unroll the hash
computation of small sets of values, such as two or three pointers.
These now decompose into essentially straight-line code with no loops or
function calls.

There is still one code quality problem to be solved with the hashing --
LLVM is failing to nuke the alloca. It removes all loads from the
alloca, leaving only lifetime intrinsics and dead(!!) stores to the
alloca. =/ Very unfortunate.

llvm-svn: 154264

75a1cf32

Fix ValueTracking to conclude that debug intrinsics are safe to · 28192c93

Chandler Carruth authored Apr 07, 2012

speculate. Without this, loop rotate (among many other places) would
suddenly stop working in the presence of debug info. I found this
looking at loop rotate, and have augmented its tests with a reduction
out of a very hot loop in yacr2 where failing to do this rotation costs
sometimes more than 10% in runtime performance, perturbing numerous
downstream optimizations.

This should have no impact on performance without debug info, but the
change in performance when debug info is enabled can be extreme. As
a consequence (and this is how I got to this yak) any profiling of
performance problems should be treated with deep suspicion -- they may
have been wildly inaccurate if debug info was enabled for profiling. =/
Just a heads up.

llvm-svn: 154263

28192c93

SCEV: When expanding a GEP the final addition to the base pointer has NUW but not NSW. · e1f4ca1b

Benjamin Kramer authored Apr 07, 2012

Found by inspection.

llvm-svn: 154262

e1f4ca1b

Fix Thumb __builtin_longjmp with integrated assembler. <rdar://problem/11203543> · 6f9be7e2

Bob Wilson authored Apr 07, 2012

The tLDRr instruction with the last register operand set to the zero register
prints in assembly as if no register was specified, and the assembler encodes
it as a tLDRi instruction with a zero immediate. With the integrated assembler,
that zero register gets emitted as "r0", so we get "ldr rx, [ry, r0]" which
is broken. Emit the instruction as tLDRi with a zero immediate. I don't
know if there's a good way to write a testcase for this. Suggestions welcome.

Opportunities for follow-up work:
1) The asm printer should complain if a non-optional register operand is set
to the zero register, instead of silently dropping it.
2) The integrated assembler should complain in the same situation, instead of
silently emitting the operand as "r0".

llvm-svn: 154261

6f9be7e2

Refactor: Use positive field names in VectorizeConfig. · 5758f495

Hongbin Zheng authored Apr 07, 2012

llvm-svn: 154249

5758f495

Target/X86/MCTargetDesc/X86MCAsmInfo.cpp: Enable DwarfCFI (aka DW2) on Cygming. · b95f6413

NAKAMURA Takumi authored Apr 07, 2012

Cygwin-1.7 supports dw2. Some recent mingw distros support it, too.
I have confirmed test-suite/SingleSource/Benchmarks/Shootout-C++/except.cpp can pass on Cygwin.

llvm-svn: 154247

b95f6413

Make the test for r154235 more platform-independent with a shorter · 78fce432

Alexis Hunt authored Apr 07, 2012

string.

llvm-svn: 154243

78fce432

Output UTF-8-encoded characters as identifier characters into assembly · 0235f684

Alexis Hunt authored Apr 07, 2012

by default.

This is a behaviour configurable in the MCAsmInfo. I've decided to turn
it on by default in (possibly optimistic) hopes that most assemblers are
reasonably sane. If this proves a problem, switching the default seems
reasonable.

I'm not sure if this is the opportune place to test, but it seemed good
to make sure it was tested somewhere.

llvm-svn: 154235

0235f684

Tidy up. 80 columns. · 0c509fa6

Jim Grosbach authored Apr 06, 2012

llvm-svn: 154226

0c509fa6

Apr 06, 2012

ARMPat is equivalent to Requires<[IsARM]>. · baa35660

Jakob Stoklund Olesen authored Apr 06, 2012

llvm-svn: 154210

baa35660

Eliminate iOS-specific tail call instructions. · b4bd3880

Jakob Stoklund Olesen authored Apr 06, 2012

After register masks were introduced to represent the call clobbers, it
is no longer necessary to have duplicate instructions for iOS.

llvm-svn: 154209

b4bd3880

Add lines in global-address.ll to test N32 and N64 code generation. · 487e5676

Akira Hatanaka authored Apr 06, 2012

llvm-svn: 154202

487e5676

There is no portable std::abs overload for int64_t, use the llvm::abs64 · 8a102c21

Chandler Carruth authored Apr 06, 2012

which exists for this purpose.

llvm-svn: 154199

8a102c21

Fixed two leaks in the MC disassembler. The MC · e804b5b7

Sean Callanan authored Apr 06, 2012

disassembler requires a MCSubtargetInfo and a
MCInstrInfo to exist in order to initialize the
instruction printer and disassembler; however,
although the printer and disassembler keep
references to these objects they do not own them.
Previously, the MCSubtargetInfo and MCInstrInfo
objects were just leaked.

I have extended LLVMDisasmContext to own these
objects and delete them when it is destroyed.

llvm-svn: 154192

e804b5b7

Allow negative immediates in ARM and Thumb2 compares. · 967b86a0

Jakob Stoklund Olesen authored Apr 06, 2012

ARM and Thumb2 mode can use cmn instructions to compare against negative
immediates. Thumb1 mode can't.

llvm-svn: 154183

967b86a0

Reintroduce InlineCostAnalyzer::getInlineCost() variant with explicit callee · c1c9cdab

David Chisnall authored Apr 06, 2012

parameter until we have a more sensible API for doing the same thing.

Reviewed by Chandler.

llvm-svn: 154180

c1c9cdab

Sink the collection of return instructions until after *all* · 49da9339

Chandler Carruth authored Apr 06, 2012

simplification has been performed. This is a bit less efficient
(requires another ilist walk of the basic blocks) but shouldn't matter
in practice. More importantly, it's just too much work to keep track of
all the various ways the return instructions can be mutated while
simplifying them. This fixes yet another crasher, reported by Daniel
Dunbar.

llvm-svn: 154179

49da9339

Tweak this test to ensure the inliner did indeed fire. Thanks to Richard · e547fefc

Chandler Carruth authored Apr 06, 2012

Smith for pointing this out in review.

llvm-svn: 154178

e547fefc

Make GVN's propagateEquality non-recursive. No intended functionality change. · d12b18f8

Duncan Sands authored Apr 06, 2012

The modifications are a lot more trivial than they appear to be in the diff!

llvm-svn: 154174

d12b18f8

Test case for PR12413 · bdc9f071

Craig Topper authored Apr 06, 2012

llvm-svn: 154172

bdc9f071

Fix narrowing conversion. · 3cacabfb

Benjamin Kramer authored Apr 06, 2012

llvm-svn: 154171

3cacabfb

DenseMap: Perform the pod-like object optimization when the value type is... · 15e21a15

Benjamin Kramer authored Apr 06, 2012

DenseMap: Perform the pod-like object optimization when the value type is POD-like, not the DenseMapInfo for it.

Purge now unused template arguments. This has been broken since r91421. Patch by Lubos Lunak!

llvm-svn: 154170

15e21a15
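
The general shape of a POD-like fast path can be sketched as below. This is not the DenseMap code: `Bucket`, `copyBucketsImpl`, and `copyBuckets` are hypothetical, and `std::is_trivially_copyable` stands in for LLVM's own POD-like trait. The point it illustrates is that the trait is asked about the stored key and value types themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical bucket holding one key/value pair.
template <typename K, typename V>
struct Bucket {
  K key;
  V value;
};

// Fast path: both stored types are trivially copyable, so copy raw bytes.
template <typename K, typename V>
void copyBucketsImpl(Bucket<K, V> *dst, const Bucket<K, V> *src,
                     std::size_t n, std::true_type) {
  std::memcpy(static_cast<void *>(dst), static_cast<const void *>(src),
              n * sizeof(Bucket<K, V>));
}

// Slow path: copy element by element through the assignment operators.
template <typename K, typename V>
void copyBucketsImpl(Bucket<K, V> *dst, const Bucket<K, V> *src,
                     std::size_t n, std::false_type) {
  for (std::size_t i = 0; i < n; ++i)
    dst[i] = src[i];
}

// Dispatch on the properties of K and V, the types actually stored.
template <typename K, typename V>
void copyBuckets(Bucket<K, V> *dst, const Bucket<K, V> *src, std::size_t n) {
  if (n == 0)
    return;
  assert(dst && src && "non-empty copy needs valid buffers");
  copyBucketsImpl(dst, src, n,
                  std::integral_constant<
                      bool, std::is_trivially_copyable<K>::value &&
                                std::is_trivially_copyable<V>::value>());
}
```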

Allow 256-bit shuffles to be split if a 128-bit lane contains elements from a... · 447417c9

Craig Topper authored Apr 06, 2012

Allow 256-bit shuffles to be split if a 128-bit lane contains elements from a single source. This is a rewrite of the 256-bit shuffle splitting code based on similar code from legalize types. Fixes PR12413.

llvm-svn: 154166

447417c9