Commits · 16f0ebcbb576f197422426b64f4ad54269f273d1 · Roger Ferrer / llvm-epi-0.8

Apr 08, 2012

Move the TLSModel information into the TargetMachine rather than hiding · 16f0ebcb

Chandler Carruth authored Apr 08, 2012

in TargetLowering. There was already a FIXME about this location being
odd. The interface is simplified as a consequence. This will also make
it easier to change TLS models when compiling with PIE.

llvm-svn: 154292

16f0ebcb

EngineBuilder::create is expected to take ownership of the TargetMachine... · 25a3d816

Benjamin Kramer authored Apr 08, 2012

EngineBuilder::create is expected to take ownership of the TargetMachine passed to it. Delete it on error or when we create an interpreter that doesn't need it.

llvm-svn: 154288

25a3d816

Remove an over zealous assert. The assert was trying to catch places · bed1abf9

Chandler Carruth authored Apr 08, 2012

where a chain outside of the loop block-set ended up in the worklist for
scheduling as part of the contiguous loop. However, asserting the first
block in the chain is in the loop-set isn't a valid check -- we may be
forced to drag a chain into the worklist due to one block in the chain
being part of the loop even though the first block is *not* in the loop.
This occurs when we have been forced to form a chain early due to
un-analyzable branches.

No test case here as I have no idea how to even begin reducing one, and
it will be hopelessly fragile. We have to somehow end up with a loop
header of an inner loop which is a successor of a basic block with an
unanalyzable pair of branch instructions. Ow. Self-host triggers it so
it is unlikely it will regress.

This at least gets block placement back to passing selfhost and the test
suite. There are still a lot of slowdown that I don't like coming out of
block placement, although there are now also a lot of speedups. =[ I'm
seeing swings in both directions up to 10%. I'm going to try to find
time to dig into this and see if we can turn this on for 3.1 as it does
a really good job of cleaning up after some loops that degraded with the
inliner changes.

llvm-svn: 154287

bed1abf9

Add a debug-only 'dump' method to the BlockChain structure to ease · 49158908
Chandler Carruth authored Apr 08, 2012
```
debugging.

llvm-svn: 154286
```
49158908

Teach InstCombine to nuke a common alloca pattern -- an alloca which has · f82b0e2d

Chandler Carruth authored Apr 08, 2012

GEPs, bit casts, and stores reaching it but no other instructions. These
often show up during the iterative processing of the inliner, SROA, and
DCE. Once we hit this point, we can completely remove the alloca. These
were actually showing up in the final, fully optimized code in a bunch
of inliner tests I've been working on, and notably they show up after
LLVM finishes optimizing away all function calls involved in
hash_combine(a, b).

llvm-svn: 154285

f82b0e2d

AVX2: Build splat vectors by broadcasting a scalar from the constant pool. · 82609df6

Nadav Rotem authored Apr 08, 2012

Previously we used three instructions to broadcast an immediate value into a
vector register.
On Sandybridge we continue to load the broadcasted value from the constant pool.

llvm-svn: 154284

82609df6

Remove the 'Parent' pointer from the MDNodeOperand class. · 5c0068f8

Bill Wendling authored Apr 08, 2012

An MDNode has a list of MDNodeOperands allocated directly after it as part of
its allocation. Therefore, the Parent of the MDNodeOperands can be found by
walking back through the operands to the beginning of that list. Mark the first
operand's value pointer as being the 'first' operand so that we know where the
beginning of said list is.

This saves a *lot* of space during LTO with -O0 -g flags.

llvm-svn: 154280

5c0068f8

Allow subclasses of the ValueHandleBase to store information as part of the · 9b2503a0
Bill Wendling authored Apr 08, 2012
```
value pointer by making the value pointer into a pointer-int pair with 2 bits
available for flags.

llvm-svn: 154279
```
9b2503a0

Turn avx2 vinserti128 intrinsic calls into INSERT_SUBVECTOR DAG nodes and... · d024cef2

Craig Topper authored Apr 07, 2012

Turn avx2 vinserti128 intrinsic calls into INSERT_SUBVECTOR DAG nodes and remove patterns for selecting the intrinsic. Similar was already done for avx1.

llvm-svn: 154272

d024cef2

Apr 07, 2012

Move vinsertf128 patterns near the instruction definitions. Add... · aa9aab5a

Craig Topper authored Apr 07, 2012

Move vinsertf128 patterns near the instruction definitions. Add AddedComplexity to AVX2 vextracti128 patterns to give them priority over the integer versions of vextractf128 patterns.

llvm-svn: 154268

aa9aab5a

Remove 'else' after 'if' that ends in return. · e09d1c5c
Craig Topper authored Apr 07, 2012
```
llvm-svn: 154267
```
e09d1c5c

1. Remove the part of r153848 which optimizes shuffle-of-shuffle into a new · 71d07ae5

Nadav Rotem authored Apr 07, 2012

   shuffle node because it could introduce new shuffle nodes that were not
   supported efficiently by the target.

2. Add a more restrictive shuffle-of-shuffle optimization for cases where the
   second shuffle reverses the transformation of the first shuffle.

llvm-svn: 154266

71d07ae5

Convert floating point division by a constant into multiplication by the · 5f8397a9

Duncan Sands authored Apr 07, 2012

reciprocal if converting to the reciprocal is exact.  Do it even if inexact
if -ffast-math.  This substantially speeds up ac.f90 from the polyhedron
benchmarks.

llvm-svn: 154265

5f8397a9

Fix ValueTracking to conclude that debug intrinsics are safe to · 28192c93

Chandler Carruth authored Apr 07, 2012

speculate. Without this, loop rotate (among many other places) would
suddenly stop working in the presence of debug info. I found this
looking at loop rotate, and have augmented its tests with a reduction
out of a very hot loop in yacr2 where failing to do this rotation costs
sometimes more than 10% in runtime performance, perturbing numerous
downstream optimizations.

This should have no impact on performance without debug info, but the
change in performance when debug info is enabled can be extreme. As
a consequence (and this how I got to this yak) any profiling of
performance problems should be treated with deep suspicion -- they may
have been wildly innacurate of debug info was enabled for profiling. =/
Just a heads up.

llvm-svn: 154263

28192c93

SCEV: When expanding a GEP the final addition to the base pointer has NUW but not NSW. · e1f4ca1b
Benjamin Kramer authored Apr 07, 2012
```
Found by inspection.

llvm-svn: 154262
```
e1f4ca1b

Fix Thumb __builtin_longjmp with integrated assembler. <rdar://problem/11203543 > · 6f9be7e2

Bob Wilson authored Apr 07, 2012

The tLDRr instruction with the last register operand set to the zero register
prints in assembly as if no register was specified, and the assembler encodes
it as a tLDRi instruction with a zero immediate. With the integrated assembler,
that zero register gets emitted as "r0", so we get "ldr rx, [ry, r0]" which
is broken. Emit the instruction as tLDRi with a zero immediate. I don't
know if there's a good way to write a testcase for this. Suggestions welcome.

Opportunities for follow-up work:
1) The asm printer should complain if a non-optional register operand is set
to the zero register, instead of silently dropping it.
2) The integrated assembler should complain in the same situation, instead of
silently emitting the operand as "r0".

llvm-svn: 154261

6f9be7e2

Refactor: Use positive field names in VectorizeConfig. · 5758f495
Hongbin Zheng authored Apr 07, 2012
```
llvm-svn: 154249
```
5758f495

Target/X86/MCTargetDesc/X86MCAsmInfo.cpp: Enable DwarfCFI (aka DW2) on Cygming. · b95f6413

NAKAMURA Takumi authored Apr 07, 2012

Cygwin-1.7 supports dw2. Some recent mingw distros support one, too.
I have confirmed test-suite/SingleSource/Benchmarks/Shootout-C++/except.cpp can pass on Cygwin.

llvm-svn: 154247

b95f6413

Output UTF-8-encoded characters as identifier characters into assembly · 0235f684

Alexis Hunt authored Apr 07, 2012

by default.

This is a behaviour configurable in the MCAsmInfo. I've decided to turn
it on by default in (possibly optimistic) hopes that most assemblers are
reasonably sane. If this proves a problem, switching to default seems
reasonable.

I'm not sure if this is the opportune place to test, but it seemed good
to make sure it was tested somewhere.

llvm-svn: 154235

0235f684

Tidy up. 80 columns. · 0c509fa6
Jim Grosbach authored Apr 06, 2012
```
llvm-svn: 154226
```
0c509fa6

Apr 06, 2012

ARMPat is equivalent to Requires<[IsARM]>. · baa35660
Jakob Stoklund Olesen authored Apr 06, 2012
```
llvm-svn: 154210
```
baa35660

Eliminate iOS-specific tail call instructions. · b4bd3880

Jakob Stoklund Olesen authored Apr 06, 2012

After register masks were introdruced to represent the call clobbers, it
is no longer necessary to have duplicate instruction for iOS.

llvm-svn: 154209

b4bd3880

There is no portable std::abs overload for int64_t, use the llvm::abs64 · 8a102c21
Chandler Carruth authored Apr 06, 2012
```
which exists for this purpose.

llvm-svn: 154199
```
8a102c21

Fixed two leaks in the MC disassembler. The MC · e804b5b7

Sean Callanan authored Apr 06, 2012

disassembler requires a MCSubtargetInfo and a
MCInstrInfo to exist in order to initialize the
instruction printer and disassembler; however,
although the printer and disassembler keep
references to these objects they do not own them.
Previously, the MCSubtargetInfo and MCInstrInfo
objects were just leaked.

I have extended LLVMDisasmContext to own these
objects and delete them when it is destroyed.

llvm-svn: 154192

e804b5b7

Allow negative immediates in ARM and Thumb2 compares. · 967b86a0

Jakob Stoklund Olesen authored Apr 06, 2012

ARM and Thumb2 mode can use cmn instructions to compare against negative
immediates. Thumb1 mode can't.

llvm-svn: 154183

967b86a0

Reintroduce InlineCostAnalyzer::getInlineCost() variant with explicit callee · c1c9cdab
David Chisnall authored Apr 06, 2012
```
parameter until we have a more sensible API for doing the same thing.

Reviewed by Chandler.

llvm-svn: 154180
```
c1c9cdab

Sink the collection of return instructions until after *all* · 49da9339

Chandler Carruth authored Apr 06, 2012

simplification has been performed. This is a bit less efficient
(requires another ilist walk of the basic blocks) but shouldn't matter
in practice. More importantly, it's just too much work to keep track of
all the various ways the return instructions can be mutated while
simplifying them. This fixes yet another crasher, reported by Daniel
Dunbar.

llvm-svn: 154179

49da9339

Make GVN's propagateEquality non-recursive. No intended functionality change. · d12b18f8
Duncan Sands authored Apr 06, 2012
```
The modifications are a lot more trivial than they appear to be in the diff!

llvm-svn: 154174
```
d12b18f8
Fix narrowing conversion. · 3cacabfb
Benjamin Kramer authored Apr 06, 2012
```
llvm-svn: 154171
```
3cacabfb

Allow 256-bit shuffles to be split if a 128-bit lane contains elements from a... · 447417c9

Craig Topper authored Apr 06, 2012

Allow 256-bit shuffles to be split if a 128-bit lane contains elements from a single source. This is a rewrite of the 256-bit shuffle splitting code based on similar code from legalize types. Fixes PR12413.

llvm-svn: 154166

447417c9

Sink the return instruction collection until after we're done deleting · e41f6f41

Chandler Carruth authored Apr 06, 2012

dead code, including dead return instructions in some cases. Otherwise,
we end up having a bogus poniter to a return instruction that blows up
much further down the road.

It turns out that this pattern is both simpler to code, easier to update
in the face of enhancements to the inliner cleanup, and likely cheaper
given that it won't add dead instructions to the list.

Thanks to John Regehr's numerous test cases for teasing this out.

llvm-svn: 154157

e41f6f41

Deduplicate ARM call-related instructions. · 6a2e99a4

Jakob Stoklund Olesen authored Apr 06, 2012

We had special instructions for iOS because r9 is call-clobbered, but
that is represented dynamically by the register mask operands now, so
there is no need for the pseudo-instructions.

llvm-svn: 154144

6a2e99a4

ARM: Don't form a t2LDRi8 or t2STRi8 with an offset of zero. · d6a1a1dc

Jim Grosbach authored Apr 05, 2012

The load/store optimizer splits LDRD/STRD into two instructions when the
register pairing doesn't work out. For negative offsets in Thumb2, it uses
t2STRi8 to do that. That's fine, except for the case when the offset is in
the range [-4,-1]. In that case, we'll also form a second t2STRi8 with
the original offset plus 4, resulting in a t2STRi8 with a non-negative
offset, which ends up as if it were an STRT, which is completely bogus.
Similarly for loads.

No testcase, unfortunately, as any I've been able to construct is both large
and extremely fragile.

rdar://11193937

llvm-svn: 154141

d6a1a1dc

Apr 05, 2012

ARM assembly aliases for add negative immediates using sub. · 930f2f66

Jim Grosbach authored Apr 05, 2012

'add r2, #-1024' should just use 'sub r2, #1024' rather than erroring out.
Thumb1 aliases for adding a negative immediate to the stack pointer,
also.

rdar://11192734

llvm-svn: 154123

930f2f66

Patch to set is_stmt a little better for prologue lines in a function. · aec8a826

Eric Christopher authored Apr 05, 2012

This enables debuggers to see what are interesting lines for a
breakpoint rather than any line that starts a function.

rdar://9852092

llvm-svn: 154120

aec8a826

Don't break the IV update in TLI::SimplifySetCC(). · 37492eac

Jakob Stoklund Olesen authored Apr 05, 2012

LSR always tries to make the ICmp in the loop latch use the incremented
induction variable. This allows the induction variable to be kept in a
single register.

When the induction variable limit is equal to the stride,
SimplifySetCC() would break LSR's hard work by transforming:

   (icmp (add iv, stride), stride) --> (cmp iv, 0)

This forced us to use lea for the IC update, preventing the simpler
incl+cmp.

<rdar://problem/7643606>
<rdar://problem/11184260>

llvm-svn: 154119

37492eac

Fix accidentally inverted logic from r152803, and make the · cc64bbca
Dan Gohman authored Apr 05, 2012
```
testcase slightly less trivial. This fixes rdar://11171718.

llvm-svn: 154118
```
cc64bbca
Treat f16 the same as f80/f128 for the purposes of generating constants during... · a6eebf60
Owen Anderson authored Apr 05, 2012
```
Treat f16 the same as f80/f128 for the purposes of generating constants during instruction selection.

llvm-svn: 154113
```
a6eebf60

Added support for unpredictable ADC/SBC instructions on ARM, and also fixed... · af3c79f0

Silviu Baranga authored Apr 05, 2012

Added support for unpredictable ADC/SBC instructions on ARM, and also fixed some corner cases involving the PC register as an operand for these instructions.

llvm-svn: 154101

af3c79f0

Added support for handling unpredictable arithmetic instructions on ARM. · d365397d
Silviu Baranga authored Apr 05, 2012
```
llvm-svn: 154100
```
d365397d