Commits · 68062617a6e4fd5e2fefa0ebb99a36e77dcb0cff · Roger Ferrer / llvm-epi-0.8

Apr 10, 2012

Make a somewhat subtle change in the logic of block placement. Sometimes · 68062617

Chandler Carruth authored Apr 10, 2012

the loop header has a non-loop predecessor which has been pre-fused into
its chain due to unanalyzable branches. In this case, rotating the
header into the body of the loop in order to place a loop exit at the
bottom of the loop is a Very Bad Idea as it makes the loop
non-contiguous.

I'm working on a good test case for this, but it's a bit annoying to
craft. I should get one shortly, but I'm submitting this now so I can
begin the (lengthy) performance analysis process. An initial run of LNT
looks really, really good, but there is too much noise there for me to
trust it much.

llvm-svn: 154395

68062617

Transform div to mul with reciprocal only when fp imm is legal. · 4d1220de
Anton Korobeynikov authored Apr 10, 2012
```
This fixes PR12516 and uncovers one weird problem in legalize (worked around)

llvm-svn: 154394
```
4d1220de
Use the correct section types on Solaris for unwind data on both x86 and x86-64. · bbec8720
David Chisnall authored Apr 10, 2012
```
Patch by Dmitri Shubin!

llvm-svn: 154391
```
bbec8720
Express the number of ULPs in fpaccuracy metadata as a real rather than a · af06b26c
Duncan Sands authored Apr 10, 2012
```
rational number, e.g. as 2.5 rather than 5, 2.  OK'd by Peter Collingbourne.

llvm-svn: 154387
```
af06b26c

Fix 12513: Loop unrolling breaks with indirect branches. · 4442bfe5

Andrew Trick authored Apr 10, 2012

Take this opportunity to generalize the indirectbr bailout logic for
loop transformations. CFG transformations will never get indirectbr
right, and there's no point trying.

llvm-svn: 154386

4442bfe5

whitespace · 4104ed9c
Andrew Trick authored Apr 10, 2012
```
llvm-svn: 154385
```
4104ed9c
Make the code slightly more palatable. · 136861d9
Evan Cheng authored Apr 10, 2012
```
llvm-svn: 154378
```
136861d9
Add a constructor for DataRefImpl and remove excess initialization. · 549515e1
Danil Malyshev authored Apr 10, 2012
```
llvm-svn: 154371
```
549515e1

Fix a long standing tail call optimization bug. When a libcall is emitted · f8bad080

Evan Cheng authored Apr 10, 2012

the legalizer always uses the DAG entry node. This is wrong when the libcall is
emitted as a tail call since it effectively folds the return node. If
the return node's input chain is not the entry (i.e. call, load, or store)
use that as the tail call input chain.

PR12419
rdar://9770785
rdar://11195178

llvm-svn: 154370

f8bad080

Don't try to zExt just to check if an integer constant is zero, it might · 1d9672bd
Rafael Espindola authored Apr 10, 2012
```
not fit in a i64.

llvm-svn: 154364
```
1d9672bd
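
A minimal sketch of why the zero check should not go through a 64-bit conversion. `WideInt`, `toU64`, and `isZero` are hypothetical names for illustration and are not LLVM's API; the sketch only assumes the constant may be wider than 64 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an arbitrary-width integer constant,
// stored as little-endian 64-bit words. Not an LLVM type.
struct WideInt {
  std::vector<std::uint64_t> words;
};

// Narrowing conversion: only valid when every word above the first is zero,
// so it asserts on constants that do not fit in 64 bits.
std::uint64_t toU64(const WideInt &v) {
  for (std::size_t i = 1; i < v.words.size(); ++i)
    assert(v.words[i] == 0 && "value does not fit in 64 bits");
  return v.words.empty() ? 0 : v.words[0];
}

// Width-independent zero test: inspects every word and never narrows.
bool isZero(const WideInt &v) {
  for (std::uint64_t w : v.words)
    if (w != 0)
      return false;
  return true;
}
```

With a 128-bit value such as `WideInt{{0, 1}}`, `isZero` answers directly, while `toU64(v) == 0` would trip the assertion before the comparison is ever made.
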
ARM LDR/LDRT has the same encoding collision as STR/STRT. · 8f99bc3a
Jim Grosbach authored Apr 10, 2012
```
Generalized logic of r154141.

llvm-svn: 154362
```
8f99bc3a

Apr 09, 2012

Have TargetLowering::getPICJumpTableRelocBase return a node that points to the · 8483a6c4
Akira Hatanaka authored Apr 09, 2012
```
GOT if jump table uses 64-bit gp-relative relocation.

llvm-svn: 154341
```
8483a6c4

When performing a truncating store, it's possible to rearrange the data · e0e38f61

Chad Rosier authored Apr 09, 2012

in-register, such that we can use a single vector store rather than a
series of scalar stores.

For func_4_8 the generated code

	vldr	d16, LCPI0_0
	vmov	d17, r0, r1
	vadd.i16	d16, d17, d16
	vmov.u16	r0, d16[3]
	strb	r0, [r2, #3]
	vmov.u16	r0, d16[2]
	strb	r0, [r2, #2]
	vmov.u16	r0, d16[1]
	strb	r0, [r2, #1]
	vmov.u16	r0, d16[0]
	strb	r0, [r2]
	bx	lr

becomes

	vldr	d16, LCPI0_0
	vmov	d17, r0, r1
	vadd.i16	d16, d17, d16
	vuzp.8	d16, d17
	vst1.32	{d16[0]}, [r2, :32]
	bx	lr

I'm not fond of how this combine pessimizes 2012-03-13-DAGCombineBug.ll,
but I couldn't think of a way to judiciously apply this combine.

This

	ldrh	r0, [r0, #4]
	strh	r0, [r1]

becomes

	vldr	d16, [r0]
	vmov.u16	r0, d16[2]
	vmov.32	d16[0], r0
	vuzp.16	d16, d17
	vst1.32	{d16[0]}, [r1, :32]

PR11158
rdar://10703339

llvm-svn: 154340

e0e38f61

Patch r153892 for PR11861 apparently broke an external project (see PR12493). · 3ad11ff9

Lang Hames authored Apr 09, 2012

This patch restores TwoAddressInstructionPass's pre-r153892 behaviour when
rescheduling instructions in TryInstructionTransform. Hopefully this will fix
PR12493. To refix PR11861, lowering of INSERT_SUBREGS is deferred until after
the copy that unties the operands is emitted (this seems to be a more
appropriate fix for that issue anyway).

llvm-svn: 154338

3ad11ff9

Update comments and remove unnecessary isVolatile() check. · 99cbde9e
Chad Rosier authored Apr 09, 2012
```
llvm-svn: 154336
```
99cbde9e

Fix accidentally constant conditions found by uncommitted improvements to -Wconstant-conversion. · e6b6fae8

David Blaikie authored Apr 09, 2012

A couple of cases where we were accidentally creating constant conditions by
something like "x == a || b" instead of "x == a || x == b". In one case a
conditional & then unreachable was used - I transformed this into a direct
assert instead.
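
A minimal sketch of the two patterns described above. The function names and the integer parameters are hypothetical and are not taken from the patch.

```cpp
#include <cassert>

// Buggy form: `b` is evaluated as a truth value on its own, so whenever `b`
// is a nonzero constant the whole condition is true regardless of `x`.
bool matchesBuggy(int x, int a, int b) {
  return x == a || b;
}

// Intended form: compare `x` against each candidate explicitly.
bool matchesFixed(int x, int a, int b) {
  return x == a || x == b;
}

// The second change, sketched: rather than testing the condition and then
// marking the failing branch unreachable, state the invariant as an assert.
int pickIndex(int x, int a, int b) {
  assert((x == a || x == b) && "x must match one of the candidates");
  return x == a ? 0 : 1;
}
```

For example, `matchesBuggy(7, 1, 2)` is true even though 7 matches neither candidate, while `matchesFixed(7, 1, 2)` is false.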

llvm-svn: 154324

e6b6fae8

Pattern match a setcc of boolean value with 0 as a truncate. · 8f62b324
Rafael Espindola authored Apr 09, 2012
```
llvm-svn: 154322
```
8f62b324
This patch adds X86 instruction itineraries, which were missed by the · 2eec3672
Preston Gurd authored Apr 09, 2012
```
original patch to add itineraries, to X86InstrArithmetic.td.

llvm-svn: 154320
```
2eec3672
Lower some x86 shuffle sequences to the vblend family of instructions. · fb7e2ae5
Nadav Rotem authored Apr 09, 2012
```
llvm-svn: 154313
```
fb7e2ae5
Fix a bug in the lowering of broadcasts: ConstantPools need to use the target pointer type. · b801ca39
Nadav Rotem authored Apr 09, 2012
```
Move NormalizeVectorShuffle and LowerVectorBroadcast into X86TargetLowering.

llvm-svn: 154310
```
b801ca39
Remove unnecessary type check when combining and/or/xor of swizzles. Move some... · 9c3da316
Craig Topper authored Apr 09, 2012
```
Remove unnecessary type check when combining and/or/xor of swizzles. Move some checks to allow better early out.

llvm-svn: 154309
```
9c3da316
Remove unnecessary 'else' on an 'if' that always returns · e5893f64
Craig Topper authored Apr 09, 2012
```
llvm-svn: 154308
```
e5893f64
Optimize code slightly. No functionality change. · e3ad4834
Craig Topper authored Apr 09, 2012
```
llvm-svn: 154307
```
e3ad4834
Replace some explicit checks with asserts for conditions that should never happen. · 5894fe43
Craig Topper authored Apr 09, 2012
```
llvm-svn: 154305
```
5894fe43

Cleanup and relax a restriction on the matching of global offsets into · 3779ac10

Chandler Carruth authored Apr 09, 2012

x86 addressing modes. This allows PIE-based TLS offsets to fit directly
into an addressing mode immediate offset, which is the last remaining
code quality issue from PR12380. With this patch, that PR is completely
fixed.

To understand why this patch is correct to match these offsets into
addressing mode immediates, break it down by cases:
1) 32-bit is trivially correct, and unmodified here.
2) 64-bit non-small mode is unchanged and never matches.
3) 64-bit small PIC code which is RIP-relative is handled specially in
the match to try to fit RIP into the base register. If it fails, it
now early exits. This behavior is unchanged by the patch.
4) 64-bit small non-PIC code which is not RIP-relative continues to work
as it did before. The reason these immediates are safe is because the
ABI ensures they fit in small mode. This behavior is unchanged.
5) 64-bit small PIC code which is *not* using RIP-relative addressing.
This is the only case changed by the patch, and the primary place you
see it is in TLS, either the win64 section offset TLS or Linux
local-exec TLS model in a PIC compilation. Here the ABI again ensures
that the immediates fit because we are in small mode, and any other
operations required due to the PIC relocation model have been handled
externally to the Wrapper node (extra loads etc are made around the
wrapper node in ISelLowering).

I've tested this as much as I can comparing it with GCC's output, and
everything appears safe. I discussed this with Anton and it made sense
to him at least at face value. That said, if there are issues with PIC
code after this patch, yell and we can revert it.

llvm-svn: 154304

3779ac10

Optimize code a bit. No functional change intended. · 6148fe65
Craig Topper authored Apr 08, 2012
```
llvm-svn: 154299
```
6148fe65

Apr 08, 2012

Silence sign-compare warning. · bb6ff087
Benjamin Kramer authored Apr 08, 2012
```
llvm-svn: 154297
```
bb6ff087

Only have codegen turn fdiv by a constant into fmul by the reciprocal · 2f1dc381

Duncan Sands authored Apr 08, 2012

when -ffast-math, i.e. don't just always do it if the reciprocal can
be formed exactly.  There is already an IR level transform that does
that, and it does it more carefully.
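
A rough sketch of what "the reciprocal can be formed exactly" means for binary floating point, and of the policy described above. `hasExactReciprocal` and `divideByConstant` are hypothetical helpers, not LLVM functions, and the exactness test is deliberately conservative.

```cpp
#include <cassert>
#include <cmath>

// True when c is +/- 2^k and 1.0 / c is a normal number, in which case the
// reciprocal is exactly representable. Conservative: it rejects constants
// whose reciprocal would be subnormal, even where that would still be exact.
bool hasExactReciprocal(double c) {
  if (c == 0.0 || !std::isfinite(c))
    return false;
  int exponent = 0;
  double mantissa = std::frexp(c, &exponent); // c == mantissa * 2^exponent
  return std::fabs(mantissa) == 0.5 && std::isnormal(1.0 / c);
}

// Models the codegen policy in this commit: rewrite x / c as x * (1 / c)
// only when fast-math is requested; otherwise keep the division.
double divideByConstant(double x, double c, bool fastMath) {
  assert(c != 0.0 && "divisor must be nonzero");
  if (fastMath)
    return x * (1.0 / c);
  return x / c;
}
```

Dividing by 4.0 can be replaced by multiplying by 0.25 with no rounding difference, whereas 1.0 / 3.0 is rounded, so `x * (1.0 / 3.0)` may differ from `x / 3.0` in the last bit.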

llvm-svn: 154296

2f1dc381

Simplify code that tries to do vector extracts for shuffles when the mask... · c8e2d91a

Craig Topper authored Apr 08, 2012

Simplify code that tries to do vector extracts for shuffles when the mask width and the input vector widths don't match. No need to check the min and max are in range before calculating the start index. The range check after having the start index is sufficient. Also no need to check for an extract from the beginning differently.

llvm-svn: 154295

c8e2d91a

Teach LLVM about a PIE option which, when enabled on top of PIC, makes · ede4a8aa

Chandler Carruth authored Apr 08, 2012

optimizations which are valid for position independent code being linked
into a single executable, but not for such code being linked into
a shared library.

I discussed the design of this with Eric Christopher, and the decision
was to support an optional bit rather than a completely separate
relocation model. Fundamentally, this is still PIC relocation, it's just
that certain optimizations are only valid under a PIC relocation model
when the resulting code won't be in a shared library. The simplest path
to here is to expose a single bit option in the TargetOptions. If folks
have different/better designs, I'm all ears. =]

I've included the first optimization based upon this: changing TLS
models to the *Exec models when PIE is enabled. This is the LLVM
component of PR12380 and is all of the hard work.
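
A sketch of how a single PIE bit layered on top of PIC could steer the TLS model choice. The enum and function names are hypothetical; the mapping to the dynamic models when PIE is off is an assumption based on the usual ELF TLS models, since the message above only states that PIE selects the *Exec models.

```cpp
#include <cassert>

// Hypothetical names for illustration; not copied from LLVM headers.
enum class RelocModel { NonPIC, PositionIndependent };
enum class TLSModel { GeneralDynamic, LocalDynamic, InitialExec, LocalExec };

// Chooses a TLS access model. `isPIE` is an extra bit on top of PIC rather
// than a separate relocation model, mirroring the design described above.
TLSModel selectTLSModel(RelocModel reloc, bool isPIE, bool isLocalDefinition) {
  assert(!(isPIE && reloc != RelocModel::PositionIndependent) &&
         "PIE only makes sense on top of PIC");
  // Without PIE, PIC code may end up in a shared library, so the dynamic
  // models are required (assumption, see lead-in).
  const bool mayBeInSharedLibrary =
      reloc == RelocModel::PositionIndependent && !isPIE;
  if (mayBeInSharedLibrary)
    return isLocalDefinition ? TLSModel::LocalDynamic
                             : TLSModel::GeneralDynamic;
  // The code is known to be linked into an executable: use the Exec models.
  return isLocalDefinition ? TLSModel::LocalExec : TLSModel::InitialExec;
}
```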

llvm-svn: 154294

ede4a8aa

Move the TLSModel information into the TargetMachine rather than hiding · 16f0ebcb

Chandler Carruth authored Apr 08, 2012

in TargetLowering. There was already a FIXME about this location being
odd. The interface is simplified as a consequence. This will also make
it easier to change TLS models when compiling with PIE.

llvm-svn: 154292

16f0ebcb

EngineBuilder::create is expected to take ownership of the TargetMachine... · 25a3d816

Benjamin Kramer authored Apr 08, 2012

EngineBuilder::create is expected to take ownership of the TargetMachine passed to it. Delete it on error or when we create an interpreter that doesn't need it.
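
The ownership rule can be sketched as follows. `Machine`, `Engine`, and `createEngine` are hypothetical stand-ins rather than the real classes, and `std::unique_ptr` is used here only to make the transfer explicit; it is not claimed to be how the patch is written.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for the target machine; counts live instances so
// leaks are observable.
struct Machine {
  static int live;
  Machine() { ++live; }
  ~Machine() { --live; }
};
int Machine::live = 0;

// Hypothetical engine: a JIT keeps the machine, an interpreter does not.
struct Engine {
  std::unique_ptr<Machine> machine;
};

// Takes ownership of `tm` on every path: it is deleted on error and when
// the resulting engine (an interpreter) has no use for it.
std::unique_ptr<Engine> createEngine(Machine *tm, bool useInterpreter,
                                     bool failSetup) {
  assert(tm && "caller must pass a machine");
  std::unique_ptr<Machine> owned(tm);
  if (failSetup)
    return nullptr; // `owned` deletes the machine here
  std::unique_ptr<Engine> engine(new Engine());
  if (!useInterpreter)
    engine->machine = std::move(owned);
  return engine; // for an interpreter, `owned` deletes the machine here
}
```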

llvm-svn: 154288

25a3d816

Remove an overzealous assert. The assert was trying to catch places · bed1abf9

Chandler Carruth authored Apr 08, 2012

where a chain outside of the loop block-set ended up in the worklist for
scheduling as part of the contiguous loop. However, asserting the first
block in the chain is in the loop-set isn't a valid check -- we may be
forced to drag a chain into the worklist due to one block in the chain
being part of the loop even though the first block is *not* in the loop.
This occurs when we have been forced to form a chain early due to
un-analyzable branches.

No test case here as I have no idea how to even begin reducing one, and
it will be hopelessly fragile. We have to somehow end up with a loop
header of an inner loop which is a successor of a basic block with an
unanalyzable pair of branch instructions. Ow. Self-host triggers it so
it is unlikely it will regress.

This at least gets block placement back to passing selfhost and the test
suite. There are still a lot of slowdowns that I don't like coming out of
block placement, although there are now also a lot of speedups. =[ I'm
seeing swings in both directions up to 10%. I'm going to try to find
time to dig into this and see if we can turn this on for 3.1 as it does
a really good job of cleaning up after some loops that degraded with the
inliner changes.

llvm-svn: 154287

bed1abf9

Add a debug-only 'dump' method to the BlockChain structure to ease · 49158908
Chandler Carruth authored Apr 08, 2012
```
debugging.

llvm-svn: 154286
```
49158908

Teach InstCombine to nuke a common alloca pattern -- an alloca which has · f82b0e2d

Chandler Carruth authored Apr 08, 2012

GEPs, bit casts, and stores reaching it but no other instructions. These
often show up during the iterative processing of the inliner, SROA, and
DCE. Once we hit this point, we can completely remove the alloca. These
were actually showing up in the final, fully optimized code in a bunch
of inliner tests I've been working on, and notably they show up after
LLVM finishes optimizing away all function calls involved in
hash_combine(a, b).

llvm-svn: 154285

f82b0e2d

AVX2: Build splat vectors by broadcasting a scalar from the constant pool. · 82609df6

Nadav Rotem authored Apr 08, 2012

Previously we used three instructions to broadcast an immediate value into a
vector register.
On Sandybridge we continue to load the broadcasted value from the constant pool.

llvm-svn: 154284

82609df6

Remove the 'Parent' pointer from the MDNodeOperand class. · 5c0068f8

Bill Wendling authored Apr 08, 2012

An MDNode has a list of MDNodeOperands allocated directly after it as part of
its allocation. Therefore, the Parent of the MDNodeOperands can be found by
walking back through the operands to the beginning of that list. Mark the first
operand's value pointer as being the 'first' operand so that we know where the
beginning of said list is.

This saves a *lot* of space during LTO with -O0 -g flags.
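
A simplified sketch of recovering the parent by walking back over trailing operands. `Node`, `Operand`, and `parentOf` are hypothetical names. Unlike the commit, which marks the first operand inside its value pointer, this sketch uses a separate `bool`, so it shows the walk-back technique but not the space saving.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical operand: no parent pointer, only a marker on the first one.
struct Operand {
  void *value = nullptr;
  bool isFirst = false;
};

// Hypothetical node whose operands are allocated directly after it.
// alignas keeps the trailing Operand array correctly aligned.
struct alignas(Operand) Node {
  std::size_t numOperands;

  Operand *operands() { return reinterpret_cast<Operand *>(this + 1); }

  static Node *create(std::size_t n) {
    assert(n > 0 && "the sketch needs one operand to carry the marker");
    void *mem = ::operator new(sizeof(Node) + n * sizeof(Operand));
    Node *node = new (mem) Node{n};
    for (std::size_t i = 0; i < n; ++i)
      new (node->operands() + i) Operand();
    node->operands()[0].isFirst = true;
    return node;
  }

  static void destroy(Node *node) {
    node->~Node();
    ::operator delete(node);
  }
};

// Walk back to the marked first operand; the node sits just before it.
Node *parentOf(Operand *op) {
  while (!op->isFirst)
    --op;
  return reinterpret_cast<Node *>(op) - 1;
}
```

The lookup costs a short backwards scan instead of one pointer load, which is the trade made in exchange for dropping a pointer from every operand.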

llvm-svn: 154280

5c0068f8

Allow subclasses of the ValueHandleBase to store information as part of the · 9b2503a0
Bill Wendling authored Apr 08, 2012
```
value pointer by making the value pointer into a pointer-int pair with 2 bits
available for flags.

llvm-svn: 154279
```
9b2503a0
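
A minimal sketch of packing two flag bits into the low bits of an aligned pointer. `TaggedPtr` is a hypothetical class written for illustration and is not LLVM's `PointerIntPair`; it assumes the pointee is at least 4-byte aligned so the low two bits of the address are always zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pointer-plus-flags wrapper with 2 bits available for flags.
template <typename T> class TaggedPtr {
  static_assert(alignof(T) >= 4, "need two free low bits in the pointer");
  enum : std::uintptr_t { FlagMask = 0x3 };
  std::uintptr_t bits = 0;

public:
  // Stores the pointer while preserving the current flag bits.
  void setPointer(T *p) {
    std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(p);
    assert((raw & FlagMask) == 0 && "pointer is not 4-byte aligned");
    bits = raw | (bits & FlagMask);
  }

  T *getPointer() const { return reinterpret_cast<T *>(bits & ~FlagMask); }

  // Stores the flags while preserving the current pointer bits.
  void setFlags(unsigned flags) {
    assert(flags <= FlagMask && "only two flag bits are available");
    bits = (bits & ~FlagMask) | flags;
  }

  unsigned getFlags() const { return static_cast<unsigned>(bits & FlagMask); }
};
```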

Turn avx2 vinserti128 intrinsic calls into INSERT_SUBVECTOR DAG nodes and... · d024cef2

Craig Topper authored Apr 07, 2012

Turn avx2 vinserti128 intrinsic calls into INSERT_SUBVECTOR DAG nodes and remove patterns for selecting the intrinsic. The same was already done for avx1.

llvm-svn: 154272

d024cef2

Apr 07, 2012

Move vinsertf128 patterns near the instruction definitions. Add... · aa9aab5a

Craig Topper authored Apr 07, 2012

Move vinsertf128 patterns near the instruction definitions. Add AddedComplexity to AVX2 vextracti128 patterns to give them priority over the integer versions of vextractf128 patterns.

llvm-svn: 154268

aa9aab5a