Commits · cca9aa58ca1594c3c74ee6c2055fbe23a6febdc7 · Roger Ferrer / llvm-epi-0.8

Nov 16, 2011

Record landing pads with a SmallSetVector to avoid multiple entries. · cca9aa58

Bob Wilson authored Nov 16, 2011

There may be many invokes that share one landing pad, and the previous code
would record the landing pad once for each invoke.  Besides the wasted
effort, a pair of volatile loads gets inserted every time the landing pad is
processed.  The rest of the code can get optimized away when a landing pad
is processed repeatedly, but the volatile loads remain, resulting in code like:

LBB35_18:
Ltmp483:
        ldr     r2, [r7, #-72]
        ldr     r2, [r7, #-68]
        ldr     r2, [r7, #-72]
        ldr     r2, [r7, #-68]
        ldr     r2, [r7, #-72]
        ldr     r2, [r7, #-68]
        ldr     r2, [r7, #-72]
        ldr     r2, [r7, #-68]
        ldr     r2, [r7, #-72]
        ldr     r2, [r7, #-68]
        ldr     r2, [r7, #-72]
        ldr     r2, [r7, #-68]
        ldr     r2, [r7, #-72]
        ldr     r2, [r7, #-68]
        ldr     r2, [r7, #-72]
        ldr     r2, [r7, #-68]
        ldr     r4, [r7, #-72]
        ldr     r2, [r7, #-68]

llvm-svn: 144787

cca9aa58
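
The deduplication idea in the commit above (cca9aa58) can be sketched without any LLVM headers. This is a minimal illustration, assuming a container that keeps insertion order while rejecting duplicates; `OrderedSet`, `Invoke`, and `collectLandingPads` are hypothetical names, not LLVM's SmallSetVector or the actual SjLj code.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Minimal insertion-ordered set: the vector keeps first-seen order and the
// hash set rejects duplicates. Illustrative only, not LLVM's implementation.
template <typename T>
class OrderedSet {
public:
  // Returns true if Value was newly inserted, false if it was already present.
  bool insert(const T &Value) {
    if (!Seen.insert(Value).second)
      return false;
    Order.push_back(Value);
    return true;
  }

  std::size_t size() const { return Order.size(); }

  const T &operator[](std::size_t I) const {
    assert(I < Order.size() && "index out of range");
    return Order[I];
  }

private:
  std::vector<T> Order;
  std::unordered_set<T> Seen;
};

// Hypothetical stand-in for an invoke: only the id of its landing pad.
struct Invoke {
  int LandingPadId;
};

// Record each landing pad once, no matter how many invokes share it.
OrderedSet<int> collectLandingPads(const std::vector<Invoke> &Invokes) {
  OrderedSet<int> Pads;
  for (const Invoke &I : Invokes)
    Pads.insert(I.LandingPadId);
  return Pads;
}
```

With five invokes that target only two distinct pads, the result holds two entries, so any per-pad work (such as the volatile loads described above) would run once per pad rather than once per invoke.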

Update the SP in the SjLj jmpbuf whenever it changes. <rdar://problem/10444602> · 643e63c4

Bob Wilson authored Nov 16, 2011

This same basic code was in the older version of the SjLj exception handling,
but it was removed in the recent revisions to that code.  It needs to be there.

llvm-svn: 144782

643e63c4

Revert r144568 now that r144730 has fixed the fast-isel kill marker bug. · 4ac36c8e
Evan Cheng authored Nov 16, 2011
```
llvm-svn: 144776
```
4ac36c8e

If the 2addr instruction has other kills, don't move it below any other uses... · b8c55a53

Evan Cheng authored Nov 16, 2011

If the 2addr instruction has other kills, don't move it below any other uses since we don't want to extend other live ranges.

llvm-svn: 144772

b8c55a53

RescheduleKillAboveMI() must backtrack to before the rescheduled DBG_VALUE... · 59f8156e
Evan Cheng authored Nov 16, 2011
```
RescheduleKillAboveMI() must backtrack to before the rescheduled DBG_VALUE instructions. rdar://10451185

llvm-svn: 144771
```
59f8156e
Process all uses first before defs to accurately capture register liveness. rdar://10449480 · 9ddd69a8
Evan Cheng authored Nov 16, 2011
```
llvm-svn: 144770
```
9ddd69a8
CONCAT_VECTORS can have more than two operands. PR11389. · 87f92512
Eli Friedman authored Nov 16, 2011
```
llvm-svn: 144768
```
87f92512

Add a couple asserts so it will be easier to debug if we accidentally pass... · d257a464

Eli Friedman authored Nov 16, 2011

Add a couple asserts so it will be easier to debug if we accidentally pass indexed loads/stores to the legalizer.

llvm-svn: 144767

d257a464

Rename MVT::untyped to MVT::Untyped to match similar nomenclature. · ca2f78a9
Owen Anderson authored Nov 16, 2011
```
llvm-svn: 144747
```
ca2f78a9
Stabilize the output of the dwarf accelerator tables. Fixes a comparison · 0abbd0ef
Eric Christopher authored Nov 15, 2011
```
failure during bootstrap with it turned on.

llvm-svn: 144731
```
0abbd0ef

GEPs with all zero indices are trivially coalesced by fast-isel. For example, · 291ce47d

Chad Rosier authored Nov 15, 2011

%arrayidx135 = getelementptr inbounds [4 x [4 x [4 x [4 x i32]]]]* %M0, i32 0, i64 0
%arrayidx136 = getelementptr inbounds [4 x [4 x [4 x i32]]]* %arrayidx135, i32 0, i64 %idxprom134

Prior to this commit, the GEP instruction that defines %arrayidx136 thought that 
%arrayidx135 was a trivial kill.  The GEP that defines %arrayidx135 doesn't 
generate any code and thus %M0 gets folded into the second GEP.  Thus, we need
to look through GEPs with all zero indices.
rdar://10443319

llvm-svn: 144730

291ce47d
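
The "look through GEPs with all zero indices" step from the commit above (291ce47d) might look roughly like the following. This is a simplified model, assuming constant indices only; `Value`, `hasAllZeroIndices`, and `lookThroughZeroGEPs` are hypothetical names and do not correspond to the fast-isel API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified model of an address computation: a root pointer has no base,
// a GEP-like value has a base and a list of (constant) indices.
struct Value {
  const Value *Base;                 // nullptr for a root pointer
  std::vector<std::int64_t> Indices; // empty for a root pointer
};

// True if V is a GEP-like value whose indices are all zero, meaning it
// computes the same address as its base and generates no code.
bool hasAllZeroIndices(const Value &V) {
  if (!V.Base)
    return false;
  for (std::int64_t Idx : V.Indices)
    if (Idx != 0)
      return false;
  return true;
}

// Walk up through trivially coalesced GEPs to the value whose liveness
// actually matters when deciding whether an operand is a trivial kill.
const Value *lookThroughZeroGEPs(const Value *V) {
  assert(V && "expected a value");
  while (hasAllZeroIndices(*V))
    V = V->Base;
  return V;
}
```

In the model, a GEP with indices `{0, 0}` resolves to its base, while a GEP with any nonzero index resolves to itself.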

Nov 15, 2011

Added custom lowering for load->dec->store sequence in x86 when the EFLAGS register is used · 7c7ba1ba
Pete Cooper authored Nov 15, 2011
```
by later instructions.

Only done for DEC64m right now.

Fixes <rdar://problem/6172640>

llvm-svn: 144705
```
7c7ba1ba
Insert modified DBG_VALUE into LiveDbgValueMap. · 43bde96a
Devang Patel authored Nov 15, 2011
```
llvm-svn: 144696
```
43bde96a

We currently use a callback to handle an IL pass deleting a BB that still · f11e7f13

Rafael Espindola authored Nov 15, 2011

has a reference to it. Unfortunately, that doesn't work for codegen passes
since we don't get notified of MBB's being deleted (the original BB stays).

Use that fact to our advantage and after printing a function, check if
any of the IL BBs corresponds to a symbol that was not printed. This fixes
pr11202.

llvm-svn: 144674

f11e7f13

Remove all remaining uses of Value::getNameStr(). · 1f97a5a6
Benjamin Kramer authored Nov 15, 2011
```
llvm-svn: 144648
```
1f97a5a6
Twinify GraphWriter a little bit. · 4c93d15f
Benjamin Kramer authored Nov 15, 2011
```
llvm-svn: 144647
```
4c93d15f
Check all overlaps when looking for used registers. · e14ef7e6
Jakob Stoklund Olesen authored Nov 15, 2011
```
A function using any RC alias is enough to enable the ExeDepsFix pass.

llvm-svn: 144636
```
e14ef7e6
Make use of MachinePointerInfo::getFixedStack. · ab9ebd35
Jay Foad authored Nov 15, 2011
```
llvm-svn: 144635
```
ab9ebd35
Remove some unnecessary includes of PseudoSourceValue.h. · 70679df6
Jay Foad authored Nov 15, 2011
```
llvm-svn: 144634
```
70679df6
Set SeenStore to true to prevent loads from being moved; also eliminates a... · 7098c4e5
Evan Cheng authored Nov 15, 2011
```
Set SeenStore to true to prevent loads from being moved; also eliminates a non-deterministic behavior.

llvm-svn: 144628
```
7098c4e5

Rather than trying to use the loop block sequence *or* the function · 9b548a7f

Chandler Carruth authored Nov 15, 2011

block sequence when recovering from unanalyzable control flow
constructs, *always* use the function sequence. I'm not sure why I ever
went down the path of trying to use the loop sequence; it is
fundamentally not the correct sequence to use. We're trying to preserve
the incoming layout in the cases of unreasonable control flow, and that
is only encoded at the function level. We already have a filter to
select *exactly* the sub-set of blocks within the function that we're
trying to form into a chain.

The resulting code layout is also significantly better because of this.
In several places we were ending up with completely unreasonable control
flow constructs due to the ordering chosen by the loop structure for its
internal storage. This change removes a completely wasteful vector of
basic blocks, saving memory allocation in the common case even though it
costs us CPU in the fairly rare case of unnatural loops. Finally, it
fixes the latest crasher reduced out of GCC's single source. Thanks
again to Benjamin Kramer for the reduction, my bugpoint skills failed at
it.

llvm-svn: 144627

9b548a7f

Break false dependencies before partial register updates. · f8ad336b

Jakob Stoklund Olesen authored Nov 15, 2011

Two new TargetInstrInfo hooks let the target tell ExecutionDepsFix
about instructions with partial register updates causing false unwanted
dependencies.

The ExecutionDepsFix pass will break the false dependencies if the
updated register was written in the previous N instructions.

The small loop added to sse-domains.ll runs twice as fast with
dependency-breaking instructions inserted.

llvm-svn: 144602

f8ad336b
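
The "written in the previous N instructions" test from the commit above (f8ad336b) can be sketched as a distance check. This assumes instructions are numbered consecutively and that the target supplies N; `shouldBreakDependency` and its parameters are hypothetical, and the real pass may define the boundary differently.

```cpp
#include <cassert>

// Decide whether to insert a dependency-breaking instruction before the
// instruction numbered CurInstr, which only partially updates a register
// that was last written by the instruction numbered LastDef.
// Clearance is the target-provided N (an assumption of this sketch).
bool shouldBreakDependency(int CurInstr, int LastDef, int Clearance) {
  assert(LastDef < CurInstr && "the last def must precede the current instruction");
  int Distance = CurInstr - LastDef;
  // A recent write means the partial update would wait on it, so break the
  // false dependency; an old write has long since completed.
  return Distance <= Clearance;
}
```

For example, with N = 4 a register written 2 instructions earlier qualifies, while one written 8 instructions earlier does not.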

Track register ages more accurately. · 543bef6e

Jakob Stoklund Olesen authored Nov 15, 2011

Keep track of the last instruction to define each register individually
instead of per DomainValue.  This lets us track more accurately when a
register was last written.

Also track register ages across basic blocks.  When entering a new
basic block, use the least stale predecessor def as a worst case
estimate for register age.

The register age is used to arbitrate between conflicting domains. The
most recently defined register wins.

llvm-svn: 144601

543bef6e
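
The cross-block estimate described in the commit above (543bef6e) amounts to taking a minimum over predecessors. The sketch below assumes "age" means the number of instructions since the register was last written; `entryAge` and `UnknownAge` are hypothetical names, not part of the pass.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Given the age of a register at the exit of each predecessor block, return
// the age to assume on entry to the new block. The least stale (smallest)
// age is the worst case, because a recent write is the one most likely to
// still be in flight. UnknownAge is returned when there are no predecessors.
int entryAge(const std::vector<int> &PredExitAges, int UnknownAge) {
  assert(UnknownAge >= 0 && "ages are non-negative");
  if (PredExitAges.empty())
    return UnknownAge;
  return *std::min_element(PredExitAges.begin(), PredExitAges.end());
}
```

The same ordering serves the arbitration rule: when two registers prefer different domains, the one with the smaller age was defined most recently and wins.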

Nov 14, 2011

Avoid dereferencing off the beginning of lists. · f2fc508d
Evan Cheng authored Nov 14, 2011
```
llvm-svn: 144569
```
f2fc508d

At -O0, multiple uses of a virtual register in the same BB are being marked · 28ffb7e4

Evan Cheng authored Nov 14, 2011

"kill". This looks like a bug upstream. Since that's going to take some time
to understand, loosen the assertion and disable the optimization when
multiple kills are seen.

llvm-svn: 144568

28ffb7e4

Teach two-address pass to re-schedule two-address instructions (or the kill · 30f44ad7

Evan Cheng authored Nov 14, 2011

instructions of the two-address operands) in order to avoid inserting copies.
This fixes the few regressions introduced when the two-address hack was
disabled (without regressing the improvements).
rdar://10422688

llvm-svn: 144559

30f44ad7

Fix early-clobber handling in shrinkToUses. · 7e6004a3
Jakob Stoklund Olesen authored Nov 14, 2011
```
I broke this in r144515; it affected most ARM testers.

<rdar://problem/10441389>

llvm-svn: 144547
```
7e6004a3

It helps to deallocate memory as well as allocate it. =] This actually · fd9b4d98

Chandler Carruth authored Nov 14, 2011

cleans up all the chains allocated during the processing of each
function so that for very large inputs we don't just grow memory usage
without bound.

llvm-svn: 144533

fd9b4d98

Remove an over-eager assert that was firing on one of the ARM regression · 0a31d149

Chandler Carruth authored Nov 14, 2011

tests when I forcibly enabled block placement.

It is apparently possible for an unanalyzable block to fall through to
a non-loop block. I don't actually believe this is correct; I believe
that 'canFallThrough' is returning true needlessly for the code
construct, and I've left a bit of a FIXME on the verification code to
try to track down why this is coming up.

Anyway, removing the assert doesn't degrade the correctness of the algorithm.

llvm-svn: 144532

0a31d149

Begin chipping away at one of the biggest quadratic-ish behaviors in · 0af6a0bb

Chandler Carruth authored Nov 14, 2011

this pass. We're leaving already merged blocks on the worklist, and
scanning them again and again only to determine each time through that
indeed they aren't viable. We can instead remove them once we're going
to have to scan the worklist. This is the easy way to implement removing
them. If this remains on the profile (as I somewhat suspect it will), we
can get a lot more clever here, as the worklist's order is essentially
irrelevant. We can use swapping and fold the two loops to reduce
overhead even when there are many blocks on the worklist but only a few
of them are removed.

llvm-svn: 144531

0af6a0bb
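
Both worklist-pruning strategies mentioned in the commit above (0af6a0bb) can be sketched with standard containers. `Block`, `pruneMerged`, and `pruneMergedUnordered` are hypothetical names for illustration; the actual pass uses its own block and chain types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a basic block on the placement worklist.
struct Block {
  int Id;
  bool Merged; // true once the block has been merged into a chain
};

// The "easy way": drop merged blocks in one linear pass, preserving order.
void pruneMerged(std::vector<Block *> &Worklist) {
  Worklist.erase(std::remove_if(Worklist.begin(), Worklist.end(),
                                [](const Block *B) {
                                  assert(B && "null block on worklist");
                                  return B->Merged;
                                }),
                 Worklist.end());
}

// The "more clever" variant: since order is irrelevant, overwrite a merged
// entry with the last element and pop, so only removed entries cause writes.
void pruneMergedUnordered(std::vector<Block *> &Worklist) {
  for (std::size_t I = 0; I < Worklist.size();) {
    if (Worklist[I]->Merged) {
      Worklist[I] = Worklist.back();
      Worklist.pop_back();
    } else {
      ++I;
    }
  }
}
```

The second form does not advance the index after a swap, because the element moved into the slot still has to be checked.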

Under the hood, MBPI is doing a linear scan of every successor every · 84cd44c7

Chandler Carruth authored Nov 14, 2011

time it is queried to compute the probability of a single successor.
This makes computing the probability of every successor of a block in
sequence... really really slow. ;] This switches to a linear walk of the
successors rather than a quadratic one. One of several quadratic
behaviors slowing this pass down.

I'm not really thrilled with moving the sum code into the public
interface of MBPI, but I don't (at the moment) have ideas for a better
interface. The direction I'm considering is to
have MBPI actually retain much more state and make *all* of these
queries cheap. That's a lot of work, and would require invasive changes.
Until then, this seems like the least bad (i.e., least quadratic)
solution. Suggestions welcome.

llvm-svn: 144530

84cd44c7
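
The quadratic-to-linear change in the commit above (84cd44c7) comes down to summing the successor weights once instead of once per query. A minimal sketch, assuming weights are plain 32-bit integers whose sum fits in 32 bits; `Probability` and `allEdgeProbabilities` are hypothetical names, not the MBPI interface.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// A probability expressed as Numerator / Denominator.
struct Probability {
  std::uint32_t Numerator;
  std::uint32_t Denominator;
};

// Compute the probability of every successor edge with a single pass to
// build the sum, then a single pass to emit the results. Recomputing the
// sum inside the second loop would make this quadratic in the successor
// count.
std::vector<Probability>
allEdgeProbabilities(const std::vector<std::uint32_t> &Weights) {
  std::uint64_t Sum = 0;
  for (std::uint32_t W : Weights)
    Sum += W;
  assert(Sum > 0 && "expected at least one nonzero weight");
  assert(Sum <= std::numeric_limits<std::uint32_t>::max() &&
         "this sketch assumes the sum fits in 32 bits");

  const std::uint32_t Denominator = static_cast<std::uint32_t>(Sum);
  std::vector<Probability> Result;
  Result.reserve(Weights.size());
  for (std::uint32_t W : Weights) {
    Probability P = {W, Denominator};
    Result.push_back(P);
  }
  return Result;
}
```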

Reuse the logic in getEdgeProbability within getHotSucc in order to · a9e71faa

Chandler Carruth authored Nov 14, 2011

correctly handle blocks whose successor weights sum to more than
UINT32_MAX. This is slightly less efficient, but the entire thing is
already linear on the number of successors. Calling it within any hot
routine is a mistake, and indeed no one is calling it. It also
simplifies the code.

llvm-svn: 144527

a9e71faa

Fix an overflow bug in MachineBranchProbabilityInfo. This pass relied on · ed5aa547

Chandler Carruth authored Nov 14, 2011

the sum of the edge weights not overflowing uint32, and crashed when
it did. This is generally safe as BranchProbabilityInfo tries to
provide this guarantee. However, the CFG can get modified during codegen
in a way that grows the *sum* of the edge weights. This doesn't seem
unreasonable (imagine just adding more blocks all with the default
weight of 16), but it is hard to come up with a case that actually
triggers 32-bit overflow. Fortunately, the single-source GCC build is
good at this. The solution isn't very pretty, but it's no worse than the
previous code. We're already summing all of the edge weights on each
query, we can sum them, check for an overflow, compute a scale, and sum
them again.

I've included a *greatly* reduced test case out of the GCC source that
triggers it. It's a pretty lame test, as it clearly is just barely
triggering the overflow. I'd like to have something that is much more
definitive, but I don't understand the fundamental pattern that triggers
an explosion in the edge weight sums.

The buggy code is duplicated within this file. I'll collapse them into
a single implementation in a subsequent commit.

llvm-svn: 144526

ed5aa547
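
The "sum them, check for an overflow, compute a scale, and sum them again" sequence from the commit above (ed5aa547) can be sketched as follows. The scale formula here is one choice that provably keeps the rescaled sum within 32 bits; it is an assumption of this sketch, and `scaledWeightSum` is a hypothetical name rather than the function in MachineBranchProbabilityInfo.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Returns the sum of the edge weights after dividing each by Scale, where
// Scale is chosen so that the result fits in 32 bits. Scale is 1 when the
// plain sum already fits.
std::uint32_t scaledWeightSum(const std::vector<std::uint32_t> &Weights,
                              std::uint32_t &Scale) {
  const std::uint64_t Max = std::numeric_limits<std::uint32_t>::max();

  // First pass: sum in 64 bits so the overflow can be detected.
  std::uint64_t Sum = 0;
  for (std::uint32_t W : Weights)
    Sum += W;

  Scale = 1;
  if (Sum <= Max)
    return static_cast<std::uint32_t>(Sum);

  // Scale > Sum / Max, so Sum / Scale < Max, and the sum of the rounded-down
  // quotients can only be smaller still.
  Scale = static_cast<std::uint32_t>(Sum / Max + 1);

  // Second pass: sum the scaled weights.
  std::uint64_t Scaled = 0;
  for (std::uint32_t W : Weights)
    Scaled += W / Scale;
  assert(Scaled <= Max && "scaled sum must fit in 32 bits");
  return static_cast<std::uint32_t>(Scaled);
}
```

A caller would divide each individual weight by the same `Scale` before forming a probability, so numerators and the denominator stay consistent. Very small weights round down to zero under a large scale, which is a limitation of this simple scheme.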

Use getVNInfoBefore() when it makes sense. · d7bcf43d
Jakob Stoklund Olesen authored Nov 14, 2011
```
llvm-svn: 144517
```
d7bcf43d

Teach machine block placement to cope with unnatural loops. These don't · 1071cfa4

Chandler Carruth authored Nov 14, 2011

get loop info structures associated with them, and so we need some way
to make forward progress selecting and placing basic blocks. The
technique used here is pretty brutal -- it just scans the list of blocks
looking for the first unplaced candidate. It keeps placing blocks like
this until the CFG becomes tractable.

The cost is somewhat unfortunate: it requires allocating a vector of all
basic block pointers eagerly. I have some ideas about how to simplify
and optimize this, but I'm trying to get the logic correct first.

Thanks to Benjamin Kramer for the reduced test case out of GCC. Sadly
there are other bugs that GCC is tickling that I'm reducing and working
on now.

llvm-svn: 144516

1071cfa4

Use kill slots instead of the previous slot in shrinkToUses. · 69797902
Jakob Stoklund Olesen authored Nov 13, 2011
```
It's more natural to use the actual end points.

llvm-svn: 144515
```
69797902

Nov 13, 2011

Clean up some 80-column violations and poor formatting. These snuck by · c4a2cb34
Chandler Carruth authored Nov 13, 2011
```
when I was reading through the code for style.

llvm-svn: 144513
```
c4a2cb34

Terminate all dead defs at the dead slot instead of the 'next' slot. · d8f2405e

Jakob Stoklund Olesen authored Nov 13, 2011

This makes no difference for normal defs, but early clobber dead defs
now look like:

  [Slot_EarlyClobber; Slot_Dead)

instead of:

  [Slot_EarlyClobber; Slot_Register).

Live ranges for normal dead defs look like:

  [Slot_Register; Slot_Dead)

as before.

llvm-svn: 144512

d8f2405e

Simplify early clobber slots a bit. · ce7cc08f
Jakob Stoklund Olesen authored Nov 13, 2011
```
llvm-svn: 144507
```
ce7cc08f

Enhance the assertion mechanisms in place to make it easier to catch · 8e1d9067

Chandler Carruth authored Nov 13, 2011

when we fail to place all the blocks of a loop. Currently this is
happening for unnatural loops, and this logic helps more immediately
point to the problem.

llvm-svn: 144504

8e1d9067