Commits · 90c7a108ca6872cf025d4b6485c788d78c2a8504 · Roger Ferrer / llvm-epi-0.8

Nov 16, 2011

Stabilize the output of the dwarf accelerator tables. Fixes a comparison · 0abbd0ef
Eric Christopher authored Nov 15, 2011
```
failure during bootstrap with it turned on.

llvm-svn: 144731
```
0abbd0ef

GEPs with all zero indices are trivially coalesced by fast-isel. For example, · 291ce47d

Chad Rosier authored Nov 15, 2011

%arrayidx135 = getelementptr inbounds [4 x [4 x [4 x [4 x i32]]]]* %M0, i32 0, i64 0
%arrayidx136 = getelementptr inbounds [4 x [4 x [4 x i32]]]* %arrayidx135, i32 0, i64 %idxprom134

Prior to this commit, the GEP instruction that defines %arrayidx136 thought that 
%arrayidx135 was a trivial kill.  The GEP that defines %arrayidx135 doesn't 
generate any code and thus %M0 gets folded into the second GEP.  Thus, we need
to look through GEPs with all zero indices.
rdar://10443319

llvm-svn: 144730

291ce47d

Nov 15, 2011

Added custom lowering for load->dec->store sequence in x86 when the EFLAGS registers is used · 7c7ba1ba
Pete Cooper authored Nov 15, 2011
```
by later instructions.

Only done for DEC64m right now.

Fixes <rdar://problem/6172640>

llvm-svn: 144705
```
7c7ba1ba
Insert modified DBG_VALUE into LiveDbgValueMap. · 43bde96a
Devang Patel authored Nov 15, 2011
```
llvm-svn: 144696
```
43bde96a

We currently use a callback to handle an IL pass deleting a BB that still · f11e7f13

Rafael Espindola authored Nov 15, 2011

has a reference to it. Unfortunately, that doesn't work for codegen passes
since we don't get notified of MBB's being deleted (the original BB stays).

Use that fact to our advantage and after printing a function, check if
any of the IL BBs corresponds to a symbol that was not printed. This fixes
pr11202.

llvm-svn: 144674

f11e7f13

Remove all remaining uses of Value::getNameStr(). · 1f97a5a6
Benjamin Kramer authored Nov 15, 2011
```
llvm-svn: 144648
```
1f97a5a6
Twinify GraphWriter a little bit. · 4c93d15f
Benjamin Kramer authored Nov 15, 2011
```
llvm-svn: 144647
```
4c93d15f
Check all overlaps when looking for used registers. · e14ef7e6
Jakob Stoklund Olesen authored Nov 15, 2011
```
A function using any RC alias is enough to enable the ExeDepsFix pass.

llvm-svn: 144636
```
e14ef7e6
Make use of MachinePointerInfo::getFixedStack. · ab9ebd35
Jay Foad authored Nov 15, 2011
```
llvm-svn: 144635
```
ab9ebd35
Remove some unnecessary includes of PseudoSourceValue.h. · 70679df6
Jay Foad authored Nov 15, 2011
```
llvm-svn: 144634
```
70679df6
Set SeenStore to true to prevent loads from being moved; also eliminates a... · 7098c4e5
Evan Cheng authored Nov 15, 2011
```
Set SeenStore to true to prevent loads from being moved; also eliminates a non-deterministic behavior.

llvm-svn: 144628
```
7098c4e5

Rather than trying to use the loop block sequence *or* the function · 9b548a7f

Chandler Carruth authored Nov 15, 2011

block sequence when recovering from unanalyzable control flow
constructs, *always* use the function sequence. I'm not sure why I ever
went down the path of trying to use the loop sequence, it is
fundamentally not the correct sequence to use. We're trying to preserve
the incoming layout in the cases of unreasonable control flow, and that
is only encoded at the function level. We already have a filter to
select *exactly* the sub-set of blocks within the function that we're
trying to form into a chain.

The resulting code layout is also significantly better because of this.
In several places we were ending up with completely unreasonable control
flow constructs due to the ordering chosen by the loop structure for its
internal storage. This change removes a completely wasteful vector of
basic blocks, saving memory allocation in the common case even though it
costs us CPU in the fairly rare case of unnatural loops. Finally, it
fixes the latest crasher reduced out of GCC's single source. Thanks
again to Benjamin Kramer for the reduction, my bugpoint skills failed at
it.

llvm-svn: 144627

9b548a7f

Break false dependencies before partial register updates. · f8ad336b

Jakob Stoklund Olesen authored Nov 15, 2011

Two new TargetInstrInfo hooks lets the target tell ExecutionDepsFix
about instructions with partial register updates causing false unwanted
dependencies.

The ExecutionDepsFix pass will break the false dependencies if the
updated register was written in the previoius N instructions.

The small loop added to sse-domains.ll runs twice as fast with
dependency-breaking instructions inserted.

llvm-svn: 144602

f8ad336b

Track register ages more accurately. · 543bef6e

Jakob Stoklund Olesen authored Nov 15, 2011

Keep track of the last instruction to define each register individually
instead of per DomainValue.  This lets us track more accurately when a
register was last written.

Also track register ages across basic blocks.  When entering a new
basic block, use the least stale predecessor def as a worst case
estimate for register age.

The register age is used to arbitrate between conflicting domains. The
most recently defined register wins.

llvm-svn: 144601

543bef6e

Nov 14, 2011

Avoid dereferencing off the beginning of lists. · f2fc508d
Evan Cheng authored Nov 14, 2011
```
llvm-svn: 144569
```
f2fc508d

At -O0, multiple uses of a virtual registers in the same BB are being marked · 28ffb7e4

Evan Cheng authored Nov 14, 2011

"kill". This looks like a bug upstream. Since that's going to take some time
to understand, loosen the assertion and disable the optimization when
multiple kills are seen.

llvm-svn: 144568

28ffb7e4

Teach two-address pass to re-schedule two-address instructions (or the kill · 30f44ad7

Evan Cheng authored Nov 14, 2011

instructions of the two-address operands) in order to avoid inserting copies.
This fixes the few regressions introduced when the two-address hack was
disabled (without regressing the improvements).
rdar://10422688

llvm-svn: 144559

30f44ad7

Fix early-clobber handling in shrinkToUses. · 7e6004a3
Jakob Stoklund Olesen authored Nov 14, 2011
```
I broke this in r144515, it affected most ARM testers.

<rdar://problem/10441389>

llvm-svn: 144547
```
7e6004a3

It helps to deallocate memory as well as allocate it. =] This actually · fd9b4d98

Chandler Carruth authored Nov 14, 2011

cleans up all the chains allocated during the processing of each
function so that for very large inputs we don't just grow memory usage
without bound.

llvm-svn: 144533

fd9b4d98

Remove an over-eager assert that was firing on one of the ARM regression · 0a31d149

Chandler Carruth authored Nov 14, 2011

tests when I forcibly enabled block placement.

It is apparantly possible for an unanalyzable block to fallthrough to
a non-loop block. I don't actually beleive this is correct, I believe
that 'canFallThrough' is returning true needlessly for the code
construct, and I've left a bit of a FIXME on the verification code to
try to track down why this is coming up.

Anyways, removing the assert doesn't degrade the correctness of the algorithm.

llvm-svn: 144532

0a31d149

Begin chipping away at one of the biggest quadratic-ish behaviors in · 0af6a0bb

Chandler Carruth authored Nov 14, 2011

this pass. We're leaving already merged blocks on the worklist, and
scanning them again and again only to determine each time through that
indeed they aren't viable. We can instead remove them once we're going
to have to scan the worklist. This is the easy way to implement removing
them. If this remains on the profile (as I somewhat suspect it will), we
can get a lot more clever here, as the worklist's order is essentially
irrelevant. We can use swapping and fold the two loops to reduce
overhead even when there are many blocks on the worklist but only a few
of them are removed.

llvm-svn: 144531

0af6a0bb

Under the hood, MBPI is doing a linear scan of every successor every · 84cd44c7

Chandler Carruth authored Nov 14, 2011

time it is queried to compute the probability of a single successor.
This makes computing the probability of every successor of a block in
sequence... really really slow. ;] This switches to a linear walk of the
successors rather than a quadratic one. One of several quadratic
behaviors slowing this pass down.

I'm not really thrilled with moving the sum code into the public
interface of MBPI, but I don't (at the moment) have ideas for a better
interface. My direction I'm thinking in for a better interface is to
have MBPI actually retain much more state and make *all* of these
queries cheap. That's a lot of work, and would require invasive changes.
Until then, this seems like the least bad (ie, least quadratic)
solution. Suggestions welcome.

llvm-svn: 144530

84cd44c7

Reuse the logic in getEdgeProbability within getHotSucc in order to · a9e71faa

Chandler Carruth authored Nov 14, 2011

correctly handle blocks whose successor weights sum to more than
UINT32_MAX. This is slightly less efficient, but the entire thing is
already linear on the number of successors. Calling it within any hot
routine is a mistake, and indeed no one is calling it. It also
simplifies the code.

llvm-svn: 144527

a9e71faa

Fix an overflow bug in MachineBranchProbabilityInfo. This pass relied on · ed5aa547

Chandler Carruth authored Nov 14, 2011

the sum of the edge weights not overflowing uint32, and crashed when
they did. This is generally safe as BranchProbabilityInfo tries to
provide this guarantee. However, the CFG can get modified during codegen
in a way that grows the *sum* of the edge weights. This doesn't seem
unreasonable (imagine just adding more blocks all with the default
weight of 16), but it is hard to come up with a case that actually
triggers 32-bit overflow. Fortuately, the single-source GCC build is
good at this. The solution isn't very pretty, but its no worse than the
previous code. We're already summing all of the edge weights on each
query, we can sum them, check for an overflow, compute a scale, and sum
them again.

I've included a *greatly* reduced test case out of the GCC source that
triggers it. It's a pretty lame test, as it clearly is just barely
triggering the overflow. I'd like to have something that is much more
definitive, but I don't understand the fundamental pattern that triggers
an explosion in the edge weight sums.

The buggy code is duplicated within this file. I'll colapse them into
a single implementation in a subsequent commit.

llvm-svn: 144526

ed5aa547

Use getVNInfoBefore() when it makes sense. · d7bcf43d
Jakob Stoklund Olesen authored Nov 14, 2011
```
llvm-svn: 144517
```
d7bcf43d

Teach machine block placement to cope with unnatural loops. These don't · 1071cfa4

Chandler Carruth authored Nov 14, 2011

get loop info structures associated with them, and so we need some way
to make forward progress selecting and placing basic blocks. The
technique used here is pretty brutal -- it just scans the list of blocks
looking for the first unplaced candidate. It keeps placing blocks like
this until the CFG becomes tractable.

The cost is somewhat unfortunate, it requires allocating a vector of all
basic block pointers eagerly. I have some ideas about how to simplify
and optimize this, but I'm trying to get the logic correct first.

Thanks to Benjamin Kramer for the reduced test case out of GCC. Sadly
there are other bugs that GCC is tickling that I'm reducing and working
on now.

llvm-svn: 144516

1071cfa4

Use kill slots instead of the previous slot in shrinkToUses. · 69797902
Jakob Stoklund Olesen authored Nov 13, 2011
```
It's more natural to use the actual end points.

llvm-svn: 144515
```
69797902

Nov 13, 2011

Cleanup some 80-columns violations and poor formatting. These snuck by · c4a2cb34
Chandler Carruth authored Nov 13, 2011
```
when I was reading through the code for style.

llvm-svn: 144513
```
c4a2cb34

Terminate all dead defs at the dead slot instead of the 'next' slot. · d8f2405e

Jakob Stoklund Olesen authored Nov 13, 2011

This makes no difference for normal defs, but early clobber dead defs
now look like:

  [Slot_EarlyClobber; Slot_Dead)

instead of:

  [Slot_EarlyClobber; Slot_Register).

Live ranges for normal dead defs look like:

  [Slot_Register; Slot_Dead)

as before.

llvm-svn: 144512

d8f2405e

Simplify early clobber slots a bit. · ce7cc08f
Jakob Stoklund Olesen authored Nov 13, 2011
```
llvm-svn: 144507
```
ce7cc08f

Enhance the assertion mechanisms in place to make it easier to catch · 8e1d9067

Chandler Carruth authored Nov 13, 2011

when we fail to place all the blocks of a loop. Currently this is
happening for unnatural loops, and this logic helps more immediately
point to the problem.

llvm-svn: 144504

8e1d9067

Rename SlotIndexes to match how they are used. · 90b5e565

Jakob Stoklund Olesen authored Nov 13, 2011

The old naming scheme (load/use/def/store) can be traced back to an old
linear scan article, but the names don't match how slots are actually
used.

The load and store slots are not needed after the deferred spill code
insertion framework was deleted.

The use and def slots don't make any sense because we are using
half-open intervals as is customary in C code, but the names suggest
closed intervals.  In reality, these slots were used to distinguish
early-clobber defs from normal defs.

The new naming scheme also has 4 slots, but the names match how the
slots are really used.  This is a purely mechanical renaming, but some
of the code makes a lot more sense now.

llvm-svn: 144503

90b5e565

Teach MBP to force-merge layout successors for blocks with unanalyzable · 0bb42c0f

Chandler Carruth authored Nov 13, 2011

branches that also may involve fallthrough. In the case of blocks with
no fallthrough, we can still re-order the blocks profitably. For example
instruction decoding will in some cases continue past an indirect jump,
making laying out its most likely successor there profitable.

Note, no test case. I don't know how to write a test case that exercises
this logic, but it matches the described desired semantics in
discussions with Jakob and others. If anyone has a nice example of IR
that will trigger this, that would be lovely.

Also note, there are still assertion failures in real world code with
this. I'm digging into those next, now that I know this isn't the cause.

llvm-svn: 144499

0bb42c0f

Hoist another gross nested loop into a helper method. · f9213fe7
Chandler Carruth authored Nov 13, 2011
```
llvm-svn: 144498
```
f9213fe7
Add a missing doxygen comment for a helper method. · eb4ec3ae
Chandler Carruth authored Nov 13, 2011
```
llvm-svn: 144497
```
eb4ec3ae
Hoist a nested loop into its own method. · b336172f
Chandler Carruth authored Nov 13, 2011
```
llvm-svn: 144496
```
b336172f

Rewrite #3 of machine block placement. This is based somewhat on the · 8d150789

Chandler Carruth authored Nov 13, 2011

second algorithm, but only loosely. It is more heavily based on the last
discussion I had with Andy. It continues to walk from the inner-most
loop outward, but there is a key difference. With this algorithm we
ensure that as we visit each loop, the entire loop is merged into
a single chain. At the end, the entire function is treated as a "loop",
and merged into a single chain. This chain forms the desired sequence of
blocks within the function. Switching to a single algorithm removes my
biggest problem with the previous approaches -- they had different
behavior depending on which system triggered the layout. Now there is
exactly one algorithm and one basis for the decision making.

The other key difference is how the chain is formed. This is based
heavily on the idea Andy mentioned of keeping a worklist of blocks that
are viable layout successors based on the CFG. Having this set allows us
to consistently select the best layout successor for each block. It is
expensive though.

The code here remains very rough. There is a lot that needs to be done
to clean up the code, and to make the runtime cost of this pass much
lower. Very much WIP, but this was a giant chunk of code and I'd rather
folks see it sooner than later. Everything remains behind a flag of
course.

I've added a couple of tests to exercise the issues that this iteration
was motivated by: loop structure preservation. I've also fixed one test
that was exhibiting the broken behavior of the previous version.

llvm-svn: 144495

8d150789

Prune more RALinScan. RALinScan was also here! · 4784df71
NAKAMURA Takumi authored Nov 13, 2011
```
llvm-svn: 144487
```
4784df71
More dead code elimination in VirtRegMap. · c601d8c7
Jakob Stoklund Olesen authored Nov 13, 2011
```
This thing is looking a lot like a virtual register map now.

llvm-svn: 144486
```
c601d8c7
Stop tracking spill slot uses in VirtRegMap. · 28df7ef8
Jakob Stoklund Olesen authored Nov 13, 2011
```
Nobody cared, StackSlotColoring scans the instructions to find used stack
slots.

llvm-svn: 144485
```
28df7ef8