Commits · 84cd44c750b0c4c0827206f97fe7591b00f0b1c3 · Roger Ferrer / llvm-epi-0.8

Nov 14, 2011

Under the hood, MBPI is doing a linear scan of every successor every · 84cd44c7

Chandler Carruth authored Nov 14, 2011

time it is queried to compute the probability of a single successor.
This makes computing the probability of every successor of a block in
sequence... really really slow. ;] This switches to a linear walk of the
successors rather than a quadratic one. One of several quadratic
behaviors slowing this pass down.

I'm not really thrilled with moving the sum code into the public
interface of MBPI, but I don't (at the moment) have ideas for a better
interface. The direction I'm considering for a better interface is to
have MBPI actually retain much more state and make *all* of these
queries cheap. That's a lot of work, and would require invasive changes.
Until then, this seems like the least bad (i.e., least quadratic)
solution. Suggestions welcome.

llvm-svn: 144530

84cd44c7
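
The difference between the per-query sum and the single linear walk described above can be sketched as follows. This is a minimal illustration, not the MBPI code: `EdgeProbability`, `sumSuccessorWeights`, `probabilityPerQuery`, and `probabilitiesForAll` are hypothetical names, and successor weights are modeled as a plain vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a branch probability: numerator / denominator.
struct EdgeProbability {
  uint32_t numerator;
  uint64_t denominator;
};

// Sum every successor weight once. The accumulator is 64 bits wide so the
// sum itself cannot wrap in this sketch.
uint64_t sumSuccessorWeights(const std::vector<uint32_t> &weights) {
  uint64_t sum = 0;
  for (uint32_t w : weights)
    sum += w;
  return sum;
}

// Per-query form: re-sums all weights on every call, so asking about each
// of N successors in turn costs O(N^2) overall.
EdgeProbability probabilityPerQuery(const std::vector<uint32_t> &weights,
                                    std::size_t index) {
  assert(index < weights.size() && "successor index out of range");
  return {weights[index], sumSuccessorWeights(weights)};
}

// Batched form: the sum is computed once and reused, so the walk over all
// N successors is O(N).
std::vector<EdgeProbability>
probabilitiesForAll(const std::vector<uint32_t> &weights) {
  const uint64_t sum = sumSuccessorWeights(weights);
  std::vector<EdgeProbability> result;
  result.reserve(weights.size());
  for (uint32_t w : weights)
    result.push_back({w, sum});
  return result;
}
```

Both forms produce the same probabilities; the batched form only moves the summation out of the per-successor loop, which is why the sum has to be visible to the caller.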

Reuse the logic in getEdgeProbability within getHotSucc in order to · a9e71faa

Chandler Carruth authored Nov 14, 2011

correctly handle blocks whose successor weights sum to more than
UINT32_MAX. This is slightly less efficient, but the entire thing is
already linear in the number of successors. Calling it within any hot
routine is a mistake, and indeed no one is calling it. It also
simplifies the code.

llvm-svn: 144527

a9e71faa

Fix an overflow bug in MachineBranchProbabilityInfo. This pass relied on · ed5aa547

Chandler Carruth authored Nov 14, 2011

the sum of the edge weights not overflowing uint32, and crashed when
they did. This is generally safe as BranchProbabilityInfo tries to
provide this guarantee. However, the CFG can get modified during codegen
in a way that grows the *sum* of the edge weights. This doesn't seem
unreasonable (imagine just adding more blocks all with the default
weight of 16), but it is hard to come up with a case that actually
triggers 32-bit overflow. Fortunately, the single-source GCC build is
good at this. The solution isn't very pretty, but it's no worse than the
previous code. We're already summing all of the edge weights on each
query; we can sum them, check for an overflow, compute a scale, and sum
them again.

I've included a *greatly* reduced test case out of the GCC source that
triggers it. It's a pretty lame test, as it clearly is just barely
triggering the overflow. I'd like to have something that is much more
definitive, but I don't understand the fundamental pattern that triggers
an explosion in the edge weight sums.

The buggy code is duplicated within this file. I'll collapse the copies
into a single implementation in a subsequent commit.

llvm-svn: 144526

ed5aa547
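
The "sum, check for an overflow, compute a scale, and sum again" approach from the message above might look roughly like this. It is a sketch under assumptions: `ScaledSum` and `sumWithScale` are hypothetical names, and the scale formula (sum divided by `UINT32_MAX`, plus one) is one choice that guarantees the rescaled sum fits in 32 bits, not necessarily the exact formula in the commit.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical result type: the divisor applied to each weight, and the
// sum of the divided weights, which is guaranteed to fit in 32 bits.
struct ScaledSum {
  uint32_t scale;
  uint32_t sum;
};

ScaledSum sumWithScale(const std::vector<uint32_t> &weights) {
  // First pass: accumulate in 64 bits so the overflow can be detected
  // instead of silently wrapping.
  uint64_t wideSum = 0;
  for (uint32_t w : weights)
    wideSum += w;

  // Common case: the sum already fits, so no scaling is needed.
  if (wideSum <= UINT32_MAX)
    return {1, static_cast<uint32_t>(wideSum)};

  // Overflow case: pick a divisor strictly larger than wideSum / UINT32_MAX
  // so that the sum of the divided weights stays below UINT32_MAX.
  const uint32_t scale = static_cast<uint32_t>(wideSum / UINT32_MAX) + 1;

  // Second pass: sum the scaled weights.
  uint64_t scaledSum = 0;
  for (uint32_t w : weights)
    scaledSum += w / scale;
  assert(scaledSum <= UINT32_MAX && "scaled sum must fit in 32 bits");
  return {scale, static_cast<uint32_t>(scaledSum)};
}
```

A caller would then divide each individual weight by the returned scale before using it as a numerator over the returned sum, so that numerators and denominator stay consistent.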

Use getVNInfoBefore() when it makes sense. · d7bcf43d
Jakob Stoklund Olesen authored Nov 14, 2011
```
llvm-svn: 144517
```
d7bcf43d

Teach machine block placement to cope with unnatural loops. These don't · 1071cfa4

Chandler Carruth authored Nov 14, 2011

get loop info structures associated with them, and so we need some way
to make forward progress selecting and placing basic blocks. The
technique used here is pretty brutal -- it just scans the list of blocks
looking for the first unplaced candidate. It keeps placing blocks like
this until the CFG becomes tractable.

The cost is somewhat unfortunate: it requires allocating a vector of all
basic block pointers eagerly. I have some ideas about how to simplify
and optimize this, but I'm trying to get the logic correct first.

Thanks to Benjamin Kramer for the reduced test case out of GCC. Sadly
there are other bugs that GCC is tickling that I'm reducing and working
on now.

llvm-svn: 144516

1071cfa4
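
The fallback scan described above, looking through the block list for the first unplaced candidate, can be illustrated with a small sketch. Blocks are modeled here as indices into a `placed` bit vector; `firstUnplaced` is a hypothetical helper, not a function from the pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the index of the first block that has not been placed yet, or
// placed.size() if every block has already been placed.
std::size_t firstUnplaced(const std::vector<bool> &placed) {
  for (std::size_t i = 0; i < placed.size(); ++i)
    if (!placed[i])
      return i;
  return placed.size();
}
```

Each call is a linear scan from the start of the list, which matches the "pretty brutal" characterization in the message: it always makes forward progress, at the cost of rescanning blocks that are already placed.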

Use kill slots instead of the previous slot in shrinkToUses. · 69797902
Jakob Stoklund Olesen authored Nov 13, 2011
```
It's more natural to use the actual end points.

llvm-svn: 144515
```
69797902

Nov 13, 2011

Clean up some 80-column violations and poor formatting. These snuck by · c4a2cb34
Chandler Carruth authored Nov 13, 2011
```
when I was reading through the code for style.

llvm-svn: 144513
```
c4a2cb34

Terminate all dead defs at the dead slot instead of the 'next' slot. · d8f2405e

Jakob Stoklund Olesen authored Nov 13, 2011

This makes no difference for normal defs, but early clobber dead defs
now look like:

  [Slot_EarlyClobber; Slot_Dead)

instead of:

  [Slot_EarlyClobber; Slot_Register).

Live ranges for normal dead defs look like:

  [Slot_Register; Slot_Dead)

as before.

llvm-svn: 144512

d8f2405e
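
The dead-def ranges listed above can be written down as a small sketch. The slot names `Slot_EarlyClobber`, `Slot_Register`, and `Slot_Dead` come from the message; `Slot_Block`, the enum ordering, and the `SlotRange` and `deadDefRange` names are assumptions made for illustration.

```cpp
#include <cassert>

// Slots within one instruction, in increasing order. Slot_Block and the
// exact ordering are assumptions for this sketch.
enum Slot { Slot_Block, Slot_EarlyClobber, Slot_Register, Slot_Dead };

// Half-open interval [start, end) within a single instruction.
struct SlotRange {
  Slot start;
  Slot end;
};

// Live range of a dead def: it begins at the slot where the def happens
// and, for both kinds of def, now terminates at the dead slot.
SlotRange deadDefRange(bool isEarlyClobber) {
  const Slot start = isEarlyClobber ? Slot_EarlyClobber : Slot_Register;
  SlotRange range = {start, Slot_Dead};
  assert(range.start < range.end && "dead def range must be non-empty");
  return range;
}
```

With this rule the end point no longer depends on the kind of def; only the start slot distinguishes an early clobber from a normal def.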

Simplify early clobber slots a bit. · ce7cc08f
Jakob Stoklund Olesen authored Nov 13, 2011
```
llvm-svn: 144507
```
ce7cc08f

Enhance the assertion mechanisms in place to make it easier to catch · 8e1d9067

Chandler Carruth authored Nov 13, 2011

when we fail to place all the blocks of a loop. Currently this is
happening for unnatural loops, and this logic helps more immediately
point to the problem.

llvm-svn: 144504

8e1d9067

Rename SlotIndexes to match how they are used. · 90b5e565

Jakob Stoklund Olesen authored Nov 13, 2011

The old naming scheme (load/use/def/store) can be traced back to an old
linear scan article, but the names don't match how slots are actually
used.

The load and store slots are not needed after the deferred spill code
insertion framework was deleted.

The use and def slots don't make any sense because we are using
half-open intervals as is customary in C code, but the names suggest
closed intervals.  In reality, these slots were used to distinguish
early-clobber defs from normal defs.

The new naming scheme also has 4 slots, but the names match how the
slots are really used.  This is a purely mechanical renaming, but some
of the code makes a lot more sense now.

llvm-svn: 144503

90b5e565

Teach MBP to force-merge layout successors for blocks with unanalyzable · 0bb42c0f

Chandler Carruth authored Nov 13, 2011

branches that also may involve fallthrough. In the case of blocks with
no fallthrough, we can still re-order the blocks profitably. For example
instruction decoding will in some cases continue past an indirect jump,
making laying out its most likely successor there profitable.

Note, no test case. I don't know how to write a test case that exercises
this logic, but it matches the described desired semantics in
discussions with Jakob and others. If anyone has a nice example of IR
that will trigger this, that would be lovely.

Also note, there are still assertion failures in real world code with
this. I'm digging into those next, now that I know this isn't the cause.

llvm-svn: 144499

0bb42c0f

Hoist another gross nested loop into a helper method. · f9213fe7
Chandler Carruth authored Nov 13, 2011
```
llvm-svn: 144498
```
f9213fe7
Add a missing doxygen comment for a helper method. · eb4ec3ae
Chandler Carruth authored Nov 13, 2011
```
llvm-svn: 144497
```
eb4ec3ae
Hoist a nested loop into its own method. · b336172f
Chandler Carruth authored Nov 13, 2011
```
llvm-svn: 144496
```
b336172f

Rewrite #3 of machine block placement. This is based somewhat on the · 8d150789

Chandler Carruth authored Nov 13, 2011

second algorithm, but only loosely. It is more heavily based on the last
discussion I had with Andy. It continues to walk from the inner-most
loop outward, but there is a key difference. With this algorithm we
ensure that as we visit each loop, the entire loop is merged into
a single chain. At the end, the entire function is treated as a "loop",
and merged into a single chain. This chain forms the desired sequence of
blocks within the function. Switching to a single algorithm removes my
biggest problem with the previous approaches -- they had different
behavior depending on which system triggered the layout. Now there is
exactly one algorithm and one basis for the decision making.

The other key difference is how the chain is formed. This is based
heavily on the idea Andy mentioned of keeping a worklist of blocks that
are viable layout successors based on the CFG. Having this set allows us
to consistently select the best layout successor for each block. It is
expensive though.

The code here remains very rough. There is a lot that needs to be done
to clean up the code, and to make the runtime cost of this pass much
lower. Very much WIP, but this was a giant chunk of code and I'd rather
folks see it sooner than later. Everything remains behind a flag of
course.

I've added a couple of tests to exercise the issues that this iteration
was motivated by: loop structure preservation. I've also fixed one test
that was exhibiting the broken behavior of the previous version.

llvm-svn: 144495

8d150789

Prune more RALinScan. RALinScan was also here! · 4784df71
NAKAMURA Takumi authored Nov 13, 2011
```
llvm-svn: 144487
```
4784df71
More dead code elimination in VirtRegMap. · c601d8c7
Jakob Stoklund Olesen authored Nov 13, 2011
```
This thing is looking a lot like a virtual register map now.

llvm-svn: 144486
```
c601d8c7
Stop tracking spill slot uses in VirtRegMap. · 28df7ef8
Jakob Stoklund Olesen authored Nov 13, 2011
```
Nobody cared, StackSlotColoring scans the instructions to find used stack
slots.

llvm-svn: 144485
```
28df7ef8

Remove dead code and data from VirtRegMap. · 92255f27

Jakob Stoklund Olesen authored Nov 13, 2011

Most of this stuff was supporting the old deferred spill code insertion
mechanism.  Modern spillers just edit machine code in place.

llvm-svn: 144484

92255f27

Stop tracking unused registers in VirtRegMap. · 38b3f312
Jakob Stoklund Olesen authored Nov 13, 2011
```
The information was only used by the register allocator in
StackSlotColoring.

llvm-svn: 144482
```
38b3f312

Remove the -color-ss-with-regs option. · 6ddb767f

Jakob Stoklund Olesen authored Nov 13, 2011

It was off by default.

The new register allocators don't have the problems that made it
necessary to reallocate registers during stack slot coloring.

llvm-svn: 144481

6ddb767f

Delete VirtRegRewriter. · 5343da64
Jakob Stoklund Olesen authored Nov 13, 2011
```
And there was much rejoicing.

llvm-svn: 144480
```
5343da64
Switch PBQP to VRM's trivial rewriter. · 03f73ab7
Jakob Stoklund Olesen authored Nov 13, 2011
```
The very complicated VirtRegRewriter is going away.

llvm-svn: 144479
```
03f73ab7
Delete the old spilling framework from LiveIntervalAnalysis. · f61a6fe2
Jakob Stoklund Olesen authored Nov 12, 2011
```
This is dead code, all register allocators use InlineSpiller.

llvm-svn: 144478
```
f61a6fe2
Delete the 'standard' spiller which used the old spilling framework. · 7ef502f6
Jakob Stoklund Olesen authored Nov 12, 2011
```
The current register allocators all use the inline spiller.

llvm-svn: 144477
```
7ef502f6

Switch PBQP to the modern InlineSpiller framework. · 11bb63a7

Jakob Stoklund Olesen authored Nov 12, 2011

It is worth noting that the old spiller would split live ranges around
basic blocks. The new spiller doesn't do that.

PBQP should do its own live range splitting with
SplitEditor::splitSingleBlock() if desired.  See
RAGreedy::tryBlockSplit().

llvm-svn: 144476

11bb63a7

Nov 12, 2011

Delete the linear scan register allocator. · e7e50e6f

Jakob Stoklund Olesen authored Nov 12, 2011

RegAllocGreedy has been the default for six months now.

Deleting RegAllocLinearScan makes it possible to also delete
VirtRegRewriter and clean up the spiller code.

llvm-svn: 144475

e7e50e6f

The dwarf standard says that the only differences between an out-of-line · e7cc8bff

Rafael Espindola authored Nov 12, 2011

instance and a concrete inlined instance are the use of DW_TAG_subprogram
instead of DW_TAG_inlined_subroutine and who owns the tree.

We were also omitting DW_AT_inline from the abstract roots. To fix this,
make sure we mark abstract instance roots with DW_AT_inline even when
we have only out-of-line instances referring to them with DW_AT_abstract_origin.

FileCheck is not a very good tool for tests like this; maybe we should add
a -verify mode to llvm-dwarfdump.

llvm-svn: 144441

e7cc8bff

Don't try to form pre/post-indexed loads/stores until after LegalizeDAG runs. Fixes PR11029. · 9d448e4a
Eli Friedman authored Nov 12, 2011
```
llvm-svn: 144438
```
9d448e4a

Some cleanup and bulletproofing for node replacement in LegalizeDAG. To... · 13477156

Eli Friedman authored Nov 11, 2011

Some cleanup and bulletproofing for node replacement in LegalizeDAG. To maintain LegalizeDAG invariants, whenever a node is replaced, we must attempt to delete it, and if it still
has uses after it is replaced (which can happen in rare cases due to CSE), we must revisit it.

llvm-svn: 144432

13477156

Nov 11, 2011

Add a custom safepoint method, in order for language implementers to decide... · 26c328d7
Nicolas Geoffray authored Nov 11, 2011
```
Add a custom safepoint method, in order for language implementers to decide which machine instruction gets to be a safepoint.

llvm-svn: 144399
```
26c328d7
Initialize variable. · 0a917b7a
Eric Christopher authored Nov 11, 2011
```
llvm-svn: 144360
```
0a917b7a
If we have a DIE with an AT_specification use that instead of the normal · c12c211c
Eric Christopher authored Nov 11, 2011
```
addr DIE when adding to the dwarf accelerator tables.

llvm-svn: 144354
```
c12c211c

Nov 10, 2011

Check in getOrCreateSubprogramDIE if a declaration exists and if so output · 79278365
Rafael Espindola authored Nov 10, 2011
```
it first.

This is a more general fix to pr11300.

llvm-svn: 144324
```
79278365
Make types and namespaces take multiple DIEs for the accelerator tables · 66b37db6
Eric Christopher authored Nov 10, 2011
```
as well.

llvm-svn: 144319
```
66b37db6
Move type handling to make sure we get all created types that aren't · e288793e
Eric Christopher authored Nov 10, 2011
```
forward decls and have names into the dwarf accelerator types table.

llvm-svn: 144306
```
e288793e
Rework adding function names to the dwarf accelerator tables, allow · d9843b34
Eric Christopher authored Nov 10, 2011
```
multiple dies per function and support C++ basenames.

llvm-svn: 144304
```
d9843b34

Use a bigger hammer to fix PR11314 by disabling the "forcing two-address · d33b2d6b

Evan Cheng authored Nov 10, 2011

instruction lower optimization" in the pre-RA scheduler.

The optimization, or rather the hack, was done before the MI use-list was
available. Now we should be able to implement it in a better way, perhaps in
the two-address pass until a MI scheduler is available.

Now that the scheduler has to backtrack to handle call sequences, adding
artificial scheduling constraints is just not safe. Furthermore, the hack
is not taking all the other scheduling decisions into consideration, so it's
just as likely to pessimize code. So I view disabling this optimization as
goodness regardless of PR11314.

llvm-svn: 144267

d33b2d6b

Strip old implicit operands after foldMemoryOperand. · eef48b69

Jakob Stoklund Olesen authored Nov 10, 2011

The TII.foldMemoryOperand hook preserves implicit operands from the
original instruction.  This is not what we want when those implicit
operands refer to the register being spilled.

Implicit operands referring to other registers are preserved.

This fixes PR11347.

llvm-svn: 144247

eef48b69