Commits · 4f147a54a1b9c809764bdcd3f3c907a9ca503b20 · Roger Ferrer / llvm-epi

May 20, 2016

[RegBankSelect] Get rid of a now dead method: setSafeInsertPoint. · 4f147a54
Quentin Colombet authored May 20, 2016
```
This is now encapsulated in the RepairingPlacement class.

llvm-svn: 270247
```
4f147a54

[RegBankSelect] Take advantage of a potential best cost information in · 6e80dbcd

Quentin Colombet authored May 20, 2016

computeMapping.

Computing the cost of a mapping takes some time.
Since in Fast mode, the cost is irrelevant, just spare some cycles by not
computing it.
In Greedy mode, we need to choose the best cost, that means that when
the local cost gets more expensive than the best cost, we can stop
computing the repairing and cost for the current mapping.
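The Greedy-mode cut-off described above can be sketched as follows. This is only an illustration under stated assumptions, not the RegBankSelect code: `Candidate` and `pickCheapest` are hypothetical names, a candidate mapping is modelled as a plain list of per-operand repair costs, and overflow of the running sum is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical model: one candidate mapping = the repair cost of each operand.
using Candidate = std::vector<uint64_t>;

// Returns the index of the cheapest candidate, or -1 if there is none.
// The cost of a candidate stops being accumulated as soon as the running
// total reaches the best total seen so far (it can no longer win).
int pickCheapest(const std::vector<Candidate> &Candidates,
                 unsigned *OperandsVisited = nullptr) {
  uint64_t Best = std::numeric_limits<uint64_t>::max();
  int BestIdx = -1;
  unsigned Visited = 0;
  for (std::size_t I = 0; I < Candidates.size(); ++I) {
    uint64_t Local = 0;
    bool Abandoned = false;
    for (uint64_t OperandCost : Candidates[I]) {
      ++Visited;
      Local += OperandCost;
      if (Local >= Best) { // Already no better than the best: stop early.
        Abandoned = true;
        break;
      }
    }
    if (!Abandoned && Local < Best) {
      Best = Local;
      BestIdx = static_cast<int>(I);
    }
  }
  if (OperandsVisited)
    *OperandsVisited = Visited;
  return BestIdx;
}
```

With the candidates `{5, 5}`, `{1, 2}` and `{9, 1, 1}`, the third one is abandoned after its first operand because 9 already exceeds the best total of 3.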

llvm-svn: 270245

6e80dbcd

[RegBankSelect] Use frequency and probability information to compute · 25fcef73

Quentin Colombet authored May 20, 2016

more precise cost in Greedy mode.

In Fast mode the cost is irrelevant so do not bother requiring that
those passes get scheduled.

llvm-svn: 270244

25fcef73

[RegBankSelect] Use the Fast mode for functions with the optnone attribute. · a5530128
Quentin Colombet authored May 20, 2016
```
llvm-svn: 270242
```
a5530128
[RegBankSelect] Specify different optimization mode for the pass. · 46df722e
Quentin Colombet authored May 20, 2016
```
The mode should be chosen by the target when instantiating the pass.

llvm-svn: 270235
```
46df722e

Fix error reporting in register scavenger (lack of emergency spill slot) · 64439ac7

Krzysztof Parzyszek authored May 20, 2016

- Do not store Twine objects.
- Remove report_fatal_error, since llvm_unreachable does terminate the
  program in release mode.

llvm-svn: 270233

64439ac7

[RegBankSelect] Add a method to avoid splitting while repairing. · f75c2bfc

Quentin Colombet authored May 20, 2016

The previous choice of the insertion points for repairing was
straightforward but may introduce some basic block or edge splitting. In
some situations this is something we can avoid.
For instance, when repairing a phi argument, instead of placing the
repairing on the related incoming edge, we may move it to the previous
block, before the terminators. This is only possible when the argument
is not defined by one of the terminators.

llvm-svn: 270232

f75c2bfc

Correction to r270219: fix detection of invalid frame index · ce6f3bde
Krzysztof Parzyszek authored May 20, 2016
```
llvm-svn: 270220
```
ce6f3bde
Skip entries with invalid indexes in the search loop in register scavenger · 70b1eee7
Krzysztof Parzyszek authored May 20, 2016
```
llvm-svn: 270219
```
70b1eee7
Fix some comment typos in SelectionDAGBuilder. NFC · 86f1f4ca
Diana Picus authored May 20, 2016
```
llvm-svn: 270190
```
86f1f4ca

[RegBankSelect] Refactor the code to split the repairing and mapping of · d84d00ba

Quentin Colombet authored May 20, 2016

an instruction.

Use the previously introduced RepairingPlacement class to split the code
computing the repairing placement from the code doing the actual
placement. That way, we will be able to consider different placements and
then apply only the best one.

llvm-svn: 270168

d84d00ba

[RegBankSelect] Add helper class for repairing code placement. · 55650754

Quentin Colombet authored May 20, 2016

When assigning the register banks we may have to insert repairing code
to move already assigned values across register banks.

Introduce a few helper classes to keep track of what is involved in the
repairing of an operand:
- InsertPoint and its derived classes record the positions, in the CFG,
  where repairing has to be inserted.
- RepairingPlacement holds all the insert points for the repairing of an
  operand plus the kind of action that is required to do the repairing.

This is going to be used to keep track of how the repairing should be
done, while comparing different solutions for an instruction. Indeed, we
will need the repairing placement to capture the cost of a solution and
we do not want to compute it a second time when we do the actual
repairing.

llvm-svn: 270167

55650754

[RegBankSelect] Refactor assignmentMatch to avoid testing the current · 0d77da4e

Quentin Colombet authored May 20, 2016

register bank twice.

Prior to this change, we were checking if the assignment for the current
machine operand was matching, then we would check if the mismatch
requires inserting repair code.
We actually already have this information from the first check, so just
pass it along.

NFCI.

llvm-svn: 270166

0d77da4e

Fix pr27728. · 78d947b4

Rafael Espindola authored May 20, 2016

Sorry for the lack of a testcase. There is one in the pr, but it depends on
std::sort and the .ll version is 110 lines, so I don't think it is
worth it.

The bug was that we were sorting after adding a terminator, and the
sorting algorithm could end up putting the terminator in the middle of
the List vector.

With that we would create a Spans map entry keyed on nullptr which would
then be added to CUs and fail in that sorting.

llvm-svn: 270165

78d947b4

[RegBankSelect] Introduce MappingCost helper class. · cfd97b93

Quentin Colombet authored May 20, 2016

This helper class will be used to represent the cost of mapping an
instruction to a specific register bank.
The particularity of these costs is that they are mostly local, thus the
frequency of the basic block is irrelevant. However, for a few
instructions (e.g., phis and terminators), the cost may be non-local and
then, we need to account for the frequency of the involved basic blocks.
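A minimal sketch of the local versus non-local split described above. It is not the actual MappingCost class: the name `CostSketch`, its members, and the way the total is formed (local cost scaled by the frequency of the instruction's own block, plus non-local costs scaled by the frequency of the block they land in) are all assumptions made for illustration, and overflow is ignored.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cost record for mapping one instruction.
class CostSketch {
  uint64_t LocalFreq;    // Frequency of the block holding the instruction.
  uint64_t Local = 0;    // Cost paid in that block.
  uint64_t NonLocal = 0; // Cost paid elsewhere, already frequency-weighted.

public:
  explicit CostSketch(uint64_t Freq) : LocalFreq(Freq) {}

  // Repairing code placed next to the instruction.
  void addLocal(uint64_t Cost) { Local += Cost; }

  // Repairing code placed in another block or on an edge (e.g. for a phi).
  void addNonLocal(uint64_t Cost, uint64_t Freq) { NonLocal += Cost * Freq; }

  // When two alternatives for the same instruction are purely local, the
  // common LocalFreq factor does not change which one is cheaper.
  uint64_t total() const { return Local * LocalFreq + NonLocal; }

  bool operator<(const CostSketch &Other) const {
    return total() < Other.total();
  }
};
```

For instance, with a block frequency of 10, a mapping with local cost 2 plus one non-local repair of cost 1 in a block of frequency 5 totals 25, which beats a purely local mapping of cost 3 (total 30).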

This will be used by the greedy mode I am working on.

llvm-svn: 270163

cfd97b93

clang-format. NFC. · 0a78f8c4
Rafael Espindola authored May 19, 2016
```
llvm-svn: 270156
```
0a78f8c4

Reapply r263460: [SpillPlacement] Fix a quadratic behavior in spill placement. · b926bdac

Quentin Colombet authored May 19, 2016

Using Chandler's words from r265331:
This commit was greatly exacerbating PR17409 and effectively regressed
build time for a lot of (very large) code when compiled with ASan or MSan.

PR17409 is fixed by r269249, so this is fine to reapply r263460.

Original commit message:
The bad behavior happens when we have a function with a long linear
chain of basic blocks, and have a live range spanning most of this
chain, but with very few uses.

Let's say we have only 2 uses.

The Hopfield network is only seeded with two active blocks where the
uses are, and each iteration of the outer loop in
`RAGreedy::growRegion()` only adds two new nodes to the network due to
the completely linear shape of the CFG.  Meanwhile,
`SpillPlacer->iterate()` visits the whole set of discovered nodes, which
adds up to a quadratic algorithm.

This is a historical accident from r129188.

When the Hopfield network is expanding, most of the action is happening
on the frontier where new nodes are being added. The internal nodes in
the network are not likely to be flip-flopping much, or they will at
least settle down very quickly. This means that while
`SpillPlacer->iterate()` is recomputing all the nodes in the network, it
is probably only the two frontier nodes that are changing their output.

Instead of recomputing the whole network on each iteration, we can
maintain a SparseSet of nodes that need to be updated:

- `SpillPlacement::activate()` adds the node to the todo list.
- When a node changes value (i.e., `update()` returns true), its
  neighbors are added to the todo list.
- `SpillPlacement::iterate()` only updates the nodes in the list.
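The three rules above amount to a worklist algorithm. The sketch below shows the idea on a toy network and makes several assumptions: `Node`, `Network` and the update rule are hypothetical simplifications, a `std::vector` plus a membership bitmap stands in for the SparseSet, and an explicit step limit is added so the loop always terminates.

```cpp
#include <cassert>
#include <vector>

// Toy node: its value is the sign of (bias + sum of neighbor values).
struct Node {
  int Bias = 0;
  int Value = 0;
  std::vector<unsigned> Links;
};

struct Network {
  std::vector<Node> Nodes;
  std::vector<unsigned> Todo; // Nodes that need to be recomputed.
  std::vector<bool> Queued;   // Avoids duplicate entries in Todo.

  explicit Network(unsigned N) : Nodes(N), Queued(N, false) {}

  void link(unsigned A, unsigned B) {
    Nodes[A].Links.push_back(B);
    Nodes[B].Links.push_back(A);
  }

  void enqueue(unsigned N) {
    if (!Queued[N]) {
      Queued[N] = true;
      Todo.push_back(N);
    }
  }

  // Activating a node puts it on the todo list.
  void activate(unsigned N, int Bias) {
    Nodes[N].Bias = Bias;
    enqueue(N);
  }

  // Recomputes one node; returns true if its value changed.
  bool update(unsigned N) {
    int Sum = Nodes[N].Bias;
    for (unsigned L : Nodes[N].Links)
      Sum += Nodes[L].Value;
    int New = Sum > 0 ? 1 : (Sum < 0 ? -1 : 0);
    bool Changed = New != Nodes[N].Value;
    Nodes[N].Value = New;
    return Changed;
  }

  // Only visits queued nodes; a change re-queues the neighbors.
  void iterate(unsigned Limit) {
    for (unsigned Step = 0; Step < Limit && !Todo.empty(); ++Step) {
      unsigned N = Todo.back();
      Todo.pop_back();
      Queued[N] = false;
      if (update(N))
        for (unsigned L : Nodes[N].Links)
          enqueue(L);
    }
  }
};
```

On a linear chain seeded at one end, each change only wakes up the neighbors of the node that changed, so the work stays near the frontier instead of rescanning every discovered node.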

The result of Hopfield iterations is not necessarily exact. It should
converge to a local minimum, but there is no guarantee that it will find
a global minimum. It is possible that updating nodes in a different
order will cause us to switch to a different local minimum. In other
words, this is not NFC, but although I saw a few runtime improvements
and regressions when I benchmarked this change, those were side effects
and actually the performance change is in the noise as expected.

Huge thanks to Jakob Stoklund Olesen <stoklund@2pi.dk> for his
feedback, guidance and time for the review.

llvm-svn: 270149

b926bdac

May 19, 2016

[ARM, AArch64] Match additional patterns to ldN instructions · 476c0afc

Matthew Simpson authored May 19, 2016

When matching an interleaved load to an ldN pattern, the interleaved access
pass checks that all users of the load are shuffles. If the load is used by an
instruction other than a shuffle, the pass gives up and an ldN is not
generated. This patch considers users of the load that are extractelement
instructions. It attempts to modify the extracts to use one of the available
shuffles rather than the load. After the transformation, the load is only used
by shuffles and will then be matched with an ldN pattern.

Differential Revision: http://reviews.llvm.org/D20250

llvm-svn: 270142

476c0afc

Modify emitTypeInformation to use MemoryTypeTableBuilder · a972d612

Adrian McCarthy authored May 19, 2016

A baby step toward translating DIType records to CodeView.

This does not (yet) combine the record length with the record data. I'm going back and forth trying to determine if that's a good idea.

llvm-svn: 270106

a972d612

[ARM, AArch64] Properly initialize InterleavedAccessPass · 330a1255

Matthew Simpson authored May 19, 2016

InterleavedAccessPass is an IR-level pass, so this change will enable testing
it with opt. This is part of D20250.

llvm-svn: 270101

330a1255

CodeGen: Move check of EnablePostRAScheduler to avoid disabling antidependency breaker · 64535014

Mitch Bodart authored May 19, 2016

Previously, specifying -post-RA-scheduler=true had the side effect of
disabling the antidependency breaker, yielding different behavior than
if the post-RA-scheduler was enabled via the scheduling model.

Differential Revision: http://reviews.llvm.org/D20186

llvm-svn: 270077

64535014

[SelectionDAG] rename/move isKnownToBeAPowerOfTwo() from TargetLowering (NFC) · f39f42d3

Sanjay Patel authored May 19, 2016

There are at least 2 places (DAGCombiner, X86ISelLowering) where this could be used instead
of ad-hoc and watered down code that is trying to match a power-of-2 pattern.

Differential Revision: http://reviews.llvm.org/D20439

llvm-svn: 270073

f39f42d3

CodeGen: Make the global-merge pass independently testable, and add a test. · fe12d0e3
Peter Collingbourne authored May 19, 2016
```
llvm-svn: 270023
```
fe12d0e3
reduce indentation; NFCI · b2bcd95a
Sanjay Patel authored May 19, 2016
```
llvm-svn: 270007
```
b2bcd95a

[MBP] Remove a redundant skipFunction(). NFC. · c01919e7

Haicheng Wu authored May 18, 2016

skipFunction() is called twice.

Differential Revision: http://reviews.llvm.org/D20377

llvm-svn: 269994

c01919e7

May 18, 2016

When looking for a spill slot in reg scavenger, find one that matches RC · 14a1c184

Krzysztof Parzyszek authored May 18, 2016

When looking for an available spill slot, the register scavenger would stop
after finding the first one with no register assigned to it. That slot may
have size and alignment that do not meet the requirements of the register
that is to be spilled. Instead, find an available slot that is the closest
in size and alignment to one that is needed to spill a register from RC.
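The selection described above can be sketched as a best-fit search. This is an illustration only: `Slot` and `findBestSlot` are hypothetical names rather than the scavenger's data structures, and measuring "closest" as the sum of the size and alignment excess is an assumption.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one emergency spill slot.
struct Slot {
  unsigned Size;
  unsigned Align;
  bool InUse; // True if a register is currently assigned to the slot.
};

// Returns the index of the free slot that satisfies the requested size and
// alignment with the least excess, or -1 if no free slot is large enough.
int findBestSlot(const std::vector<Slot> &Slots, unsigned NeedSize,
                 unsigned NeedAlign) {
  assert(NeedAlign != 0 && "alignment must be non-zero");
  int Best = -1;
  unsigned BestDiff = UINT_MAX;
  for (std::size_t I = 0; I < Slots.size(); ++I) {
    const Slot &S = Slots[I];
    if (S.InUse)
      continue; // Taking the first free slot blindly was the old behavior.
    if (S.Size < NeedSize || S.Align < NeedAlign)
      continue; // Too small or under-aligned for this register class.
    unsigned Diff = (S.Size - NeedSize) + (S.Align - NeedAlign);
    if (Diff < BestDiff) {
      Best = static_cast<int>(I);
      BestDiff = Diff;
    }
  }
  return Best;
}
```

Given free slots of 4, 16 and 8 bytes, a request for 8 bytes with 8-byte alignment selects the 8-byte slot rather than the first free one (too small) or the 16-byte one (more excess).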

Differential Revision: http://reviews.llvm.org/D20295

llvm-svn: 269969

14a1c184

Re-commit r269828 "X86: Avoid using _chkstk when lowering WIN_ALLOCA instructions" · 8eb336c1

Hans Wennborg authored May 18, 2016

with an additional fix to make RegAllocFast ignore undef physreg uses. It would
previously get confused about the "push %eax" instruction's use of eax. That
method for adjusting the stack pointer is used in X86FrameLowering::emitSPUpdate
as well, but since that runs after register-allocation, we didn't run into the
RegAllocFast issue before.

llvm-svn: 269949

8eb336c1

[codeview] Some cleanup of Symbol Records. · 63a2846e

Zachary Turner authored May 17, 2016

* Reworks the CVSymbolTypes.def to work similarly to TypeRecords.def.
* Moves some enums from SymbolRecords.h to CodeView.h to maintain
  consistency with how we do type records.
* Generalize a few simple things like the record prefix
* Define the leaf enum and the kind enum similar to how we do with type
  records.

Differential Revision: http://reviews.llvm.org/D20342
Reviewed By: amccarth, rnk

llvm-svn: 269867

63a2846e

[DwarfDebug] Make tuning predicates private, should be used only in ctor. · 10177212
Paul Robinson authored May 17, 2016
```
llvm-svn: 269859
```
10177212

May 17, 2016

Debug Info: Introduce a DwarfDebug::UseDWARF2Bitfields flag · 6323ddf9

Adrian Prantl authored May 17, 2016

instead of having DwarfUnit query the debugger tuning options.

Follow-up commit to r269827.
Thanks to Paul Robinson for pointing this out!

llvm-svn: 269840

6323ddf9

Debug Info: Don't emit bitfields in the DWARF4 format when tuning for GDB. · f0a41089

Adrian Prantl authored May 17, 2016

As discovered in PR27758, GDB does not fully support the DWARF 4 format.
This patch ensures we always emit bitfields in the DWARF 2 format when tuning for GDB.

llvm-svn: 269827

f0a41089

Fix an assert in SelectionDAGBuilder when processing inline asm · 38ed8021

Renato Golin authored May 17, 2016

When processing inline asm that contains errors, make sure we can recover
gracefully by creating an UNDEF SDValue for the inline asm statement before
returning from SelectionDAGBuilder::visitInlineAsm. This is necessary for
consumers that don't exit on the first error that is emitted (e.g. clang)
and that would assert later on.

Fixes PR24071.

Patch by Diana Picus.

llvm-svn: 269811

38ed8021

Simplify handling of hidden stub. · 712f957c

Rafael Espindola authored May 17, 2016

Since r207518 they are printed exactly like non-hidden stubs on x86 and
since r207517 on ARM.

This means we can use a single set for all stubs in those platforms.

llvm-svn: 269776

712f957c

Factor PrologEpilogInserter around spilling, frame finalization, and scavenging · 1aaf87e9

Derek Schuff authored May 17, 2016

PrologEpilogInserter has these 3 phases, which are related, but not
all of them are needed by all targets. This patch reorganizes PEI's
various functions around those phases for clearer separation. It also
introduces a new TargetMachine hook, usesPhysRegsForPEI, which is true
for non-virtual targets. When it is true, all the phases operate as
before, and PEI requires the AllVRegsAllocated property on
MachineFunctions. Otherwise, CSR spilling and scavenging are skipped and
only prolog/epilog insertion/frame finalization is done.

Differential Revision: http://reviews.llvm.org/D18366

llvm-svn: 269750

1aaf87e9

Debug Info: Don't emit a DW_AT_data_member_location for DWARF bitfields. · 7aa34c8c

Adrian Prantl authored May 17, 2016

The DWARF spec states that a member entry may have either a
DW_AT_data_member_location or a DW_AT_data_bit_offset, but not both.

This fixes a bug found in PR 27758.

llvm-svn: 269731

7aa34c8c

Remove .hot and .unlikely prefixes from function section names. · 01d98ba0

Easwaran Raman authored May 16, 2016

This code currently relies on static methods in ProfileSummary to determine whether a function is hot or unlikely. I am refactoring the ProfileSummary code and these methods will be removed. As discussed offline, the right way to re-introduce this is to add a pass to annotate functions with unlikely/hot hints and use the hints to determine the prefix here.

llvm-svn: 269726

01d98ba0

Debug info: Don't emit a DW_AT_byte_size when emitting a DWARF4 bit field. · e7d833de

Adrian Prantl authored May 16, 2016

The DWARF spec clearly states that a bit field member should have either a
DW_AT_byte_size or a DW_AT_bit_size, but not both.
Also the DW_AT_byte_size is redundant with the size of the type of the member.

This fixes a bug found in PR 27758.

llvm-svn: 269714

e7d833de

May 16, 2016

Fail early on unknown appending linkage variables. · e64619ce

Rafael Espindola authored May 16, 2016

In practice only a few well known appending linkage variables work.

Currently if codegen sees an unknown appending linkage variable it will
just print it as a regular global. That is wrong as the symbol in the
produced object file has different semantics from the one provided by the
appending linkage.

This just errors early instead of producing a broken .o.

llvm-svn: 269706

e64619ce

SelectionDAG: Select min/max when both are used · c31a9d06

Matt Arsenault authored May 16, 2016

Allow two users of the condition if the other user
is also a min/max select. i.e.

%c = icmp slt i32 %x, %y
%min = select i1 %c, i32 %x, i32 %y
%max = select i1 %c, i32 %y, i32 %x
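For orientation, the snippet below is a source-level shape that could plausibly lead to IR like the above, with one comparison feeding two selects. `minMax` is a hypothetical function, and whether a given front end and optimization level really produces that exact IR is an assumption.

```cpp
#include <cassert> // Only needed for checks that exercise the function.
#include <utility>

// One comparison, two users: a "min" select and a "max" select.
std::pair<int, int> minMax(int X, int Y) {
  bool C = X < Y;      // %c   = icmp slt i32 %x, %y
  int Min = C ? X : Y; // %min = select i1 %c, i32 %x, i32 %y
  int Max = C ? Y : X; // %max = select i1 %c, i32 %y, i32 %x
  return {Min, Max};
}
```

For example, `minMax(7, 3)` yields the pair `{3, 7}`.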

llvm-svn: 269699

c31a9d06

Remove extra whitespace. NFC. · 1cb56a18
Chad Rosier authored May 16, 2016
```
llvm-svn: 269685
```
1cb56a18