  1. Feb 03, 2017
    • [SelectionDAG] Fix for PR30775: Assertion `NodeToMatch->getOpcode() !=
      ISD::DELETED_NODE && "NodeToMatch was removed partway through
      selection"' failed. · a0d9f258
      Alexey Bataev authored
      
      NodeToMatch can be modified during matching, but the code does not
      handle this situation.
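
      The shape of the fix can be illustrated with a hedged sketch (this is
      not the actual SelectionDAGISel change): after matcher steps that may
      replace or CSE nodes, the matched node has to be re-checked before it
      is used again.

        // Illustrative sketch only; the real matcher loop is far larger.
        #include "llvm/CodeGen/SelectionDAGNodes.h"
        using namespace llvm;

        // If a replacement morphed or CSE'd NodeToMatch, bail out gracefully
        // instead of tripping the "removed partway through selection" assert.
        static bool nodeWasDeleted(const SDNode *NodeToMatch) {
          return NodeToMatch->getOpcode() == ISD::DELETED_NODE;
        }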
      
      Differential Revision: https://reviews.llvm.org/D29292
      
      llvm-svn: 294003
    • DebugInfo: ensure type and namespace names are included in pubnames/pubtypes... · a0e3c751
      David Blaikie authored
      DebugInfo: ensure type and namespace names are included in pubnames/pubtypes even when they are only present in type units
      
      While looking to add support for placing singular types (types that
      will only be emitted in one place, such as those attached to a strong
      vtable or an explicit template instantiation definition) outside type
      units (since type units have overhead), I stumbled across that change
      causing an increase in pubtypes.
      
      Turns out we were missing some types from type units if they were only
      referenced from other type units and not from the debug_info section.
      
      This fixes that, following GCC's line of describing the offset of such
      entities as the CU die (since there's no compile unit-relative offset
      that would describe such an entity - they aren't in the CU). Also like
      GCC, this change prefers to describe the type stub within the CU rather
      than the "just use the CU offset" fallback where possible. This may give
      the DWARF consumer some opportunity to find the extra info in the type
      stub - though I'm not sure GDB does anything with this currently.
      
      The size of the pubnames/pubtypes sections now match exactly with or
      without type units enabled.
      
      This nearly triples (+189%) the pubtypes section for a clang self-host
      and grows pubnames by 0.07% (without compression), for a total 8%
      increase in the debug info sections of the objects of a Split DWARF
      build when using type units.
      
      llvm-svn: 293971
    • [lto] add getLinkerOpts() · dd4ebc1d
      Bob Haarman authored
      Summary: Some compilers, including MSVC and Clang, allow linker
      options to be specified in source files. In the legacy LTO API, there
      is a getLinkerOpts() method that returns linker options for the
      bitcode module being processed. This change adds that method to the
      new API, so that the COFF linker can get the right linker options when
      using the new LTO API.
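
      As a hedged sketch of what "linker options for a bitcode module" means
      in practice, the snippet below reads the !llvm.linker.options named
      metadata; that storage location is an assumption for illustration, not
      necessarily what the legacy or new LTO API consulted at the time.

        // Sketch only: gather linker options embedded in a module; not the
        // actual lto:: implementation.
        #include "llvm/IR/Metadata.h"
        #include "llvm/IR/Module.h"
        #include <string>
        #include <vector>

        std::vector<std::string> collectLinkerOpts(const llvm::Module &M) {
          std::vector<std::string> Opts;
          const llvm::NamedMDNode *MD =
              M.getNamedMetadata("llvm.linker.options");
          if (!MD)
            return Opts;
          for (const llvm::MDNode *Entry : MD->operands())
            for (const llvm::MDOperand &Op : Entry->operands())
              if (const auto *Str = llvm::dyn_cast<llvm::MDString>(Op))
                Opts.push_back(Str->getString().str());
          return Opts;
        }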
      
      Reviewers: pcc, ruiu, mehdi_amini, tejohnson
      
      Reviewed By: pcc
      
      Differential Revision: https://reviews.llvm.org/D29207
      
      llvm-svn: 293950
  2. Feb 02, 2017
    • [CodeGen] Remove dead call-or-prologue enum from CCState · c35139ec
      Reid Kleckner authored
      This enum has been dead since Olivier Stannard re-implemented ARM byval
      handling in r202985 (2014).
      
      llvm-svn: 293943
    • [PGO] internal option cleanups · 58fcc9bd
      Xinliang David Li authored
      1. Added comments for options
      2. Added missing option cl::desc field
      3. Unified function filter option for graph viewing.
         Now PGO count/raw-counts share the same
         filter option: -view-bfi-func-name=.
      
      llvm-svn: 293938
    • [LiveRangeEdit] Don't mess up with LiveInterval when a new vreg is created. · 5725f56b
      Quentin Colombet authored
      In r283838, we added the capability of splitting unspillable
      registers. When doing so we had to make sure the split live-ranges
      were also unspillable, and we did that by marking the related
      live-ranges in the delegate method that is called when a new vreg is
      created.
      However, by accessing the live-ranges there, we also triggered their
      lazy computation (LiveIntervalAnalysis::getInterval), which is not
      what we want in general. Indeed, later code in LiveRangeEdit is going
      to build the live-ranges, and this lazy computation may mess up that
      computation, resulting in assertion failures. Namely, the
      createEmptyIntervalFrom method expects that the live-range is going to
      be empty, not computed.
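
      A minimal sketch of the pitfall, with hypothetical names (this is not
      LLVM's delegate interface or the actual fix): defer the marking rather
      than querying the lazily computed interval inside the callback.

        // Hypothetical illustration: record new vregs in the callback and
        // mark them unspillable later, instead of querying the live-range
        // there and forcing its lazy computation.
        #include <vector>

        class SplitDelegate {
          std::vector<unsigned> NewVRegs; // vregs created while splitting

        public:
          // Called when a new vreg is created during splitting.
          void onNewVReg(unsigned VReg) {
            NewVRegs.push_back(VReg); // no live-range query here
          }

          // Invoked after LiveRangeEdit has built the split live-ranges.
          template <typename MarkUnspillableFn>
          void flush(MarkUnspillableFn MarkUnspillable) {
            for (unsigned VReg : NewVRegs)
              MarkUnspillable(VReg);
            NewVRegs.clear();
          }
        };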
      
      Thanks to Mikael Holmén <mikael.holmen@ericsson.com> for noticing and
      reporting the problem.
      
      llvm-svn: 293934
    • [PGO] make graph view internal options available for all builds · 1eb4ec6a
      Xinliang David Li authored
      Differential Revision: https://reviews.llvm.org/D29259
      
      llvm-svn: 293921
    • Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." · 93f9d5ce
      Nirav Dave authored
      This reverts commit r293893 which is miscompiling lua on ARM and
      bootstrapping for x86-windows.
      
      llvm-svn: 293915
    • Use N0 instead of N->getOperand(0) in DagCombiner::visitAdd. NFC · f3e421d6
      Amaury Sechet authored
      llvm-svn: 293903
    • In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. · 4442667f
      Nirav Dave authored
          Recommiting after fixing X86 inc/dec chain bug.
      
          * Simplify Consecutive Merge Store Candidate Search
      
          Now that address aliasing is much less conservative, push through
          a simplified store-merging search and chain alias analysis which
          only checks for parallel stores through the chain subgraph. This
          is cleaner, as it separates the handling of non-interfering
          loads/stores from the store-merging logic.
      
          When merging stores, we search up the chain through a single load,
          and find all possible stores by looking down through a load and a
          TokenFactor to all stores visited.
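
          A simplified sketch of this kind of chain walk (illustrative only,
          not the actual DAGCombiner code; real code also tracks visited
          nodes and checks for interfering memory operations):

            #include "llvm/ADT/SmallVector.h"
            #include "llvm/CodeGen/SelectionDAG.h"
            #include "llvm/CodeGen/SelectionDAGNodes.h"
            using namespace llvm;

            // Collect stores that are parallel on the chain: look through
            // TokenFactors and single loads, recording every store reached.
            static void collectChainStores(SDValue Chain,
                                           SmallVectorImpl<StoreSDNode *> &Stores) {
              if (Chain.getOpcode() == ISD::TokenFactor) {
                // All operands of a TokenFactor are parallel chains.
                for (const SDValue &Op : Chain->op_values())
                  collectChainStores(Op, Stores);
              } else if (auto *St = dyn_cast<StoreSDNode>(Chain.getNode())) {
                Stores.push_back(St);
              } else if (auto *Ld = dyn_cast<LoadSDNode>(Chain.getNode())) {
                // Search up the chain through a single load, as described.
                collectChainStores(Ld->getChain(), Stores);
              }
            }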
      
          This improves the quality of the output SelectionDAG and the
          output Codegen (save perhaps for some ARM cases where we correctly
          construct wider loads, but then promote them to float operations
          which require more expensive constant generation).
      
          Some minor peephole optimizations deal with the improved SubDAG
          shapes (listed below).
      
          Additional Minor Changes:
      
            1. Finishes removing unused AliasLoad code
      
            2. Unifies the chain aggregation in the merged stores across code
               paths
      
            3. Re-add the Store node to the worklist after calling
               SimplifyDemandedBits.
      
            4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
               arbitrary, but seems sufficient to not cause regressions in
               tests.
      
            5. Remove Chain dependencies of Memory operations on CopyFromReg
               nodes, as these are captured by data dependence.

            6. Forward load-store values through TokenFactors containing
               {CopyToReg,CopyFromReg} Values.
      
            7. Peephole to convert buildvector of extract_vector_elt to
               extract_subvector if possible (see
               CodeGen/AArch64/store-merge.ll)
      
            8. Store merging for the ARM target is restricted to 32-bit, as
               in some contexts invalid 64-bit operations are being
               generated. This can be removed once appropriate checks are
               added.
      
          This finishes the change Matt Arsenault started in r246307 and
          jyknight's original patch.
      
          Many tests required some changes as memory operations are now
          reorderable, improving load-store forwarding. One test in
          particular is worth noting:
      
            CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
            forwarding converts a load-store pair into a parallel store and
            a memory-realized bitcast of the same value. However, because we
            lose the sharing of the explicit and implicit store values we
            must create another local store. A similar transformation
            happens before SelectionDAG as well.
      
          Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
      
      llvm-svn: 293893
    • RegisterCoalescer: Cleanup joinReservedPhysReg(); NFC · 9dc3b5ff
      Matthias Braun authored
      - Factor out a common subexpression
      - Add some helpful comments
      - Fix printing of a register in a debug message
      
      llvm-svn: 293856
    • Remove an assertion that doesn't hold when mixing -g and -gmlt through · 5362216c
      Paul Robinson authored
      LTO.  Replace it with a related assertion, ensuring that abstract
      variables appear only in abstract scopes.
      Part of PR31437.
      
      Differential Revision: http://reviews.llvm.org/D29430
      
      llvm-svn: 293841
  3. Feb 01, 2017
    • Change debug-info-for-profiling from a TargetOption to a function attribute. · 0944a8c2
      Dehao Chen authored
      Summary: LTO requires the debug-info-for-profiling to be a function attribute.
      
      Reviewers: echristo, mehdi_amini, dblaikie, probinson, aprantl
      
      Reviewed By: mehdi_amini, dblaikie, aprantl
      
      Subscribers: aprantl, probinson, ahatanak, llvm-commits, mehdi_amini
      
      Differential Revision: https://reviews.llvm.org/D29203
      
      llvm-svn: 293833
    • Remove an assertion that doesn't hold when mixing -g and -gmlt through · a380e613
      Paul Robinson authored
      LTO.  Part of PR31437.
      
      Differential Revision: http://reviews.llvm.org/D29310
      
      llvm-svn: 293818
    • [ImplicitNullCheck] Extend canReorder scope · 08da2e28
      Sanjoy Das authored
      Summary:
      This change allows re-ordering of two instructions whose uses
      overlap.
      
      Patch by Serguei Katkov!
      
      Reviewers: reames, sanjoy
      
      Reviewed By: sanjoy
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D29120
      
      llvm-svn: 293775
    • [legalizetypes] Push fp16 -> fp32 extension node to worklist. · 7a5ec55f
      Florian Hahn authored
      Summary:
      This way, the type legalization machinery will take care of registering
      the result of this node properly.
      
      This patch fixes all fp16 test cases that fail with expensive checks
      enabled (CodeGen/ARM/fp16-promote.ll, CodeGen/ARM/fp16.ll,
      CodeGen/X86/cvt16.ll, CodeGen/X86/soft-fp.ll).
      
      
      Reviewers: t.p.northover, baldrick, olista01, bogner, jmolloy, davidxl, ab, echristo, hfinkel
      
      Reviewed By: hfinkel
      
      Subscribers: mehdi_amini, hfinkel, davide, RKSimon, aemerson, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D28195
      
      llvm-svn: 293765
    • [CodeGen] Move MacroFusion to the target · 94edf029
      Evandro Menezes authored
      This patch moves the class for scheduling adjacent instructions,
      MacroFusion, to the target.
      
      In AArch64, it also expands the fusion to all instruction pairs in a
      scheduling block, beyond just among the predecessors of the branch at
      the end.
      
      Differential revision: https://reviews.llvm.org/D28489
      
      llvm-svn: 293737
    • [ImplicitNullCheck] NFC isSuitableMemoryOp cleanup · 15e50b51
      Sanjoy Das authored
      Summary:
      The isSuitableMemoryOp method is responsible for verifying that an
      instruction is a candidate for use in an implicit null check.
      Additionally, it checks that the base register has not been re-defined
      earlier. If the base has been re-defined, it just returns false and
      the lookup continues, even though no subsequent instruction can pass
      this check either. This results in redundant further operations.

      So when we find that the base register has been re-defined, we just
      stop.
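
      A schematic sketch of the control-flow change, with hypothetical names
      (the real isSuitableMemoryOp operates on MachineInstr and register
      operands): return a three-way result so the caller can tell "keep
      scanning" apart from "stop scanning entirely".

        // Hypothetical illustration: distinguish "this instruction is
        // unsuitable, keep looking" from "the base register was re-defined,
        // so no later instruction can be suitable either; stop the scan".
        enum class Suitability { Suitable, Unsuitable, Impossible };

        Suitability classify(bool IsCandidateMemOp, bool BaseRegRedefined) {
          if (BaseRegRedefined)
            return Suitability::Impossible;          // caller stops scanning
          return IsCandidateMemOp ? Suitability::Suitable
                                  : Suitability::Unsuitable; // keep scanning
        }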
      
      Patch by Serguei Katkov!
      
      Reviewers: reames, sanjoy
      
      Reviewed By: sanjoy
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D29119
      
      llvm-svn: 293736
    • Fix regalloc assignment of overlapping registers · 70c245e9
      Stanislav Mekhanoshin authored
      SplitEditor::defFromParent() can create a register copy.
      If the register is a tuple of other registers and not all lanes are
      used, the copy is nevertheless done on the full tuple. Later, the
      register unit for an unused lane will be considered free and another
      overlapping register tuple can be assigned to a different value, even
      though the first register is live at that point. That is because
      interference only looks at liveness info, while a full register copy
      clobbers all lanes, even unused ones.

      This patch fixes the copy to only cover the used lanes.
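
      A hedged sketch of the idea with illustrative names (the real change
      is in SplitEditor::defFromParent and LLVM represents lanes with
      LaneBitmask): restrict the copy to the lanes that actually carry live
      values instead of copying the full tuple.

        // Illustrative only: plan sub-register copies for the used lanes,
        // so register units of unused lanes are not clobbered and stay
        // genuinely free for other values.
        #include <cstdint>
        #include <vector>

        struct SubRegCopy { unsigned SubIdx; };

        std::vector<SubRegCopy>
        planCopies(uint64_t UsedLanes,
                   const std::vector<uint64_t> &LaneMaskOfSubIdx) {
          std::vector<SubRegCopy> Copies;
          for (unsigned SubIdx = 0; SubIdx != LaneMaskOfSubIdx.size(); ++SubIdx)
            if (LaneMaskOfSubIdx[SubIdx] & UsedLanes) // lane is live: copy it
              Copies.push_back({SubIdx});
          return Copies;
        }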
      
      Differential Revision: https://reviews.llvm.org/D29105
      
      llvm-svn: 293728
    • CodeGen: Allow small copyable blocks to "break" the CFG. · b15c0667
      Kyle Butt authored
      When choosing the best successor for a block, we would ordinarily
      prefer a block that preserves the CFG unless there is a strong
      probability in the other direction. For small blocks that can be
      duplicated we now skip that requirement as well, subject to some
      simple frequency calculations.
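
      A hedged sketch of the kind of frequency check involved (illustrative
      names and a deliberately simplified cost model, not the actual
      MachineBlockPlacement heuristic):

        #include <cstdint>

        // Allow a small, duplicable successor to "break" the CFG-preserving
        // choice only when the edge into it is hotter than the alternative.
        bool preferDuplicableSuccessor(uint64_t FreqEdgeToDupCandidate,
                                       uint64_t FreqEdgeToCFGPreserving,
                                       unsigned CandidateSizeInInstrs,
                                       unsigned TailDupSizeLimit) {
          if (CandidateSizeInInstrs > TailDupSizeLimit)
            return false;                        // too big to duplicate
          return FreqEdgeToDupCandidate > FreqEdgeToCFGPreserving;
        }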
      
      Differential Revision: https://reviews.llvm.org/D28583
      
      llvm-svn: 293716
  4. Jan 31, 2017
    • GlobalISel: the translation of an invoke must branch to the good block. · c6bfa481
      Tim Northover authored
      Otherwise bad things happen if the basic block order isn't trivial after an
      invoke.
      
      llvm-svn: 293679
    • InterleaveAccessPass: Avoid constructing invalid shuffle masks · 01fa9622
      Matthias Braun authored
      Fix a bug where we would construct shufflevector instructions addressing
      invalid elements.
      
      Differential Revision: https://reviews.llvm.org/D29313
      
      llvm-svn: 293673
    • GlobalISel: merge invoke and call translation paths. · 293f7435
      Tim Northover authored
      Well, sort of. But the lower-level code that invoke used to be using completely
      botched the handling of varargs functions, which hopefully won't be possible if
      they're using the same code.
      
      llvm-svn: 293670
    • [X86] Implement -mfentry · a7c041d1
      Nirav Dave authored
      Summary: Insert calls to __fentry__ at function entry.
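
      For illustration (schematic, not verified compiler output): unlike the
      older mcount-style instrumentation, the __fentry__ call is emitted
      before the prologue, and the symbol itself is expected to come from a
      profiling runtime, typically written in assembly so that it preserves
      all registers.

        // Schematic expectation for a function built with -mfentry:
        //
        //   foo:
        //     callq __fentry__     # inserted at function entry
        //     pushq %rbp           # normal prologue follows
        //     ...
        extern "C" void __fentry__(); // provided by the profiling runtime

        int foo(int x) { return x + 1; }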
      
      Reviewers: hfinkel, craig.topper
      
      Subscribers: mgorny, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D28000
      
      llvm-svn: 293648
    • [DAGCombine] require UnsafeFPMath for re-association of addition · 8813d5d2
      Nicolai Haehnle authored
      Summary:
      The affected transforms all implicitly use associativity of addition,
      for which we usually require unsafe math to be enabled.
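
      For context, a standard floating-point fact (not taken from the patch)
      shows why re-associating fadd is gated on unsafe math:

        #include <cstdio>

        int main() {
          float a = 1e20f, b = -1e20f, c = 1.0f;
          // (a + b) + c == 1, but a + (b + c) == 0: re-association changes
          // the result, so it is only legal under unsafe-math.
          std::printf("%g %g\n", (a + b) + c, a + (b + c));
        }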
      
      The "Aggressive" flag is only meant to convey information about the
      performance of the fused ops relative to a fmul+fadd sequence.
      
      Fixes Bug 31626.
      
      Reviewers: spatel, hfinkel, mehdi_amini, arsenm, tstellarAMD
      
      Subscribers: jholewinski, nemanjai, wdng, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D28675
      
      llvm-svn: 293635
    • [ExecutionDepsFix] Improve clearance calculation for loops · 578cf7aa
      Keno Fischer authored
      Summary:
      In revision rL278321, ExecutionDepsFix learned how to pick a better
      register for undef register reads, e.g. for instructions such as
      `vcvtsi2sdq`. While this revision improved performance on a good number
      of our benchmarks, it unfortunately also caused significant regressions
      (up to 3x) on others. This regression turned out to be caused by loops
      such as:
      
      PH -> A -> B (xmm<Undef> -> xmm<Def>) -> C -> D -> EXIT
            ^                                  |
            +----------------------------------+
      
      In the previous version of the clearance calculation, we would visit
      the blocks in order, remembering for each whether there were any
      incoming backedges from blocks that we hadn't processed yet and if
      so queuing up the block to be re-processed. However, for loop structures
      such as the above, this is clearly insufficient, since the block B
      does not have any unknown backedges, so we do not see the false
      dependency from the previous iteration's Def of xmm registers in B.
      
      To fix this, we need to consider all blocks that are part of the loop
      and reprocess them once the correct clearance values are known. As
      an optimization, we also want to avoid reprocessing any later blocks
      that are not part of the loop.
      
      In summary, the iteration order is as follows:
      Before: PH A B C D A'
      Corrected (Naive): PH A B C D A' B' C' D'
      Corrected (w/ optimization): PH A B C A' B' C' D
      
      To facilitate this optimization we introduce two new counters for each
      basic block. The first counts how many of its predecessors have
      completed primary processing. The second counts how many of its
      predecessors have completed all processing (we will call such a block
      *done*). Now, the criterion for reprocessing a block is as follows:
          - All predecessors have completed primary processing.
          - For x the number of predecessors that had completed primary
            processing *at the time of primary processing of this block*,
            the number of predecessors that are done has reached x.
      
      The intuition behind this criterion is as follows:
      We need to perform primary processing on all predecessors in order to
      find out any direct defs in those predecessors. When predecessors are
      done, we also know that we have information about indirect defs (e.g.
      defs in block B that were inherited through B->C->A->B). However,
      we can't wait for all predecessors to be done, since that would
      cause cyclic dependencies. It is guaranteed, though, that all those
      predecessors that are prior to us in reverse postorder will be done
      before us. Since we iterate over the basic blocks in reverse
      postorder, the number x above is precisely the count of predecessors
      prior to us in reverse postorder.
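
      A schematic sketch of this bookkeeping, with hypothetical names rather
      than the actual ExecutionDepsFix data structures:

        struct BlockInfo {
          unsigned NumPreds = 0;
          unsigned PredsPrimaryDone = 0;  // preds that finished primary pass
          unsigned PredsAllDone = 0;      // preds that are fully *done*
          unsigned PrimaryDoneSnapshot = 0; // PredsPrimaryDone, captured when
                                            // this block ran its primary pass
          bool PrimaryProcessed = false;
        };

        // Reprocess once every predecessor has had its primary pass and the
        // number of *done* predecessors has reached the snapshot value x.
        bool shouldReprocess(const BlockInfo &BB) {
          return BB.PrimaryProcessed &&
                 BB.PredsPrimaryDone == BB.NumPreds &&
                 BB.PredsAllDone >= BB.PrimaryDoneSnapshot;
        }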
      
      Reviewers: myatsina
      Differential Revision: https://reviews.llvm.org/D28759
      
      llvm-svn: 293571