Commits · 17fc326d9a504e9e13ce1c9931d24587795be639 · Roger Ferrer / llvm-epi

Dec 02, 2016

CODE_OWNERS: Take ownership of IR Linker as discussed on llvm-dev · 17fc326d
Teresa Johnson authored Dec 02, 2016
```
llvm-svn: 288500
```
17fc326d
[X86][SSE] Add support for extracting constant bit data from broadcasted constants · cbf5f970
Simon Pilgrim authored Dec 02, 2016
```
llvm-svn: 288499
```
cbf5f970

[SLP] Fix for PR6246: vectorization for scalar ops on vector elements. · e8e94a71

Alexey Bataev authored Dec 02, 2016

When trying to vectorize trees that start at insertelement instructions,
the function tryToVectorizeList() uses a vectorization factor calculated
as MinVecRegSize/ScalarTypeSize. But sometimes this does not work, because
the tree cost for this fixed vectorization factor is too high.
This patch tries to improve the situation. It tries different vectorization
factors, from max(PowerOf2Floor(NumberOfVectorizedValues),
MinVecRegSize/ScalarTypeSize) down to MinVecRegSize/ScalarTypeSize, and
tries to choose the best one.
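
A minimal sketch of the search described above, not the patch's actual code. The names `powerOf2Floor` and `pickVectorizationFactor`, the cost callback, and the choice to halve the factor at each step are all assumptions made for illustration; the commit message only gives the upper and lower bounds of the range.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <limits>

// Largest power of two <= N, or 0 when N == 0 (hypothetical helper).
unsigned powerOf2Floor(unsigned N) {
  if (N == 0)
    return 0;
  unsigned P = 1;
  while (P <= N / 2)
    P *= 2;
  return P;
}

// Try factors from max(powerOf2Floor(NumValues), MinVF) down to MinVF and
// return the one with the lowest cost. Halving per step is an assumption.
unsigned pickVectorizationFactor(unsigned NumValues, unsigned MinVecRegSize,
                                 unsigned ScalarTypeSize,
                                 const std::function<int(unsigned)> &TreeCost) {
  assert(ScalarTypeSize != 0 && MinVecRegSize >= ScalarTypeSize);
  unsigned MinVF = MinVecRegSize / ScalarTypeSize;
  unsigned MaxVF = std::max(powerOf2Floor(NumValues), MinVF);
  unsigned BestVF = MinVF;
  int BestCost = std::numeric_limits<int>::max();
  for (unsigned VF = MaxVF; VF >= MinVF; VF /= 2) {
    int Cost = TreeCost(VF);
    if (Cost < BestCost) {
      BestCost = Cost;
      BestVF = VF;
    }
  }
  return BestVF;
}
```

With 8 values, a 128-bit minimum register size and 32-bit scalars, the candidates are 8 and 4, and whichever has the lower cost from the callback is returned.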

Differential Revision: https://reviews.llvm.org/D27215

llvm-svn: 288497

e8e94a71

[X86] Refactored getTargetConstantBitsFromNode to allow for expansion. NFCI. · b3ae4168

Simon Pilgrim authored Dec 02, 2016

getTargetConstantBitsFromNode currently only extracts constant pool vector data, but it will need to be generalized to support broadcast and scalar constant pool data as well.

Converted Constant bit extraction and Bitset splitting to helper lambda functions.

llvm-svn: 288496

b3ae4168

[SLPVectorizer][X86] Add tests for vectorization of buildvector of scalar fp-ops (PR6246) · c70d3796
Simon Pilgrim authored Dec 02, 2016
```
llvm-svn: 288492
```
c70d3796
[AVX-512] Add EVEX vpshuflw/vpshufhw/vpshufd instructions to load folding tables. · 4961fa9b
Craig Topper authored Dec 02, 2016
```
llvm-svn: 288484
```
4961fa9b
[AVX-512] Add EVEX PSHUFB instructions to load folding tables. · 17ddb521
Craig Topper authored Dec 02, 2016
```
llvm-svn: 288482
```
17ddb521
[AVX-512] Add masked VINSERTF/VINSERTI instructions to load folding tables. · f7866fad
Craig Topper authored Dec 02, 2016
```
llvm-svn: 288481
```
f7866fad

IR: Move NumElements field from {Array,Vector}Type to SequentialType. · bc070524

Peter Collingbourne authored Dec 02, 2016

Now that PointerType is no longer a SequentialType, all SequentialTypes
have an associated number of elements, so we can move that information to
the base class, allowing for a number of simplifications.
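
As a rough illustration of the simplification, here is a toy hierarchy in which the element count lives in the base class. The `toy` namespace and every class in it are hypothetical stand-ins, not LLVM's real declarations.

```cpp
#include <cstdint>

namespace toy {

// The element count is stored once, in the common base class.
class SequentialType {
  uint64_t NumElements;

protected:
  explicit SequentialType(uint64_t N) : NumElements(N) {}

public:
  uint64_t getNumElements() const { return NumElements; }
};

class ArrayType : public SequentialType {
public:
  explicit ArrayType(uint64_t N) : SequentialType(N) {}
};

class VectorType : public SequentialType {
public:
  explicit VectorType(uint64_t N) : SequentialType(N) {}
};

// Callers no longer need to distinguish arrays from vectors to ask for
// the element count.
uint64_t flattenedElementCount(const SequentialType &Outer,
                               const SequentialType &Inner) {
  return Outer.getNumElements() * Inner.getNumElements();
}

} // namespace toy
```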

Differential Revision: https://reviews.llvm.org/D27122

llvm-svn: 288464

bc070524

Change LoopUnrollPass cost from int to unsigned to make it consistent. (NFC) · c3be2258
Dehao Chen authored Dec 02, 2016
```
llvm-svn: 288463
```
c3be2258

IR: Change PointerType to derive from Type rather than SequentialType. · 4568158c

Peter Collingbourne authored Dec 02, 2016

As proposed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-October/106640.html

This is for a couple of reasons:

- Values of type PointerType are unlike the other SequentialTypes (arrays
  and vectors) in that they do not hold values of the element type. By moving
  PointerType we can unify certain aspects of how the other SequentialTypes
  are handled.
- PointerType will have no place in the SequentialType hierarchy once
  pointee types are removed, so this is a necessary step towards removing
  pointee types.

Differential Revision: https://reviews.llvm.org/D26595

llvm-svn: 288462

4568158c

Fix GlobalISel build. · 25a40759
Peter Collingbourne authored Dec 02, 2016
```
llvm-svn: 288460
```
25a40759
ConstantFolding: Factor code into helper function · 47a4b396
Matt Arsenault authored Dec 02, 2016
```
llvm-svn: 288459
```
47a4b396

IR: Change the gep_type_iterator API to avoid always exposing the "current" type. · ab85225b

Peter Collingbourne authored Dec 02, 2016

Instead, expose whether the current type is an array or a struct, if an array
what the upper bound is, and if a struct the struct type itself. This is
in preparation for a later change which will make PointerType derive from
Type rather than SequentialType.

Differential Revision: https://reviews.llvm.org/D26594

llvm-svn: 288458

ab85225b

[DWARF] Put linkage-name on abstract origin even when there's a declaration. · dad4907b

Paul Robinson authored Dec 02, 2016

In r266692, we made it possible to emit linkage names for just inlined
functions, putting the attribute on the abstract origin. Make sure we
don't think the linkage-name was already emitted on a declaration.

Differential Revision: http://reviews.llvm.org/D27320

llvm-svn: 288450

dad4907b

[ThinLTO] Stop importing constant global vars as copies in the backend · 185b4ab6

Teresa Johnson authored Dec 02, 2016

Summary:
We were doing an optimization in the ThinLTO backends of importing
constant unnamed_addr globals unconditionally as a local copy (regardless
of whether the thin link decided to import them). This should be done in
the thin link instead, so that resulting exported references are marked
and promoted appropriately, but will need a summary enhancement to mark
these variables as constant unnamed_addr.

The function import logic during the thin link was trying to handle
this proactively, by conservatively marking all values referenced in
the initializer lists of exported global variables as also exported.
However, this only handled values referenced directly from the
initializer list of an exported global variable. If the value is itself
a constant unnamed_addr variable, we could end up exporting its
references as well. This caused multiple issues. The first is that the
transitively exported references weren't promoted. Secondly, some could
not be promoted/renamed (e.g. they had a section or other constraint).
Handling this correctly would require marking references as exported
recursively, instead of just adding the first level of initializer list
references to the ExportList directly.

Remove this optimization and the associated handling in the function
import backend. SPEC measurements indicate we weren't getting much
from it in any case.

Fixes PR31052.

Reviewers: mehdi_amini

Subscribers: krasin, llvm-commits

Differential Revision: https://reviews.llvm.org/D26880

llvm-svn: 288446

185b4ab6

AMDGPU: Use wider scalar spills for SGPR spilling · c47701c0

Matt Arsenault authored Dec 02, 2016

Since the spill is for the whole wave, these
don't have the swizzling problems that vector stores do
and a single 4-byte allocation is enough to spill a 64 element
register. This should reduce the number of spill instructions and
put all the spills for a register in the same cacheline.

This should save allocated private size, but for now it doesn't.
The extra slots are allocated for each component, but never used
because the frame layout is essentially finalized before frame
indices are replaced. For always using the scalar store path,
this should probably be moved into processFunctionBeforeFrameFinalized.

llvm-svn: 288445

c47701c0

When instructions are hoisted out of loops by MachineLICM, remove their debug loc. · 42f92a72

Wolfgang Pieb authored Dec 02, 2016

This prevents erratic stepping behavior as well as incorrect source attribution
for sample profiling.

Reviewers: dblakie

Subscribers: llvm-commit

Differential Revision: https://reviews.llvm.org/D27290

llvm-svn: 288442

42f92a72

SDAG: Avoid a large, usually empty SmallVector in a recursive function · 35c5e58f

Justin Bogner authored Dec 02, 2016

This SmallVector is using up 128 bytes on the stack every time despite
almost always being empty[1], and since this function can recurse quite
deeply that adds up to a lot of overhead. We've seen this run afoul of
ulimits in some cases with ASAN on.

Replacing the SmallVector with a std::vector trades an occasional heap
allocation for vastly less stack usage.

[1]: I gathered some stats on an internal test suite and the vector
was non-empty in only 45,000 of 10,000,000 calls to this function.
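
A back-of-the-envelope sketch of the trade-off, under stated assumptions: `InlineBuffer` is a hypothetical stand-in for a vector with 16 inline pointer-sized slots (128 bytes of storage on a 64-bit target), and `stackBytesForLocals` simply multiplies the size of one local by the recursion depth.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a vector with 16 inline pointer-sized slots.
struct InlineBuffer {
  void *Slots[16];
  std::size_t Size;
};

// Stack bytes consumed by one local of type T in each of Depth frames.
template <typename T>
constexpr std::size_t stackBytesForLocals(std::size_t Depth) {
  return Depth * sizeof(T);
}

// Fraction of calls in which the vector was non-empty, from footnote [1].
constexpr double NonEmptyFraction = 45000.0 / 10000000.0; // 0.0045
```

A `std::vector` keeps only a few pointers on the stack and allocates on the heap when it becomes non-empty, which by the footnote's numbers happens in under half a percent of calls.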

llvm-svn: 288441

35c5e58f

[AArch64] Fold more spilled/refilled COPYs. · 7ffce7be

Geoff Berry authored Dec 01, 2016

Summary:
Make AArch64InstrInfo::foldMemoryOperandImpl more general by folding all
full COPYs between register classes of the same size that are either
spilled or refilled.

Reviewers: MatzeB, qcolombet

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D27271

llvm-svn: 288439

7ffce7be

[MC] Refactor emitELFSize to make usage more consistent. NFC. · 734c59d5

Dan Gohman authored Dec 01, 2016

Move the cast<MCSymbolELF> inside emitELFSize, so that: 
 - it's done in one place instead of at each call
 - it's more consistent with similar functions like EmitCOFFSafeSEH
 - ambiguity between cast<> and dyn_cast<> is avoided (which also
   eliminates an unnecessary dyn_cast call)

This also makes it easier to experiment with using ".size" directives on
non-ELF targets.

llvm-svn: 288437

734c59d5

llvm-modextract: Call keep() on the output stream before exiting. · 85c2184a
Peter Collingbourne authored Dec 01, 2016
```
llvm-svn: 288435
```
85c2184a

Dec 01, 2016

[ARM] Fix for 64-bit CAS expansion on ARM32 with -O0 · e2ae4151

Oleg Ranevskyy authored Dec 01, 2016

Summary:
This patch fixes comparison of 64-bit atomic with its expected value in CMP_SWAP_64 expansion.

Currently, the low words are compared with CMP, while the high words are compared with SBC. SBC expects the carry flag to be set if CMP detects a difference. CMP might leave the carry unset for unequal arguments, though, if the first one is >= the second. This might cause the comparison logic to detect false equality.

Example of the broken C++ code:
```
#include <atomic>

int main() {
  std::atomic<long long> at(2);
  long long ll = 1;
  std::atomic_compare_exchange_strong(&at, &ll, 3LL); // must fail: at != ll
}
```
Even though the atomic `at` and the expected value `ll` are not equal and `atomic_compare_exchange_strong` returns `false`, `at` is changed to 3.

The patch replaces SBC with CMPEQ.
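
The flag behaviour can be modelled in a few lines of C++. This is a simplified sketch, not the backend code: the helper names are hypothetical, only the Z and C flags are modelled, and it assumes the usual ARM semantics where CMP sets C when no borrow occurs (first operand >= second, unsigned) and SBC computes `a - b - !C`.

```cpp
#include <cstdint>

struct Flags {
  bool Z; // result was zero
  bool C; // no borrow: first operand >= second (unsigned)
};

// Model of `CMP a, b`.
Flags cmp(uint32_t A, uint32_t B) { return {A == B, A >= B}; }

// Old sequence: CMP on the low words, then SBCS on the high words,
// with "equal" read from Z after the SBCS.
bool equalWithSbc(uint64_t A, uint64_t B) {
  Flags F = cmp(static_cast<uint32_t>(A), static_cast<uint32_t>(B));
  uint32_t HiA = static_cast<uint32_t>(A >> 32);
  uint32_t HiB = static_cast<uint32_t>(B >> 32);
  uint32_t Result = HiA - HiB - (F.C ? 0u : 1u);
  return Result == 0;
}

// New sequence: CMP on the low words, then CMPEQ on the high words.
// The second compare only executes if the first one set Z.
bool equalWithCmpEq(uint64_t A, uint64_t B) {
  Flags F = cmp(static_cast<uint32_t>(A), static_cast<uint32_t>(B));
  if (F.Z)
    F = cmp(static_cast<uint32_t>(A >> 32), static_cast<uint32_t>(B >> 32));
  return F.Z;
}
```

In this model, comparing 2 against 1 reports equality with the SBC sequence, because the low-word CMP leaves no borrow and the high words are equal, while the CMPEQ sequence correctly reports a mismatch.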

Reviewers: t.p.northover

Subscribers: aemerson, rengolin, llvm-commits, asl

Differential Revision: https://reviews.llvm.org/D27315

llvm-svn: 288433

e2ae4151

Revert "[SLP] Fix for PR6246: vectorization for scalar ops on vector elements." · 704395a2
Artem Belevich authored Dec 01, 2016
```
This reverts r288412 which causes a severe compile-time regression.

llvm-svn: 288431
```
704395a2

RegisterCoalescer: Only coalesce complete reserved registers. · 709a4cc2

Matthias Braun authored Dec 01, 2016

The coalescer eliminates copies from reserved registers of the form:
   %vregX = COPY %rY
in the case where %rY is a reserved register. However this turns out to
be invalid if only some of the subregisters are reserved (see also
https://reviews.llvm.org/D26648).

Differential Revision: https://reviews.llvm.org/D26687

llvm-svn: 288428

709a4cc2

Fix broken buildbots because of r288424 (NFC). · e7c0b2e0
Eugene Zelenko authored Dec 01, 2016
```
llvm-svn: 288426
```
e7c0b2e0

[ADT, Support, TableGen] Fix some Clang-tidy modernize-use-default and Include... · f65e4ce2

Eugene Zelenko authored Dec 01, 2016

[ADT, Support, TableGen] Fix some Clang-tidy modernize-use-default and Include What You Use warnings; other minor fixes (NFC).

llvm-svn: 288424

f65e4ce2

[dsymutil] Simplify a lazy-init condition/expression · 4aa8175a
David Blaikie authored Dec 01, 2016
```
llvm-svn: 288423
```
4aa8175a
[debug info] Minor cleanup from D27170/r288399 · e40caaee
David Blaikie authored Dec 01, 2016
```
llvm-svn: 288421
```
e40caaee

[SelectionDAG] getRawSubclassData should not return HasDebugValue. · 76b913c4

Chih-Hung Hsieh authored Dec 01, 2016

This change fixes a regression in r279537 and
makes getRawSubclassData behave like r279536.
Without this change, the fp128-g.ll test case will have an
infinite loop involving SoftenFloatRes_LOAD.

Differential Revision: http://reviews.llvm.org/D26942

llvm-svn: 288420

76b913c4

AArch64: fix 128-bit cmpxchg at -O0 (again, again). · 5bb87b67

Tim Northover authored Dec 01, 2016

This time the issue is fortunately just a simple mistake rather than a horrible
design spectre. I thought SUBS/SBCS provided sufficient NZCV flags for
comparing two 64-bit values, but they don't.

The fix is slightly clunkier in AArch64 because we can't use conditional
execution to emit a pair of CMPs. Traditionally an "icmp ne i128" would map to
an EOR/EOR/ORR/CBNZ, but that uses more registers so it's easier to go with a
CSET/CINC/CBNZ combination. Slightly less efficient, but this is -O0 anyway.
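
The two strategies can be compared with a small C++ model. This is an illustrative sketch only: `U128` and both functions are hypothetical names, and the mapping of each statement to an instruction in the comments is an assumption about how the sequences described above would look.

```cpp
#include <cstdint>

struct U128 {
  uint64_t Lo;
  uint64_t Hi;
};

// Traditional "icmp ne i128": EOR/EOR/ORR/CBNZ.
bool differsXor(U128 A, U128 B) {
  uint64_t X = A.Lo ^ B.Lo; // EOR
  uint64_t Y = A.Hi ^ B.Hi; // EOR
  return (X | Y) != 0;      // ORR, then CBNZ
}

// Alternative: count the mismatching halves, then branch if non-zero.
bool differsCount(U128 A, U128 B) {
  unsigned N = (A.Lo != B.Lo) ? 1u : 0u; // CMP + CSET ne
  if (A.Hi != B.Hi)                      // CMP + CINC ne
    N += 1;
  return N != 0;                         // CBNZ
}
```

Both functions give the same answer for every input; the second needs only one scratch value rather than two.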

Thanks to Anton Korobeynikov for pointing out the issue.

llvm-svn: 288418

5bb87b67

Improve documentation on MSVC workaround for AlignedCharArray (NFC) · 87394714

Mehdi Amini authored Dec 01, 2016

The comment only mentioned "old version of MSVC".

Differential Revision: https://reviews.llvm.org/D27312

llvm-svn: 288417

87394714

Fix unused variable warning in Release builds. NFC. · 215b22e6
Benjamin Kramer authored Dec 01, 2016
```
llvm-svn: 288416
```
215b22e6

[PR29121] Don't fold if it would produce atomic vector loads or stores · 89e92d21

Philip Reames authored Dec 01, 2016

The instcombine code which folds loads and stores into their use types can trip up if the use is a bitcast to a type which we can't directly load or store in the IR. In principle, such types shouldn't exist, but in practice they do today. This is a workaround to avoid a bug while we work towards the long term goal.

Differential Revision: https://reviews.llvm.org/D24365

llvm-svn: 288415

89e92d21

Factor out common parts of LVI and Float2Int into ConstantRange [NFCI] · 4d00af1b

Philip Reames authored Dec 01, 2016

This just extracts out the transfer rules for constant ranges into a single shared point. As it happens, neither bit of code actually overlaps in terms of the handled operators, but with this change that could easily be tweaked in the future.

I also want to have this separated out to make it easier to experiment with an eager value info implementation and possibly a ValueTracking-like fixed-depth recursion peephole version. There's no reason all four of these can't share a common implementation, which reduces the chances of bugs.

Differential Revision: https://reviews.llvm.org/D27294

llvm-svn: 288413

4d00af1b

[SLP] Fix for PR6246: vectorization for scalar ops on vector elements. · 2c01af59

Alexey Bataev authored Dec 01, 2016

When trying to vectorize trees that start at insertelement instructions,
the function tryToVectorizeList() uses a vectorization factor calculated
as MinVecRegSize/ScalarTypeSize. But sometimes this does not work, because
the tree cost for this fixed vectorization factor is too high.
This patch tries to improve the situation. It tries different vectorization
factors, from max(PowerOf2Floor(NumberOfVectorizedValues),
MinVecRegSize/ScalarTypeSize) down to MinVecRegSize/ScalarTypeSize, and
tries to choose the best one.

Differential Revision: https://reviews.llvm.org/D27215

llvm-svn: 288412

2c01af59

[WebAssembly] Define more wasm binary encoding constants. · 3ec875d2
Dan Gohman authored Dec 01, 2016
```
llvm-svn: 288411
```
3ec875d2
Refactored X86InterleavedAccess into a class. NFCI. · 0e3ae305
David L Kreitzer authored Dec 01, 2016
```
Patch by Farhana Aleen

Differential Revision: https://reviews.llvm.org/D25986

llvm-svn: 288410
```
0e3ae305

[tablegen] Delete duplicates from a vector without skipping elements · 47de8391

Vedant Kumar authored Dec 01, 2016

Tablegen's -gen-instr-info pass has a bug in its emitEnums() routine.
The function intends for values in a vector to be deduplicated, but it
accidentally skips over elements after performing a deletion.
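
This class of bug is easy to reproduce in isolation. The sketch below is not the emitEnums() code; both functions are hypothetical and only illustrate the general pattern of advancing an index after an erase versus holding it in place.

```cpp
#include <cstddef>
#include <vector>

// Buggy pattern: the index advances even after an erase, so the element
// that slid into position J is never examined.
std::vector<int> dedupSkipping(std::vector<int> V) {
  for (std::size_t I = 0; I < V.size(); ++I)
    for (std::size_t J = I + 1; J < V.size(); ++J)
      if (V[J] == V[I])
        V.erase(V.begin() + J);
  return V;
}

// Fixed pattern: only advance when nothing was erased.
std::vector<int> dedupFixed(std::vector<int> V) {
  for (std::size_t I = 0; I < V.size(); ++I)
    for (std::size_t J = I + 1; J < V.size();) {
      if (V[J] == V[I])
        V.erase(V.begin() + J);
      else
        ++J;
    }
  return V;
}
```

For the input {1, 1, 1}, the first version leaves a duplicate behind because the third element moves into the slot that was just checked, while the second version reduces the vector to a single element.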

I think there are smarter ways of doing this deduplication, but we can
do that in a follow-up commit if there's interest. See the thread:
[PATCH] TableGen InstrMapping Bug fix.

Patch by Tyler Kenney!

llvm-svn: 288408

47de8391

Remove unused header, NFC. · 618d78ca
Vedant Kumar authored Dec 01, 2016
```
llvm-svn: 288407
```
618d78ca