Commits · 9f02a94dad81f6f2adb775a1b34c9c30e2b9887f · Lorenzo Albano / LLVM bpEVL

Jul 03, 2018

[ThinLTO] Fix printing of module paths for distributed backend indexes · 8fc76668

Teresa Johnson authored Jul 02, 2018

Summary:
In the individual index files emitted for distributed ThinLTO backends,
the module path ids are not contiguous. Assign slots to module paths in
order to handle this better and also to get contiguous numbering in the
summary assembly.
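
A minimal sketch of the slot-assignment idea, not the actual patch: sparse module path ids are mapped to contiguous slots. The helper name `assignModulePathSlots` and the choice of ascending id order are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical helper: give every distinct module path id a contiguous slot,
// numbered in ascending id order (the ordering is an assumption).
std::map<uint64_t, unsigned>
assignModulePathSlots(const std::vector<uint64_t> &ModuleIds) {
  std::map<uint64_t, unsigned> Slots;
  // Collect the distinct ids; std::map keeps its keys sorted.
  for (uint64_t Id : ModuleIds)
    Slots.emplace(Id, 0u);
  // Number the ids in sorted order, starting from slot 0.
  unsigned Next = 0;
  for (auto &Entry : Slots)
    Entry.second = Next++;
  return Slots;
}
```

With ids 7, 2 and 42 present in one index file, the slots come out as 0, 1 and 2 regardless of the gaps between the ids.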

Reviewers: davidxl, dexonsmith

Subscribers: mehdi_amini, inglorion, eraman, llvm-commits, steven_wu

Differential Revision: https://reviews.llvm.org/D48698

llvm-svn: 336148

8fc76668

Jul 02, 2018

[WebAssembly] Support for atomic stores · 402b4908

Heejin Ahn authored Jul 02, 2018

Summary: Add support for atomic store instructions.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D48839

llvm-svn: 336145

402b4908

[ARM] Fix PR37382: Don't optimize mul.with.overflow on thumbv6m. · fd10286e

Vadzim Dambrouski authored Jul 02, 2018

Reviewers: efriedma, rogfer01, javed.absar

Reviewed By: efriedma, rogfer01

Subscribers: kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D48846

llvm-svn: 336144

fd10286e

[llvm-mca] Clear the content of map VariantDescriptors in InstrBuilder before... · 9b3cb081

Andrea Di Biagio authored Jul 02, 2018

[llvm-mca] Clear the content of map VariantDescriptors in InstrBuilder before we start analyzing a new CodeBlock. NFCI.

Different CodeBlocks don't overlap. The same MCInst cannot appear in more than
one code block because all blocks are instantiated before the simulation is run.

We should always clear the content of map VariantDescriptors before every
simulation, since VariantDescriptors cannot possibly store useful information
for the next blocks. It is also "safer" to clear its content because `MCInst*`
is used as the key type for map VariantDescriptors.

llvm-svn: 336142

9b3cb081

[SCEV] Strengthen StrengthenNoWrapFlags (reapply r334428). · c7cef4bc

Tim Shen authored Jul 02, 2018

Summary:
Comment on Transforms/LoopVersioning/incorrect-phi.ll: With the change
SCEV is able to prove that the loop doesn't wrap-self (due to zext i16
to i64), disabling the entire loop versioning pass. Removed the zext and
just use i64.

Reviewers: sanjoy

Subscribers: jlebar, hiraditya, javed.absar, bixia, llvm-commits

Differential Revision: https://reviews.llvm.org/D48409

llvm-svn: 336140

c7cef4bc

[WebAssembly] Fix fast-isel optimization of branch conditions. · b01d8762

Dan Gohman authored Jul 02, 2018

LLVM doesn't guarantee anything about the high bits of a register holding
an i1 value at the IR level, so don't translate LLVM IR i1 values directly
into WebAssembly conditional branch operands. WebAssembly's conditional
branches do demand all 32 bits be valid.
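
The problem can be modelled in a few lines of C++. This is only a sketch under the assumption that a conditional branch is taken when its 32-bit operand is non-zero; `wasmBranchTaken` and `normalizeI1` are hypothetical names, not fast-isel functions.

```cpp
#include <cassert>
#include <cstdint>

// Models a conditional branch that inspects all 32 bits of its operand:
// it is taken whenever the operand is non-zero.
bool wasmBranchTaken(uint32_t Cond) { return Cond != 0; }

// Hypothetical normalization step: keep only bit 0, the single bit that an
// i1 value actually defines, and discard whatever is in the high bits.
uint32_t normalizeI1(uint32_t Reg) { return Reg & 1u; }
```

A register holding an i1 `false` with garbage in its high bits (for example `0xFFFFFFFE`) would take the branch if used directly, but not after masking.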

Fixes PR38019.

llvm-svn: 336138

b01d8762

[X86] Add phony registers for high halves of regs with low halves · fd974949

Krzysztof Parzyszek authored Jul 02, 2018

Add registers still missing after r328016 (D43353):
- for bits 15-8  of SI, DI, BP, SP (*H), and R8-R15 (*BH),
- for bits 31-16 of R8-R15 (*WH).

Thanks to Craig Topper for pointing it out.

llvm-svn: 336134

fd974949

Replace "Replacable" with "Replaceable". [NFC] · 0e15501f

Alina Sbirlea authored Jul 02, 2018

llvm-svn: 336133

0e15501f

Replace unused output filenames with /dev/null in tests · f50ad6c3

Fangrui Song authored Jul 02, 2018

Similar to rLLD336129

llvm-svn: 336131

f50ad6c3

[SLP] Recognize min/max pattern using instructions producing same values. · 3b416db1

Farhana Aleen authored Jul 02, 2018

Summary: It is common to have the following min/max pattern during the intermediate stages of SLP since we only optimize at the end. This patch tries to catch such patterns and allow more vectorization.

         %1 = extractelement <2 x i32> %a, i32 0
         %2 = extractelement <2 x i32> %a, i32 1
         %cond = icmp sgt i32 %1, %2
         %3 = extractelement <2 x i32> %a, i32 0
         %4 = extractelement <2 x i32> %a, i32 1
         %select = select i1 %cond, i32 %3, i32 %4

Author: FarhanaAleen

Reviewed By: ABataev, RKSimon, spatel

Differential Revision: https://reviews.llvm.org/D47608

llvm-svn: 336130

3b416db1

[InstCombine] reverse canonicalization of add --> or to allow more shuffle folding · b999d741

Sanjay Patel authored Jul 02, 2018

This extends D48485 to allow another pair of binops (add/or) to be combined either
with or without a leading shuffle:
or X, C --> add X, C (when X and C have no common bits set)

Here, we need value tracking to determine that the 'or' can be reversed into an 'add',
and we've added general infrastructure to allow extending to other opcodes or moving 
to where other passes could use that functionality.
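
The arithmetic behind the reversal can be checked on concrete values. This sketch works on known integers, whereas the pass would reason about known bits through value tracking; `haveNoCommonBits` here is a local stand-in, not the LLVM function.

```cpp
#include <cassert>
#include <cstdint>

// True when no bit position is set in both operands. In that case adding
// them never produces a carry, so 'or' and 'add' give the same result.
bool haveNoCommonBits(uint32_t X, uint32_t C) { return (X & C) == 0; }

// Computes 'or X, C' and 'add X, C' and reports whether they agree.
bool orEqualsAdd(uint32_t X, uint32_t C) { return (X | C) == X + C; }
```

For `X = 0xF0` and `C = 0x0F` both forms yield `0xFF`; for `X = 3` and `C = 1` the operands share bit 0 and the results differ (3 versus 4).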

Differential Revision: https://reviews.llvm.org/D48662

llvm-svn: 336128

b999d741

[MC] Error on a .zerofill directive in a non-virtual section · 4d5b1073

Francis Visoiu Mistrih authored Jul 02, 2018

On darwin, all virtual sections have zerofill type, and having a
.zerofill directive in a non-virtual section is not allowed. Instead of
asserting, show a nicer error.

In order to use the equivalent of .zerofill in a non-virtual section,
the usage of .zero or .space is required.

This patch replaces the assert with an error.

Differential Revision: https://reviews.llvm.org/D48517

llvm-svn: 336127

4d5b1073

nm: Add -no-weak flag for hiding weak symbols · d4f77a52

Dave Lee authored Jul 02, 2018

Summary:
This adds a new -no-weak flag to nm to hide weak symbols in its output.
This also adds a -W alias for this which is analogous to -U.

Patch by Keith Smiley

Reviewers: kastiglione, enderby, compnerd

Reviewed By: kastiglione

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48751

llvm-svn: 336126

d4f77a52

[SLPVectorizer][X86] Begin adding alternate tests for call operators · 35f196c1

Simon Pilgrim authored Jul 02, 2018

Alternate opcode handling only supports binary operators; these tests demonstrate a missed opportunity to vectorize ceil/floor calls.

llvm-svn: 336125

35f196c1

Tighten up a test for -check-debugify, NFC · 9b6c096f

Vedant Kumar authored Jul 02, 2018

Use an -implicit-check-not to make sure an error which should not occur
in fact does not occur before the first CHECK line.

Suggested by Paul Robinson in post-commit feedback for r335897.

llvm-svn: 336123

9b6c096f

[CostModel][X86] Add cost tests for fp rounding intrinsics · ac193d4b

Simon Pilgrim authored Jul 02, 2018

Add cost tests for fp ceil, floor, nearbyint, rint and trunc.

llvm-svn: 336122

ac193d4b

[X86] Don't use aligned load/store instructions for fp128 if the load/store isn't aligned. · 56440b97

Craig Topper authored Jul 02, 2018

Similarly, don't fold fp128 loads into SSE instructions if the load isn't aligned, unless we're targeting an AMD CPU that doesn't check alignment on arithmetic instructions.

Should fix PR38001

llvm-svn: 336121

56440b97

[AArch64][GlobalISel] Any-extend vararg parameters to stack slot size on Darwin. · 846f2436

Amara Emerson authored Jul 02, 2018

We currently don't any-extend vararg parameters before storing them to the stack
locations on Darwin. SelectionDAG, however, does this, and so user code
is in the wild which inadvertently relies on this extension. This can manifest
in cases where the value stored is (int)0, but the actual parameter is interpreted
by va_arg as a pointer, and so not extending to 64 bits causes the callee to
load additional undefined bits.
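
The failure mode can be simulated without real varargs. This is a sketch that models an 8-byte stack slot as a `uint64_t`; the function names are hypothetical, and zero-extension is used as one possible outcome of an any-extend.

```cpp
#include <cassert>
#include <cstdint>

// Models storing only 32 bits into an 8-byte slot that previously held
// unrelated data: the upper half of the slot is left untouched.
uint64_t storeLow32(uint64_t OldSlot, uint32_t Value) {
  return (OldSlot & 0xFFFFFFFF00000000ull) | Value;
}

// Models storing the value after widening it to the full slot size, so
// all 64 bits of the slot are overwritten.
uint64_t storeExtended(uint32_t Value) { return static_cast<uint64_t>(Value); }
```

If the callee reads the slot as a 64-bit pointer, the first variant turns `(int)0` into a non-null value whenever the old upper half was non-zero, while the second yields a null pointer.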

llvm-svn: 336120

846f2436

Revert "[Dominators] Add the DomTreeUpdater class" · 198f3b16

Jakub Kuderski authored Jul 02, 2018

Temporary revert because of a failing test on some buildbots.

This reverts commit r336114.

llvm-svn: 336117

198f3b16

[WebAssembly] Convert remaining tests from elf to wasm output format · 7fecdef5

Sam Clegg authored Jul 02, 2018

Differential Revision: https://reviews.llvm.org/D48748

llvm-svn: 336116

7fecdef5

Follow up of r335953 - [ARM][AArch64] Armv8.4-A Enablement · b0004b83

Sjoerd Meijer authored Jul 02, 2018

Imply dotprod for armv8.4-a, because it is mandatory from v8.4.

llvm-svn: 336115

b0004b83

[Dominators] Add the DomTreeUpdater class · e813a9b3

Jakub Kuderski authored Jul 02, 2018

Summary:
This patch is the first in a series of patches related to the [[ http://lists.llvm.org/pipermail/llvm-dev/2018-June/123883.html | RFC - A new dominator tree updater for LLVM ]].

This patch introduces the DomTreeUpdater class, which provides a cleaner API to perform updates on available dominator trees (none, only DomTree, only PostDomTree, both) using different update strategies (eagerly or lazily) to simplify the updating process.

—Prior to the patch—

   - Directly calling update functions of DominatorTree updates the data structure eagerly while DeferredDominance does updates lazily.
   - DeferredDominance class cannot be used when a PostDominatorTree also needs to be updated.
   - Functions receiving DT/DDT currently need a lot of branching.
   - Functions using both DomTree and PostDomTree need to call the update function separately on both trees.
   - People need to construct an additional DeferredDominance class to use functions only receiving DDT.

—After the patch—

Patch by Chijun Sima <simachijun@gmail.com>.

Reviewers: kuhar, brzycki, dmgreen, grosser, davide

Reviewed By: kuhar, brzycki

Subscribers: vsk, mgorny, llvm-commits

Author: NutshellySima

Differential Revision: https://reviews.llvm.org/D48383

llvm-svn: 336114

e813a9b3

[X86][SSE] Blend any v8i16/v4i32 shift with 2 shift unique values · 2bc8e079

Simon Pilgrim authored Jul 02, 2018

We were only doing this for basic blends, despite shuffle lowering now being good enough to handle more complex blends. This means that the two v8i16 splat shifts are performed in parallel instead of serially as in the general shift case.

llvm-svn: 336113

2bc8e079

[X86][SSE] Add v8i16 shift test for 2 shift values that doesn't match basic blend · a6be2437

Simon Pilgrim authored Jul 02, 2018

We have special case support for 2 shift values for basic blends, but irregular shift patterns end up using the generic lowering, despite shuffle lowering being good enough to handle more complex blends.

llvm-svn: 336112

a6be2437

[ValueTracking] allow undef elements when matching vector abs · 284ba0c1

Sanjay Patel authored Jul 02, 2018

llvm-svn: 336111

284ba0c1

Disable failing test on x86_64-pc-windows-gnu, see PR38006. · d414c6c1

Yaron Keren authored Jul 02, 2018

llvm-svn: 336110

d414c6c1

[CodeGen] Make block removal order deterministic in CodeGenPrepare · 23bba56f

David Stenberg authored Jul 02, 2018

Summary:
Replace use of a SmallPtrSet with a SmallSetVector to make the worklist
iteration order deterministic. This is done as the order the blocks are
removed may affect whether or not PHI nodes in successor blocks are
removed.

For example, consider the following case where %bb1 and %bb2 are
removed:

    bb1:
      br i1 undef, label %bb3, label %bb4
    bb2:
      br i1 undef, label %bb4, label %bb3
    bb3:
      pv1 = phi type [ undef, %bb1 ], [ undef, %bb2], [ v0, %other ]
      br label %bb4
    bb4:
      pv2 = phi type [ undef, %bb1 ], [ undef, %bb2 ],
                     [ pv1, %bb3 ], [ v0, %other ]

If %bb2 is removed before %bb1, the incoming values from %bb1 and %bb2
to pv1 will be removed before %bb1 is removed as a predecessor to %bb4.
The pv1 node will thus be optimized out (to v0) at the time %bb1 is
removed as a predecessor to %bb4, leaving the blocks as follows when
the incoming value from %bb1 has been removed:

    bb3: ; pv1 optimized out, incoming value to pv2 is v0
      br label %bb4
    bb4:
      pv2 = phi type [ v0, %bb3 ], [ v0, %other ]

The pv2 PHI node will be optimized away by removePredecessor() as all
incoming values are identical.

In case %bb2 is removed after %bb1, pv1 will not be optimized out at the
time %bb2 is removed as a predecessor to %bb4, leaving the blocks as
follows when the incoming value from %bb2 to pv2 has been removed:

    bb3:
      pv1 = phi type [ undef, %bb2 ], [ v0, %other ]
      br label %bb4
    bb4:
      pv2 = phi type [ pv1, %bb3 ], [ v0, %other ]

The pv2 PHI node will thus not be removed in this case, ultimately
leading to the following output:

    bb3: ; pv1 optimized out, incoming value to pv2 is v0
      br label %bb4
    bb4:
      pv2 = phi type [ v0, %bb3 ], [ v0, %other ]

I have not looked into changing DeleteDeadBlock() so that the redundant
PHI nodes are removed.

I have not added a test case, as I was not able to create a particularly
small (and not messy) reproducer. This is likely due to SmallPtrSet
behaving deterministically when in small mode.
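
To illustrate why the container swap makes the worklist order deterministic, here is a minimal insertion-ordered set in standard C++. It is a stand-in written for this note, not LLVM's SmallSetVector, and the class name `OrderedSet` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Minimal insertion-ordered set: the vector records insertion order and is
// the only thing iterated, the hash set is used purely to reject duplicates.
template <typename T> class OrderedSet {
  std::vector<T> Order;
  std::unordered_set<T> Seen;

public:
  // Returns true if the value was newly inserted, false if already present.
  bool insert(const T &V) {
    if (!Seen.insert(V).second)
      return false;
    Order.push_back(V);
    return true;
  }
  typename std::vector<T>::const_iterator begin() const { return Order.begin(); }
  typename std::vector<T>::const_iterator end() const { return Order.end(); }
  std::size_t size() const { return Order.size(); }
};
```

Iteration never touches the hash set, so the visiting order depends only on the order of the `insert` calls and not on pointer values or hashing.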

Reviewers: void, dexonsmith, spatel, skatkov, fhahn, bkramer, nhaehnle

Reviewed By: fhahn

Subscribers: mgrang, llvm-commits

Differential Revision: https://reviews.llvm.org/D48369

llvm-svn: 336109

23bba56f

[X86] Fix test/MC/AsmParser/exprs-invalid.s after rL336104 · 07ef10cc

Alex Bradbury authored Jul 02, 2018

This was my mistake for only running test/MC/X86 and test/CodeGen/X86. 
Arguably .word should be removed from this test, as it is not supported 
universally.

llvm-svn: 336107

07ef10cc

[llvm-exegesis] Change how the native architecture is determined · 346856dc

John Brawn authored Jul 02, 2018

Currently the llvm-exegesis native architecture is determined by comparing the
llvm native architecture with X86, so to add a new target would mean adding a
new check. Change this to building up a list of the targets llvm-exegesis
supports then using that, as this means that when adding a new target you just
add the target to the list of supported targets.

Differential Revision: https://reviews.llvm.org/D48778

llvm-svn: 336105

346856dc

[X86] Use addAliasForDirective to support the .word directive (reland) · c4890878

Alex Bradbury authored Jul 02, 2018

The X86 asm parser currently has custom parsing logic for .word. Rather than
use this custom logic, we can just use addAliasForDirective to enable the
reuse of AsmParser::parseDirectiveValue.

See also similar changes to Sparc (rL333078), AArch64 (rL333077), and Hexagon
(rL332607) backends.
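
The aliasing idea can be modelled with a small lookup table. This is a sketch only: `DirectiveAliases` is a hypothetical class, not the AsmParser interface, and mapping `.word` to `.2byte` is an assumption based on `.word` being two bytes wide on x86.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical alias table: a directive name is resolved to another
// directive that already has a handler, so no custom parsing is needed.
class DirectiveAliases {
  std::map<std::string, std::string> Aliases;

public:
  // Registers Directive as another spelling of Alias.
  void addAlias(const std::string &Directive, const std::string &Alias) {
    Aliases[Directive] = Alias;
  }
  // Returns the aliased directive, or the name unchanged if none exists.
  std::string resolve(const std::string &Name) const {
    auto It = Aliases.find(Name);
    return It == Aliases.end() ? Name : It->second;
  }
};
```

After `addAlias(".word", ".2byte")`, resolving `.word` yields `.2byte`, while unregistered names such as `.long` pass through unchanged.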

Differential Revision: https://reviews.llvm.org/D47004

This is a fixed reland of rL336100. This should have been caught in 
pre-commit testing so apologies for the noise.

llvm-svn: 336104

c4890878

Revert r336100 · c000e4dc

Alex Bradbury authored Jul 02, 2018

This was a bad change. .word == 2byte on x86.

llvm-svn: 336103

c000e4dc

[SLPVectorizer] Remove nullptr early-outs from Instruction::ShuffleVector getEntryCost · d5fb50e3

Simon Pilgrim authored Jul 02, 2018

This code is only used by alternate opcodes so the InstructionsState has already confirmed that every Value is an Instruction, plus we use cast<Instruction> which will assert on failure.

llvm-svn: 336102

d5fb50e3

[InstCombine] adjust shuffle tests with IR flags; NFC · 951f617e

Sanjay Patel authored Jul 02, 2018

Due to current limitations in constant analysis, we need flags
on add or mul to show propagation for the potential transform
suggested in these tests (no other binops currently report 
identity constants).

llvm-svn: 336101

951f617e

[X86] Use addAliasForDirective to support the .word directive · 42485ec9

Alex Bradbury authored Jul 02, 2018

The X86 asm parser currently has custom parsing logic for .word. Rather than 
use this custom logic, we can just use addAliasForDirective to enable the 
reuse of AsmParser::parseDirectiveValue.

See also similar changes to Sparc (rL333078), AArch64 (rL333077), and Hexagon 
(rL332607) backends.

Differential Revision: https://reviews.llvm.org/D47004

llvm-svn: 336100

42485ec9

[llvm-exegesis] Delegate the decision of cycle counter name to the target · 8fc5ec78

John Brawn authored Jul 02, 2018

Currently the cycle counter is taken from the subtarget schedule model, which
isn't any use if the subtarget doesn't have one. Delegate the decision to the
target benchmark runner, as it may know better what to do in that case, with
the default being the current behaviour.

Differential Revision: https://reviews.llvm.org/D48779

llvm-svn: 336099

8fc5ec78

Recommit r328307: [IPSCCP] Use constant range information for comparisons of parameters. · 4ebba909

Florian Hahn authored Jul 02, 2018

This version contains a fix to add values for which the state in ParamState changes
to the worklist if the state in ValueState did not change. To avoid adding the
same value multiple times, mergeInValue returns true if it added the value to
the worklist. The value is added to the worklist depending on its state in
ValueState.

Original message:
For comparisons with parameters, we can use the ParamState lattice
elements which also provide constant range information. This improves
the code for PR33253 further and gets us closer to using
ValueLatticeElement for all values.

Also, as we are using the range information in the solver directly, we
do not need tryToReplaceWithConstantRange afterwards anymore.
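
A sketch of how range information can decide a comparison, assuming a closed signed interval per parameter. `Range`, `Fold` and `foldSignedLess` are hypothetical names for illustration and do not mirror the solver's lattice types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical closed interval [Lo, Hi] of values a parameter may take.
struct Range {
  int64_t Lo;
  int64_t Hi;
};

// Result of trying to fold a comparison from range information alone.
enum class Fold { False, True, Unknown };

// Tries to fold the signed comparison 'Param < C'.
Fold foldSignedLess(const Range &Param, int64_t C) {
  if (Param.Hi < C)
    return Fold::True; // every possible value is below C
  if (Param.Lo >= C)
    return Fold::False; // no possible value is below C
  return Fold::Unknown; // the range straddles C
}
```

A parameter known to lie in [0, 9] makes `Param < 10` fold to true, whereas a range of [5, 15] leaves the comparison undecided.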

Reviewers: dberlin, mssimpso, davide, efriedma

Reviewed By: mssimpso

Differential Revision: https://reviews.llvm.org/D43762

llvm-svn: 336098

4ebba909

[InstCombine] add tests for shuffle-binop; NFC · d9800845

Sanjay Patel authored Jul 02, 2018

This is another pattern mentioned in PR37806.

llvm-svn: 336096

d9800845

[SLPVectorizer] Fix alternate opcode + shuffle cost function to correctly handle SK_Select patterns. · 265793d5

Simon Pilgrim authored Jul 02, 2018

We were always using the opcodes of the first 2 scalars for the costs of the alternate opcode + shuffle. This made sense when we used SK_Alternate and opcodes were guaranteed to be alternating, but this fails for the more general SK_Select case.

This fix exposes an issue demonstrated by the fmul_fdiv_v4f32_const test - the SLM model has v4f32 fdiv costs which are more than twice those of the f32 scalar cost, meaning that the cost model determines that the vectorization is not performant. Unfortunately it completely ignores the fact that the fdiv by a constant will be changed into a fmul by InstCombine for a much lower cost vectorization. But at least we're seeing this now...

llvm-svn: 336095

265793d5

[SLPVectorizer] Only Alternate opcodes use ShuffleVector cases for... · 409bd5f4

Simon Pilgrim authored Jul 02, 2018

[SLPVectorizer] Only Alternate opcodes use ShuffleVector cases for getEntryCost/vectorizeTree. NFCI.

Add assertions - we're already assuming this in how we use the AltOpcode and treat everything as BinaryOperators.

llvm-svn: 336092

409bd5f4

[AArch64][SVE] Asm: Support for (SQ)INCP/DECP (scalar, vector) · 8d4c01a7

Sander de Smalen authored Jul 02, 2018

Increments/decrements the result with the number of active bits
from the predicate.

The inc/dec variants added are:
- incp   x0, p0.h     (scalar)
- incp   z0.h, p0     (vector)

The unsigned saturating inc/dec variants added are:
- uqincp x0, p0.h     (scalar)
- uqincp w0, p0.h     (scalar, 32bit)
- uqincp z0.h, p0     (vector)

The signed saturating inc/dec variants added are:
- sqincp x0, p0.h     (scalar)
- sqincp x0, p0.h, w0 (scalar, 32bit)
- sqincp z0.h, p0     (vector)
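
A scalar model of the behaviour described above, assuming 64-bit operands and a predicate represented as a vector of booleans. The function names reuse the mnemonics for readability only; this is a sketch of the semantics, not generated or assembler code.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Counts the active (true) elements of a predicate.
uint64_t countActive(const std::vector<bool> &Pred) {
  uint64_t N = 0;
  for (bool Active : Pred)
    N += Active ? 1 : 0;
  return N;
}

// Plain increment: wraps around on overflow.
uint64_t incp(uint64_t X, const std::vector<bool> &Pred) {
  return X + countActive(Pred);
}

// Unsigned saturating increment: clamps at the largest 64-bit value.
uint64_t uqincp(uint64_t X, const std::vector<bool> &Pred) {
  const uint64_t Max = std::numeric_limits<uint64_t>::max();
  const uint64_t N = countActive(Pred);
  return X > Max - N ? Max : X + N;
}

// Signed saturating increment: clamps at the largest signed 64-bit value.
// The active count is assumed to be small enough to fit in int64_t.
int64_t sqincp(int64_t X, const std::vector<bool> &Pred) {
  const int64_t Max = std::numeric_limits<int64_t>::max();
  const int64_t N = static_cast<int64_t>(countActive(Pred));
  return X > Max - N ? Max : X + N;
}
```

With three active elements, `incp` turns 10 into 13, while the saturating forms stop at the type's maximum instead of wrapping.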

llvm-svn: 336091

8d4c01a7