Commits · e6a74803d4ee59dce5eed24727c8f52bc9774e61 · Lorenzo Albano / LLVM bpEVL

Mar 18, 2020

[VPlan] Use underlying value for printing, if available. · e6a74803

Florian Hahn authored Mar 18, 2020

When an underlying value is available, we can use its name for
printing, as discussed in D73078.

Reviewers: rengolin, hsaito, Ayal, gilr

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D76200

e6a74803

[ARM,MVE] Add intrinsics for the VQDMLAD family. · e13d153c

Simon Tatham authored Mar 18, 2020

Summary:
This is another set of instructions too complicated to be sensibly
expressed in IR by anything short of a target-specific intrinsic.
Given input vectors a,b, the instruction generates intermediate values
2*(a[0]*b[0]+a[1]*b[1]), 2*(a[2]*b[2]+a[3]*b[3]), etc; takes the high
half of each double-width value, and overwrites half the lanes in the
output vector c, which you therefore have to provide the input value
of. Optionally you can swap the elements of b so that the sums are things
like a[0]*b[1]+a[1]*b[0]; optionally you can round to nearest when
taking the high half; and optionally you can take the difference
rather than sum of the two products. Finally, saturation is applied
when converting back to a single-width vector lane.
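
The lane arithmetic described above can be sketched as a small Python model. This is only an illustration of the description in this message, not the instruction's formal definition: the function name, the choice of destination lane (even lane for the plain form, odd lane for the swapped form), and the operand order of the difference are assumptions.

```python
def saturate(value, bits):
    """Clamp value to the signed range of a bits-wide lane."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def vqdmlad_model(a, b, c, bits=16, exchange=False, rounding=False, subtract=False):
    """Hypothetical model of one VQDMLAD-family operation on lists of signed lanes."""
    out = list(c)  # half of the lanes of c pass through unchanged
    for i in range(0, len(a), 2):
        if exchange:
            p0, p1 = a[i] * b[i + 1], a[i + 1] * b[i]
        else:
            p0, p1 = a[i] * b[i], a[i + 1] * b[i + 1]
        total = 2 * (p0 - p1 if subtract else p0 + p1)  # double-width value
        if rounding:
            total += 1 << (bits - 1)  # round to nearest before taking the high half
        high = total >> bits  # high half of the double-width value
        dest = i + 1 if exchange else i  # assumed destination lane
        out[dest] = saturate(high, bits)
    return out
```

With 16-bit lanes, `vqdmlad_model([0x4000, 0x4000], [0x4000, 0x4000], [7, 9])` overwrites lane 0 and leaves lane 1 of `c` untouched.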

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: miyuki

Subscribers: kristof.beyls, hiraditya, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76359

e13d153c

[gn build] remove a workaround that is no longer needed · 642a424b
Nico Weber authored Mar 18, 2020

642a424b
[NFC][PowerPC] Update test · fc2a5ef9
Sam Parker authored Mar 18, 2020
```
Run the update script on one of the loop unroll tests.
```
fc2a5ef9

AMDGPU: Initial, crude support for indirect calls · 4ea1baf6

Matt Arsenault authored Mar 16, 2020

This isn't really usable, and requires using the
-amdgpu-fixed-function-abi flag to work.

Assumes a uniform call target, and will hit a verifier error if the
call target ends up in a VGPR. Also doesn't attempt to do anything
sensible for the reported register/stack usage.

4ea1baf6

Reapply "AMDGPU/GlobalISel: Fully handle 0 dmask case during legalize" · ea4597ee

Matt Arsenault authored Mar 18, 2020

This reverts commit 9bca8fc4.

Rearrange handling to avoid changing the instruction in the case where
it's going to be erased and replaced with undef.

ea4597ee

[AMDGPU] Fix AMDGPUUnifyDivergentExitNodes · d1a7bfca

Piotr Sobczak authored Mar 18, 2020

Summary:
For the case where "done" bits on existing exports are removed
by unifyReturnBlockSet(), unify all return blocks - even the
uniformly reached ones. We do not want to end up with a non-unified,
uniformly reached block containing a normal export with the "done"
bit cleared.

That case is believed to be rare - possible with infinite loops
in pixel shaders.

This is a fix for D71192.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76364

d1a7bfca

[gn build] add rebase changes that should have been in 9f981e9a · f57290ec
Nico Weber authored Mar 18, 2020

f57290ec
[ValueTracking] Add computeKnownBits DemandedElts support to AND instructions (PR36319) · 06150e83
Simon Pilgrim authored Mar 18, 2020

06150e83
Reland "[gn build] (manually) port 8b409eab" · 9f981e9a
Nico Weber authored Mar 18, 2020
```
This reverts commit 4060016f
and re-merges c5b81466.
```
9f981e9a

[InstCombine] GEPOperator::accumulateConstantOffset does not support scalable vectors · ef64ba83

Sander de Smalen authored Mar 18, 2020

Avoid transforming:

 %0 = bitcast i8* %base to <vscale x 16 x i8>*
 %1 = getelementptr <vscale x 16 x i8>, <vscale x 16 x i8>* %0, i64 1

into:

 %0 = getelementptr i8, i8* %base, i64 16
 %1 = bitcast i8* %0 to <vscale x 16 x i8>*
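
A rough way to see why the second form is wrong: stepping a `<vscale x 16 x i8>` pointer by one element advances it by `16 * vscale` bytes, which is only known at run time, so a constant byte offset of 16 is correct only when `vscale` happens to be 1. The helper below is a hypothetical illustration of that arithmetic, not LLVM code.

```python
def scalable_gep_offset_bytes(index, min_elements, element_bytes, vscale):
    """Byte offset of a GEP over a scalable vector type (illustrative model)."""
    return index * min_elements * element_bytes * vscale


# The folded form assumes 16 bytes, which only matches vscale == 1.
for vscale in (1, 2, 4):
    actual = scalable_gep_offset_bytes(1, 16, 1, vscale)
    print(f"vscale={vscale}: actual offset {actual} bytes, folded constant 16")
```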

Reviewers: efriedma, ctetreau

Reviewed By: efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76236

ef64ba83

[PowerPC][AIX] Implement by-val caller arguments in a single register. · c2186647

Chris Bowler authored Mar 18, 2020

This is the first of a series of patches that adds caller support for
by-value arguments. This patch adds support for arguments that are passed in a
single GPR.

Three cases are not yet supported:
- The by-value argument is larger than a single register.
- There are no remaining GPRs even though the by-value argument would
  otherwise fit in a single GPR.
- The by-value argument requires alignment greater than register width.

Future patches will be required to add support for these cases as well
as for the callee handling (in LowerFormalArguments_AIX) that
corresponds to this work.
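
As a minimal sketch, the three limitation cases can be read as a predicate on the argument. The function and parameter names below are hypothetical and do not correspond to anything in the patch; the register width is passed in explicitly rather than assumed.

```python
def fits_single_gpr(size_bytes, align_bytes, free_gprs, gpr_bytes):
    """Return True if a by-value argument falls in the single-GPR case.

    Hypothetical helper: mirrors the three limitation cases listed above.
    """
    if size_bytes > gpr_bytes:   # larger than a single register
        return False
    if free_gprs == 0:           # no GPRs left, even though it would fit
        return False
    if align_bytes > gpr_bytes:  # alignment greater than register width
        return False
    return True
```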

Differential Revision: https://reviews.llvm.org/D75863

c2186647

[InstCombine][X86] Add additional demandedelts style test for in-range... · 24c2e613

Simon Pilgrim authored Mar 18, 2020

[InstCombine][X86] Add additional demandedelts style test for in-range variable per-element shift amounts (PR40391)

If we've shuffled the shift amount, some of the (undemanded) elements may have become undef - this should be handled by the missing support in PR36319.

24c2e613

Fix `warning: extra ‘;’` (NFC) · 4d506da9
Mehdi Amini authored Mar 18, 2020

4d506da9

Fix build with gcc 7.5 by adding a "redundant move" · f3e297d9

Mehdi Amini authored Mar 18, 2020

The constructor of Expected<T> expects a T&&, but gcc-7.5 does not
infer an rvalue in this context apparently.

f3e297d9

[NFCI][SCEV] Avoid recursion in SCEVExpander::isHighCostExpansion*() · 85334b03

Roman Lebedev authored Mar 18, 2020

Summary:
As noted in PR45201 (https://bugs.llvm.org/show_bug.cgi?id=45201) and
PR10090 (https://bugs.llvm.org/show_bug.cgi?id=10090), SCEV doesn't
always avoid recursive algorithms, and that causes issues with
large expression depths and/or smaller stack sizes.

In the `SCEVExpander::isHighCostExpansion*()` case, the refactoring to avoid
recursion is rather idiomatic. We simply need to place the root expr
into a vector, and iterate over vector elements accounting for the cost
of each one, adding new exprs at the end of the vector,
thus achieving recursion-less traversal.

The order in which we will visit exprs doesn't matter here,
so we will be fine with the most basic approach of using SmallVector
and inserting/extracting from the back, which incidentally is the same
depth-first traversal that we were doing previously recursively.
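
The worklist idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the SCEVExpander code: `Expr`, its `cost` field, and the budget comparison are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field


@dataclass
class Expr:
    """Hypothetical expression node: a local cost plus operand expressions."""
    cost: int
    operands: list = field(default_factory=list)


def is_high_cost(root, budget):
    """Return True if the total cost of root's expression tree exceeds budget."""
    worklist = [root]  # start with the root expression
    remaining = budget
    while worklist:
        expr = worklist.pop()  # extract from the back: depth-first order
        remaining -= expr.cost
        if remaining < 0:
            return True
        worklist.extend(expr.operands)  # add new exprs at the end
    return False
```

Because the traversal keeps its pending work in a list rather than on the call stack, a chain of operands far deeper than the interpreter's recursion limit is handled without issue.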

Reviewers: mkazantsev, reames, wmi, ekatz

Reviewed By: mkazantsev

Subscribers: hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76273

85334b03

[IPRA][ARM] Spill extra registers at -Oz · 73cea83a

Oliver Stannard authored Jul 18, 2019

When optimising for code size at the expense of performance, it is often
worth saving and restoring some of r0-r3, if IPRA will be able to take
advantage of them. This doesn't cost any extra code size if we already
have a PUSH/POP pair, and increases the number of available registers
across any calls to the function.

We already have an optimisation which tries to fold the subtract/add of the
SP into the PUSH/POP by using extra registers, which somewhat conflicts
with this. I've made the new optimisation less aggressive in cases where
the existing one is likely to trigger, which gives better results than
either of these optimisations by themselves.

Differential revision: https://reviews.llvm.org/D69936

73cea83a

[Alignment][NFC] Deprecate getMaxAlignment · d000655a

Guillaume Chatelet authored Mar 18, 2020

Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: jholewinski, arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76348

d000655a

[NFC][PowerPC] Add a new MIR file to test if-converter pass · 96b70809
Kang Zhang authored Mar 18, 2020

96b70809
[NFC] Add missing REQUIRES clause to a test · 2aaafaf5
Danila Malyutin authored Mar 18, 2020

2aaafaf5

[ARM] Track epilogue instructions with FrameDestroy flag (NFC) · 6739805e

Oliver Stannard authored Jul 18, 2019

Rather than trying to work out which instructions are part of the
epilogue by examining them, we can just mark them with the FrameDestroy
flag, like we do in the AArch64 backend.

6739805e

[llvm][SVE] Addressing mode for FF/NF loads. · 9bdcd9bf

Francesco Petrogalli authored Mar 09, 2020

Summary:
This patch adds addressing mode computation for the following SVE
instructions:

* ldff1{s}<T1> { <Zt>.<T2> }, <Pg>/Z, [<Xn|SP>{, <Xm>{, lsl #imm}}]
* ldnf1{s}<T1> { <Zt>.<T2> }, <Pg>/Z, [<Xn|SP>{, #<imm>, mul vl}]

Reviewers: andwar, sdesmalen, rengolin, efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76209

9bdcd9bf

[AArch64][SVE] Change pointer type of nontemporal load/store intrinsics · 4788ca45

Sander de Smalen authored Mar 18, 2020

Summary:
This fixes a discrepancy between the non-temporal load/store
intrinsics and other SVE load intrinsics (such as nf/ff), so
that Clang can use the same code to generate these intrinsics.

Reviewers: andwar, kmclaughlin, rengolin, efriedma

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76237

4788ca45

Fix possible assertion when using PBQP with debug info · 940ba146

Danila Malyutin authored Mar 17, 2020

Skip debug instructions before calling functions not expecting them.
In particular, LIS.getInstructionIndex(*mi) would fail if mi was a debug instr.

Differential Revision: https://reviews.llvm.org/D76129

940ba146

[DebugInfo] Fix multi-byte entry values in call site values · a0a3a9c5

David Stenberg authored Mar 18, 2020

Summary:
In D67768/D67492 I added support for entry values having blocks larger
than one byte, but I now noticed that the DIE implementation I added there
was broken. The takeNodes() function, that moves the entry value block
from a temporary buffer to the output buffer, would destroy the input
iterator when transferring the first node, meaning that only that node
was moved.

In practice, this meant that when emitting a call site value using a
DW_OP_entry_value operation with a DWARF register number larger than 31,
that multi-byte DW_OP_regx expression would be truncated.
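
To illustrate why register numbers above 31 are the affected case: DWARF encodes registers 0 to 31 as the single-byte opcodes DW_OP_reg0 to DW_OP_reg31, while larger numbers need DW_OP_regx followed by a ULEB128 operand, so the expression spans more than one byte. The encoder below is a small standalone sketch with hypothetical function names; it is not the DIE code touched by this patch.

```python
DW_OP_REG0 = 0x50  # DW_OP_reg0 .. DW_OP_reg31 are 0x50 .. 0x6f
DW_OP_REGX = 0x90  # followed by a ULEB128 register number


def uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_register(regno):
    """Encode a DWARF register location for regno."""
    if regno < 32:
        return bytes([DW_OP_REG0 + regno])         # single byte
    return bytes([DW_OP_REGX]) + uleb128(regno)    # multi-byte


print(encode_register(5).hex(), encode_register(40).hex())
```

If only the first byte of the second encoding were kept, the operand of DW_OP_regx would be lost, which matches the truncation described above.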

Reviewers: djtodoro, aprantl, vsk

Reviewed By: djtodoro

Subscribers: llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D76279

a0a3a9c5

[SCCP] Precommit some additional tests for integer ranges. · 0db72442
Florian Hahn authored Mar 18, 2020

0db72442

[InstCombine][X86] simplifyX86varShift - convert variable in-range per-element... · f4e495a1

Simon Pilgrim authored Mar 18, 2020

[InstCombine][X86] simplifyX86varShift - convert variable in-range per-element shift amounts to generic shifts (PR40391)

AVX2/AVX512 per-element shifts can be replaced with generic shifts if the shift amounts are guaranteed to be in-range (upper bits are known zero).
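
A scalar sketch of the reasoning, for a logical left shift only: the x86 per-element shift yields 0 for an out-of-range amount, whereas a generic IR shift has no defined result there, so the two agree exactly when the amount is known to be in range. The helper names and the known-zero mask representation are assumptions made for illustration, and `bits` is assumed to be a power of two.

```python
def x86_var_shl(value, amount, bits=32):
    """Per-element logical left shift: out-of-range amounts give 0."""
    mask = (1 << bits) - 1
    return 0 if amount >= bits else (value << amount) & mask


def generic_shl(value, amount, bits=32):
    """Generic shift: only defined for amount < bits."""
    if amount >= bits:
        raise ValueError("out-of-range shift amount has no defined result")
    return (value << amount) & ((1 << bits) - 1)


def amounts_known_in_range(known_zero, bits=32):
    """True if every amount bit at or above log2(bits) is known to be zero."""
    upper = ((1 << bits) - 1) & ~(bits - 1)
    return known_zero & upper == upper
```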

f4e495a1

[ARM,MVE] Add intrinsics for the VQDMLAH family. · 928776de

Simon Tatham authored Mar 11, 2020

Summary:
These are complicated integer multiply+add instructions with extra
saturation, taking the high half of a double-width product, and
optional rounding. There's no sensible way to represent that in
standard IR, so I've converted the clang builtins directly to
target-specific intrinsics.

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: miyuki

Subscribers: kristof.beyls, hiraditya, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76123

928776de

[ARM,MVE] Add intrinsics and isel for MVE integer VMLA. · 28c5d97b

Simon Tatham authored Mar 11, 2020

Summary:
These instructions compute multiply+add in integers, with one of the
operands being a splat of a scalar. (VMLA and VMLAS differ in whether
the splat operand is a multiplier or the addend.)

I've represented these in IR using existing standard IR operations for
the unpredicated forms. The predicated forms are done with target-
specific intrinsics, as usual.

When operating on n-bit vector lanes, only the bottom n bits of the
i32 scalar operand are used. So we have to tell that to isel lowering,
to allow it to remove a pointless sign- or zero-extension instruction
on that input register. That's done in `PerformIntrinsicCombine`, but
first I had to enable `PerformIntrinsicCombine` for MVE targets
(previously all the intrinsics it handled were for NEON), and make it
a method of `ARMTargetLowering` so that it can get at
`SimplifyDemandedBits`.
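
The claim that only the bottom n bits of the scalar matter can be checked with a small model of modular lane arithmetic. The function names and the assignment of the two forms (scalar as multiplier versus scalar as addend) are illustrative assumptions, not taken from the patch.

```python
def mla_scalar_multiplier(acc, vec, scalar, bits):
    """Per lane: acc + vec * scalar, truncated to bits (scalar is the multiplier)."""
    mask = (1 << bits) - 1
    return [(a + v * scalar) & mask for a, v in zip(acc, vec)]


def mla_scalar_addend(acc, vec, scalar, bits):
    """Per lane: acc * vec + scalar, truncated to bits (scalar is the addend)."""
    mask = (1 << bits) - 1
    return [(a * v + scalar) & mask for a, v in zip(acc, vec)]


# The same low 8 bits, with and without sign extension to 32 bits.
narrow, sign_extended = 0x85, 0xFFFFFF85
print(mla_scalar_multiplier([1, 2], [3, 4], narrow, 8))
print(mla_scalar_multiplier([1, 2], [3, 4], sign_extended, 8))
```

Both calls print the same lanes, which is why an extension instruction feeding the scalar operand is redundant.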

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76122

28c5d97b

Fix interaction with gold plugin · 8d019cda

serge-sans-paille authored Mar 18, 2020

Correctly set RelocationModel, thanks @modocache for spotting this.

Related to differential revision: https://reviews.llvm.org/D75579

8d019cda

[InstCombine][X86] Tests for variable but in-range per-element shift amounts (PR40391) · cda2b076
Simon Pilgrim authored Mar 18, 2020
```
These shifts are masked to be in-range so we should be able to replace them with generic shifts.
```
cda2b076

[SCCP] Use constant ranges for select, if cond is overdefined. · 5672ae8d

Florian Hahn authored Mar 18, 2020

For selects with an unknown condition, we can approximate the result by
merging the state of both options. This automatically takes care of
the case where one operand is undef.
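
A simplified model of the merge, assuming ranges are inclusive, non-wrapping `(lo, hi)` tuples and using `None` as a stand-in for an operand with no known state yet (for example undef). This representation is hypothetical and much simpler than the lattice SCCP actually uses.

```python
def merge_select_ranges(true_range, false_range):
    """Approximate a select with an unknown condition by merging both operands."""
    if true_range is None:
        return false_range  # nothing known about one side: take the other
    if false_range is None:
        return true_range
    return (min(true_range[0], false_range[0]),
            max(true_range[1], false_range[1]))


print(merge_select_ranges((0, 10), (20, 30)))  # covers both options
print(merge_select_ranges(None, (20, 30)))     # undef-like operand
```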

Reviewers: davide, efriedma, mssimpso

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D71935

5672ae8d

[NFC][ARM] Add thumb triple to test · ef56b55e
Sam Parker authored Mar 18, 2020
```
Test the costs of selects for thumbv8m.base too.
```
ef56b55e

[Alignment][NFC] Deprecate getTransientStackAlignment · c3df69fa

Guillaume Chatelet authored Mar 17, 2020

Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: jholewinski, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76301

c3df69fa

CET for Exception Handle · 974d649f

Pengfei Wang authored Mar 17, 2020

Summary:
Bug fix for https://bugs.llvm.org/show_bug.cgi?id=45182
An exception handler may jump indirectly to a catch pad, so we should add an ENDBR instruction before the catch pad instructions.

Reviewers: craig.topper, hjl.tools, LuoYuanke, annita.zhang, pengfei

Reviewed By: LuoYuanke

Subscribers: hiraditya, llvm-commits

Patch By: Xiang Zhang (xiangzhangllvm)

Differential Revision: https://reviews.llvm.org/D76190

974d649f

Revert "AMDGPU/GlobalISel: Fully handle 0 dmask case during legalize" · 9bca8fc4
Vitaly Buka authored Mar 17, 2020
```
The patch introduced use-after-poison.

This reverts commit d0fe13ec.
```
9bca8fc4

[DAGCombine] Respect the uses when combine FMA for a*b+/-c*d · d577193c

QingShan Zhang authored Mar 18, 2020

If it is a*b-c*d, it could also be folded into fma(a, b, -c*d) or fma(-c, d, a*b).
This patch is trying to respect the uses of a*b and c*d to make the best choice.
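
One way to picture the choice, as a hedged sketch: both folds compute the same value, so the question is which product to leave as a separate multiply. The selection rule below (keep whichever product has other users, since it must be computed anyway) is a guess at the intent, not the heuristic in the patch, and `fma_exact` uses exact integer arithmetic rather than a real fused operation.

```python
def fma_exact(x, y, z):
    """Exact stand-in for fma(x, y, z) on integers."""
    return x * y + z


def choose_fma_form(ab_uses, cd_uses):
    """Hypothetical rule for folding a*b - c*d based on how often each product is used."""
    if ab_uses > 1 and cd_uses == 1:
        return "fma(-c, d, a*b)"  # a*b is needed elsewhere, so fuse c*d
    return "fma(a, b, -(c*d))"    # otherwise fuse a*b


a, b, c, d = 3, 5, 7, 11
# Both candidate folds agree with the original expression.
print(a * b - c * d, fma_exact(a, b, -(c * d)), fma_exact(-c, d, a * b))
```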

Differential Revision: https://reviews.llvm.org/D75982

d577193c

Revert "Support repeated machine outlining" · 7b166d51
Jin Lin authored Mar 17, 2020
```
This reverts commit ab2dcff3.
```
7b166d51

Support repeated machine outlining · ab2dcff3

Jin Lin authored Mar 17, 2020

Summary: This change allows machine outlining to be applied N times, where N is specified by a compiler option. By default, N is 1. The motivation is that repeated machine outlining can further reduce code size. Please refer to the presentation "Improving Swift Binary Size via Link Time Optimization" at the 2019 LLVM Developers' Meeting.

Reviewers: aschwaighofer, tellenbach, paquette

Reviewed By: paquette

Subscribers: tellenbach, hiraditya, llvm-commits, jinlin

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71027

ab2dcff3

Revert "Avoid emitting unreachable SP adjustments after `throw`" · 4e0fe038

Nico Weber authored Mar 17, 2020

This reverts commit 65b21282.
Breaks sanitizer bots (https://reviews.llvm.org/D75712#1927668)
and causes https://crbug.com/1062021 (which may or may not
be a compiler bug, not clear yet).

4e0fe038