Commits · e6a74803d4ee59dce5eed24727c8f52bc9774e61 · Lorenzo Albano / LLVM bpEVL

Mar 18, 2020

[VPlan] Use underlying value for printing, if available. · e6a74803

Florian Hahn authored Mar 18, 2020

When an underlying value is available, we can use its name for
printing, as discussed in D73078.

Reviewers: rengolin, hsaito, Ayal, gilr

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D76200

e6a74803

[ARM,MVE] Add intrinsics for the VQDMLAD family. · e13d153c

Simon Tatham authored Mar 18, 2020

Summary:
This is another set of instructions too complicated to be sensibly
expressed in IR by anything short of a target-specific intrinsic.
Given input vectors a,b, the instruction generates intermediate values
2*(a[0]*b[0]+a[1]*b[1]), 2*(a[2]*b[2]+a[3]*b[3]), etc; takes the high
half of each double-width value; and overwrites half the lanes in the
output vector c, whose input value you therefore have to provide.
Optionally you can swap the elements of b so that the sums are things
like a[0]*b[1]+a[1]*b[0]; optionally you can round to nearest when
taking the high half; and optionally you can take the difference
rather than sum of the two products. Finally, saturation is applied
when converting back to a single-width vector lane.
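
To make the arithmetic above concrete, here is a minimal scalar sketch of one lane pair for 8-bit lanes. It follows only the wording of this commit message (sum or difference of two products, doubling, optional rounding, high half, saturation); the function name, the flag parameters, and the order of the subtraction are assumptions for illustration, not the architectural pseudocode.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>

// Hypothetical scalar model of one lane pair with 8-bit lanes.
// exchange: swap the two elements of b before multiplying.
// round:    round to nearest when taking the high half.
// subtract: use the difference of the two products (order assumed).
inline int8_t qdmlad_lane_s8(int8_t a0, int8_t a1, int8_t b0, int8_t b1,
                             bool exchange, bool round, bool subtract) {
  if (exchange)
    std::swap(b0, b1);
  int32_t p0 = int32_t(a0) * b0;
  int32_t p1 = int32_t(a1) * b1;
  int32_t doubled = 2 * (subtract ? p0 - p1 : p0 + p1);
  if (round)
    doubled += 1 << 7;           // half of the discarded low byte
  int32_t high = doubled >> 8;   // high half (arithmetic shift, guaranteed since C++20)
  // Saturate back to a single-width lane.
  return int8_t(std::max<int32_t>(INT8_MIN, std::min<int32_t>(INT8_MAX, high)));
}
```

The saturation step matters for inputs such as all four elements being -128, where the doubled sum no longer fits in the high half.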

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: miyuki

Subscribers: kristof.beyls, hiraditya, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76359

e13d153c

Revert "[Syntax] Build template declaration nodes" · 881f5b5a

Nico Weber authored Mar 18, 2020

This reverts commit dd128268.
Breaks tests on Windows, see https://reviews.llvm.org/D76346#1929208

881f5b5a

[libc] Adding memcpy implementation for x86_64 · 04a309dd

Guillaume Chatelet authored Feb 11, 2020

Summary:
The patch is not ready yet and is here to discuss a few options:
 - How do we customize the implementation? (i.e. how to define `kRepMovsBSize`),
 - How do we specify custom compilation flags? (We'd need `-fno-builtin-memcpy` to be passed in),
 - How do we build? We may want to test in debug but build the libc with `-march=native` for instance,
 - Clang has a brand new builtin `__builtin_memcpy_inline` which makes the implementation easy and efficient, but:
   - If we compile with `gcc` or `msvc` we can't use it, resorting to less efficient code generation,
   - With gcc we can use `__builtin_memcpy` but then we'd need a postprocess step to check that the final assembly does not contain a call to `memcpy` (unlikely but allowed),
   - For msvc we'd need to rely on the compiler optimization passes.
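
The compiler split described in the last bullet could be sketched roughly as follows. This is a hedged illustration, not the code from the patch: `copy_block` is a hypothetical helper name, and the fallback branches simply mirror the options listed above.

```cpp
#include <cstddef>
#include <cstring>

#ifndef __has_builtin
#define __has_builtin(x) 0  // compilers without __has_builtin take the fallback paths
#endif

// Hypothetical helper: copy exactly N bytes, with N known at compile time.
template <std::size_t N>
inline void copy_block(char *dst, const char *src) {
#if __has_builtin(__builtin_memcpy_inline)
  // Clang: guaranteed to be expanded inline, never emitted as a call to memcpy.
  __builtin_memcpy_inline(dst, src, N);
#elif defined(__GNUC__)
  // GCC: usually expanded inline for small constant N, but a call is allowed.
  __builtin_memcpy(dst, src, N);
#else
  // Other compilers: rely on the optimizer to expand the call.
  std::memcpy(dst, src, N);
#endif
}
```

A larger `memcpy` would then be assembled from a handful of such fixed-size blocks, which is why the guarantee of not calling back into `memcpy` matters.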

Reviewers: sivachandra, abrachet

Subscribers: mgorny, MaskRay, tschuett, libc-commits, courbet

Tags: #libc-project

Differential Revision: https://reviews.llvm.org/D74397

04a309dd

[gn build] remove a workaround that is no longer needed · 642a424b
Nico Weber authored Mar 18, 2020

642a424b
[NFC][PowerPC] Update test · fc2a5ef9
Sam Parker authored Mar 18, 2020
```
Run the update script on one of the loop unroll tests.
```
fc2a5ef9

AMDGPU: Initial, crude support for indirect calls · 4ea1baf6

Matt Arsenault authored Mar 16, 2020

This isn't really usable, and requires using the
-amdgpu-fixed-function-abi flag to work.

Assumes a uniform call target, and will hit a verifier error if the
call target ends up in a VGPR. Also doesn't attempt to do anything
sensible for the reported register/stack usage.

4ea1baf6

Reapply "AMDGPU/GlobalISel: Fully handle 0 dmask case during legalize" · ea4597ee

Matt Arsenault authored Mar 18, 2020

This reverts commit 9bca8fc4.

Rearrange handling to avoid changing the instruction in the case where
it's going to be erased and replaced with undef.

ea4597ee

[AMDGPU] Fix AMDGPUUnifyDivergentExitNodes · d1a7bfca

Piotr Sobczak authored Mar 18, 2020

Summary:
For the case where "done" bits on existing exports are removed
by unifyReturnBlockSet(), unify all return blocks - even the
uniformly reached ones. We do not want to end up with a non-unified,
uniformly reached block containing a normal export with the "done"
bit cleared.

That case is believed to be rare - possible with infinite loops
in pixel shaders.

This is a fix for D71192.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76364

d1a7bfca

[gn build] add rebase changes that should have been in 9f981e9a · f57290ec
Nico Weber authored Mar 18, 2020

f57290ec
[ValueTracking] Add computeKnownBits DemandedElts support to AND instructions (PR36319) · 06150e83
Simon Pilgrim authored Mar 18, 2020

06150e83
Reland "[gn build] (manually) port 8b409eab" · 9f981e9a
Nico Weber authored Mar 18, 2020
```
This reverts commit 4060016f
and re-merges c5b81466.
```
9f981e9a

[Syntax] Build template declaration nodes · dd128268

Marcel Hlopko authored Mar 18, 2020

Summary:
Copy of https://reviews.llvm.org/D72334, submitting with Ilya's permission.

Handles template declarations of all kinds.

Also builds template declaration nodes for specializations and explicit
instantiations of classes.

Some missing things will be addressed in the follow-up patches:

- specializations of functions and variables,
- template parameters.

Reviewers: gribozavr2

Reviewed By: gribozavr2

Subscribers: cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76346

dd128268

[InstCombine] GEPOperator::accumulateConstantOffset does not support scalable vectors · ef64ba83

Sander de Smalen authored Mar 18, 2020

Avoid transforming:

 %0 = bitcast i8* %base to <vscale x 16 x i8>*
 %1 = getelementptr <vscale x 16 x i8>, <vscale x 16 x i8>* %0, i64 1

into:

 %0 = getelementptr i8, i8* %base, i64 16
 %1 = bitcast i8* %0 to <vscale x 16 x i8>*
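
The rewrite is wrong because one element of `<vscale x 16 x i8>` occupies 16 * vscale bytes, so the byte offset is not a compile-time constant. A small sketch of that arithmetic (the function name is hypothetical):

```cpp
#include <cstdint>

// Byte offset reached by stepping `index` elements of a scalable vector type
// whose minimum size is `min_bytes`, on hardware with the given `vscale`.
constexpr uint64_t scalable_gep_offset(uint64_t min_bytes, uint64_t vscale,
                                       uint64_t index) {
  return index * min_bytes * vscale;
}
```

Only when vscale is 1 does this equal the constant 16 used in the transformed code.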

Reviewers: efriedma, ctetreau

Reviewed By: efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76236

ef64ba83

[PowerPC][AIX] Implement by-val caller arguments in a single register. · c2186647

Chris Bowler authored Mar 18, 2020

This is the first of a series of patches that adds caller support for
by-value arguments. This patch adds support for arguments that are passed in a
single GPR.

There are 3 cases that this patch does not yet handle:
- The by-value argument is larger than a single register.
- There are no remaining GPRs even though the by-value argument would
  otherwise fit in a single GPR.
- The by-value argument requires alignment greater than register width.

Future patches will be required to add support for these cases as well
as for the callee handling (in LowerFormalArguments_AIX) that
corresponds to this work.
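
Read together, the three limitations amount to a simple predicate for the case this patch does handle. The sketch below is an assumption-laden illustration: `ByValArg` and `fits_single_gpr` are hypothetical names and do not appear in the patch.

```cpp
#include <cstdint>

// Hypothetical description of a by-value argument.
struct ByValArg {
  uint64_t size_bytes;
  uint64_t align_bytes;
};

// True when the argument can be passed in one GPR: it is no larger than a
// register, needs no more than register alignment, and a GPR is still free.
inline bool fits_single_gpr(const ByValArg &arg, uint64_t gpr_bytes,
                            unsigned free_gprs) {
  return arg.size_bytes <= gpr_bytes && arg.align_bytes <= gpr_bytes &&
         free_gprs > 0;
}
```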

Differential Revision: https://reviews.llvm.org/D75863

c2186647

[lldb] [testsuite] Enable forgotten -gsplit-dwarf for 2 testfiles · 3481062b

Jan Kratochvil authored Mar 18, 2020

D63643 added these testfiles but some of the %t4dwo and %t5dwo builds
are the same as corresponding %t4 and %t5 builds. Fortunately the
testcases do PASS.

After just adding -gsplit-dwarf, both of these skeleton files:
  tools/lldb/test/SymbolFile/DWARF/Output/debug-types-expressions.test.tmp4dwo
  tools/lldb/test/SymbolFile/DWARF/Output/debug-types-expressions.test.tmp5dwo

were referencing this one non-skeleton file:
  tools/lldb/test/SymbolFile/DWARF/debug-types-expressions.dwo

Surprisingly, this does not affect the other test, debug-types-basic.test,
probably because that test compiles to .o and then links it, while
debug-types-expressions.test compiles directly to an executable.

So this patch fixes that while keeping the direct executable compilation.

Differential Revision: https://reviews.llvm.org/D76316

3481062b

[InstCombine][X86] Add additional demandedelts style test for in-range... · 24c2e613

Simon Pilgrim authored Mar 18, 2020

[InstCombine][X86] Add additional demandedelts style test for in-range variable per-element shift amounts (PR40391)

If we've shuffled the shift amount, some of the (undemanded) elements may have become undef - this should be handled by the missing support in PR36319.

24c2e613

Fix `warning: extra ‘;’` (NFC) · 4d506da9
Mehdi Amini authored Mar 18, 2020

4d506da9

Fix build with gcc 7.5 by adding a "redundant move" · f3e297d9

Mehdi Amini authored Mar 18, 2020

The constructor of Expected<T> expects a T&&, but gcc-7.5 does not
infer an rvalue in this context apparently.
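
The shape of the workaround can be sketched with a stand-in type. `ExpectedLike` and `make_value` are hypothetical and only mimic a wrapper whose constructor takes `T&&`; they are not the real `Expected<T>`.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in for a wrapper that is constructible only from T&&.
template <typename T> class ExpectedLike {
public:
  ExpectedLike(T &&value) : value_(std::move(value)) {}
  T &get() { return value_; }

private:
  T value_;
};

ExpectedLike<std::unique_ptr<int>> make_value() {
  std::unique_ptr<int> result = std::make_unique<int>(42);
  // The local has a different type from the return type, so an older compiler
  // may not treat it as an rvalue here; the explicit move makes the conversion
  // unambiguous, at the cost of looking redundant to newer compilers.
  return std::move(result);
}
```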

f3e297d9

[NFCI][SCEV] Avoid recursion in SCEVExpander::isHighCostExpansion*() · 85334b03

Roman Lebedev authored Mar 18, 2020

Summary:
As noted in [[ https://bugs.llvm.org/show_bug.cgi?id=45201 | PR45201 ]],
[[ https://bugs.llvm.org/show_bug.cgi?id=10090 | PR10090 ]] SCEV doesn't
always avoid recursive algorithms, and that causes issues with
large expression depths and/or smaller stack sizes.

In `SCEVExpander::isHighCostExpansion*()` case, the refactoring to avoid
recursion is rather idiomatic. We simply need to place the root expr
into a vector, and iterate over vector elements accounting for the cost
of each one, adding new exprs at the end of the vector,
thus achieving recursion-less traversal.

The order in which we will visit exprs doesn't matter here,
so we will be fine with the most basic approach of using SmallVector
and inserting/extracting from the back, which incidentally is the same
depth-first traversal that we were doing previously recursively.
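
A minimal sketch of the worklist pattern described above, using a hypothetical `Expr` node rather than the real SCEV classes; the cost model and the `budget` parameter are assumptions for illustration.

```cpp
#include <vector>

// Hypothetical expression node; not the real SCEV class hierarchy.
struct Expr {
  int cost = 0;
  std::vector<const Expr *> operands;
};

// Returns true as soon as the accumulated cost exceeds `budget`.
// No recursion: the vector plays the role of the call stack.
inline bool is_high_cost(const Expr &root, int budget) {
  std::vector<const Expr *> worklist{&root};
  int total = 0;
  while (!worklist.empty()) {
    const Expr *e = worklist.back();  // extract from the back...
    worklist.pop_back();
    total += e->cost;
    if (total > budget)
      return true;
    for (const Expr *op : e->operands)
      worklist.push_back(op);         // ...and insert at the back
  }
  return false;
}
```

Because the traversal depth now lives on the heap, a very deep expression costs memory in the vector instead of stack frames.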

Reviewers: mkazantsev, reames, wmi, ekatz

Reviewed By: mkazantsev

Subscribers: hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76273

85334b03

[IPRA][ARM] Spill extra registers at -Oz · 73cea83a

Oliver Stannard authored Jul 18, 2019

When optimising for code size at the expense of performance, it is often
worth saving and restoring some of r0-r3, if IPRA will be able to take
advantage of them. This doesn't cost any extra code size if we already
have a PUSH/POP pair, and increases the number of available registers
across any calls to the function.

We already have an optimisation which tries to fold the subtract/add of the
SP into the PUSH/POP by using extra registers, which somewhat conflicts
with this. I've made the new optimisation less aggressive in cases where
the existing one is likely to trigger, which gives better results than
either of these optimisations by themselves.

Differential revision: https://reviews.llvm.org/D69936

73cea83a

[Alignment][NFC] Deprecate getMaxAlignment · d000655a

Guillaume Chatelet authored Mar 18, 2020

Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: jholewinski, arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76348

d000655a

[NFC][PowerPC] Add a new MIR file to test if-converter pass · 96b70809
Kang Zhang authored Mar 18, 2020

96b70809
[NFC] Add missing REQUIRES clause to a test · 2aaafaf5
Danila Malyutin authored Mar 18, 2020

2aaafaf5

[hip] Revise `GlobalDecl` constructors. NFC. · 4cf01ed7

Michael Liao authored Mar 18, 2020

Summary:
- https://reviews.llvm.org/D68578 revises the `GlobalDecl` constructors
  to ensure all GPU kernels have `ReferenceKenelKind` initialized
  properly with an explicit constructor and static one. But, there are
  lots of places using the implicit constructor triggering the assertion
  on non-GPU kernels. That's found in compilation of many tests and
  workloads.
- Fixing all of them may change more code and, more importantly, all of
  them assume the default kernel reference kind. This patch changes
  that constructor to tell `CUDAGlobalAttr` and construct `GlobalDecl`
  properly.

Reviewers: yaxunl

Subscribers: cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76344

4cf01ed7

[ARM] Track epilogue instructions with FrameDestroy flag (NFC) · 6739805e

Oliver Stannard authored Jul 18, 2019

Rather than trying to work out which instructions are part of the
epilogue by examining them, we can just mark them with the FrameDestroy
flag, like we do in the AArch64 backend.

6739805e

[mlir] NFC: Fix trivial typos in documents · a8901a03

Kazuaki Ishizaki authored Mar 18, 2020

Fix trivial typos

Reviewers: mravishankar, antiagainst, ftynse

Reviewed By: ftynse

Subscribers: ftynse, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, Joonsoo, bader, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76347

a8901a03

[lldb/Target] Support more than 2 symbols in StackFrameRecognizer · db31e2e1

Med Ismail Bennani authored Mar 13, 2020

This patch changes the way the StackFrame Recognizers match a certain
frame.

Until now, recognizers could be registered with a function
name but also an alternate symbol.
This change is motivated by a test failure for the Assert frame
recognizer on Linux. Depending on the version of the libc, the abort
function (triggered by an assertion), could have more than two
signatures (i.e. `raise`, `__GI_raise` and `gsignal`).

Instead of only checking the default symbol name and the alternate one,
lldb will iterate over a list of symbols to match against.
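
The matching change can be pictured with a small sketch. `RecognizerEntry` and `matches_frame` are hypothetical names; the real lldb classes and their registration API differ.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical recognizer entry holding any number of symbol names.
struct RecognizerEntry {
  std::vector<std::string> symbols;
};

// True when the frame's symbol is one of the registered names.
inline bool matches_frame(const RecognizerEntry &entry,
                          const std::string &frame_symbol) {
  return std::find(entry.symbols.begin(), entry.symbols.end(), frame_symbol) !=
         entry.symbols.end();
}
```

With the three names mentioned above registered in one entry, a frame stopped in any of them would be recognized.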

rdar://60386577

Differential Revision: https://reviews.llvm.org/D76188



Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>

db31e2e1

[OPENMP50]Codegen for detach clause. · b09cce07
Alexey Bataev authored Mar 18, 2020
```
Implemented codegen for detach clause in task directives.
```
b09cce07

[llvm][SVE] Addressing mode for FF/NF loads. · 9bdcd9bf

Francesco Petrogalli authored Mar 09, 2020

Summary:
This patch adds addressing mode computation for the following SVE
instructions:

* ldff1{s}<T1> { <Zt>.<T2> }, <Pg>/Z, [<Xn|SP>{, <Xm>{, lsl #imm}}]
* ldnf1{s}<T1> { <Zt>.<T2> }, <Pg>/Z, [<Xn|SP>{, #<imm>, mul vl}]

Reviewers: andwar, sdesmalen, rengolin, efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76209

9bdcd9bf

[AArch64][SVE] Change pointer type of nontemporal load/store intrinsics · 4788ca45

Sander de Smalen authored Mar 18, 2020

Summary:
This fixes a discrepancy between the non-temporal load/store
intrinsics and other SVE load intrinsics (such as nf/ff), so
that Clang can use the same code to generate these intrinsics.

Reviewers: andwar, kmclaughlin, rengolin, efriedma

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76237

4788ca45

Fix possible assertion when using PBQP with debug info · 940ba146

Danila Malyutin authored Mar 17, 2020

Skip debug instructions before calling functions not expecting them.
In particular, LIS.getInstructionIndex(*mi) would fail if mi was a debug instr.

Differential Revision: https://reviews.llvm.org/D76129

940ba146

[DebugInfo] Fix multi-byte entry values in call site values · a0a3a9c5

David Stenberg authored Mar 18, 2020

Summary:
In D67768/D67492 I added support for entry values having blocks larger
than one byte, but I now noticed that the DIE implementation I added there
was broken. The takeNodes() function, that moves the entry value block
from a temporary buffer to the output buffer, would destroy the input
iterator when transferring the first node, meaning that only that node
was moved.

In practice, this meant that when emitting a call site value using a
DW_OP_entry_value operation with a DWARF register number larger than 31,
that multi-byte DW_OP_regx expression would be truncated.
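
For context on why register numbers above 31 produce a multi-byte expression, here is a sketch of how a DWARF register location is encoded. The opcode values come from the DWARF specification; the function name is hypothetical and this is not the code touched by the patch.

```cpp
#include <cstdint>
#include <vector>

constexpr uint8_t DW_OP_reg0 = 0x50;  // DW_OP_reg0..DW_OP_reg31 are 0x50..0x6f
constexpr uint8_t DW_OP_regx = 0x90;  // followed by a ULEB128 register number

// Hypothetical helper: encode "value lives in register `reg`".
inline std::vector<uint8_t> encode_register_location(uint64_t reg) {
  if (reg <= 31)
    return {static_cast<uint8_t>(DW_OP_reg0 + reg)};  // single byte
  std::vector<uint8_t> bytes{DW_OP_regx};
  do {  // ULEB128: 7 bits per byte, high bit set on all but the last byte
    uint8_t byte = static_cast<uint8_t>(reg & 0x7f);
    reg >>= 7;
    if (reg != 0)
      byte |= 0x80;
    bytes.push_back(byte);
  } while (reg != 0);
  return bytes;
}
```

If only the first node of such an expression is transferred, the opcode survives but its operand bytes are lost, which matches the truncation described above.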

Reviewers: djtodoro, aprantl, vsk

Reviewed By: djtodoro

Subscribers: llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D76279

a0a3a9c5

[SCCP] Precommit some additional tests for integer ranges. · 0db72442
Florian Hahn authored Mar 18, 2020

0db72442

[InstCombine][X86] simplifyX86varShift - convert variable in-range per-element... · f4e495a1

Simon Pilgrim authored Mar 18, 2020

[InstCombine][X86] simplifyX86varShift - convert variable in-range per-element shift amounts to generic shifts (PR40391)

AVX2/AVX512 per-element shifts can be replaced with generic shifts if the shift amounts are guaranteed to be in-range (upper bits are known zero).
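
A scalar sketch of why the in-range condition matters, for one 32-bit lane of a logical left shift. The function names are hypothetical; the x86 lane behaviour (out-of-range amounts give 0) is the documented behaviour of the variable logical shifts, whereas a generic shift has no defined result for an out-of-range amount.

```cpp
#include <cassert>
#include <cstdint>

// One lane of an x86 variable logical left shift: amounts >= 32 yield 0.
inline uint32_t x86_var_shl_lane(uint32_t x, uint32_t amt) {
  return amt < 32 ? x << amt : 0;
}

// One lane of a generic shift: only meaningful for in-range amounts.
inline uint32_t generic_shl_lane(uint32_t x, uint32_t amt) {
  assert(amt < 32 && "out-of-range amount has no defined result");
  return x << amt;
}

// The amount is provably in range when every bit above the low 5 bits is
// known to be zero. `known_zero_bits` has a 1 for each bit known to be zero.
inline bool amount_known_in_range(uint32_t known_zero_bits) {
  return (known_zero_bits & ~31u) == ~31u;
}
```

When `amount_known_in_range` holds, the two lane functions agree for every possible amount, so the replacement is safe.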

f4e495a1

Reland D75470 [SVE] Auto-generate builtins and header for svld1. · c5b81466

Sander de Smalen authored Mar 18, 2020

Reworked the patch to avoid sharing a header (SVETypeFlags.h) between
include/clang/Basic and utils/TableGen/SveEmitter.cpp. Now the patch
generates the enum/flags which is included in TargetBuiltins.h.

Also renamed one of the SveEmitter options to be in line with MVE.

Summary:

This is a first patch in a series for the SveEmitter to generate the arm_sve.h
header file and builtins.

I've tried to strip down this patch as best I could, but there
are still a few changes that are not necessarily exercised by the load intrinsics
in this patch, mostly around the SVEType class which has some common logic to
represent types from a type and prototype string. I thought it didn't make
much sense to remove that from this patch and split it up.

c5b81466

[ARM,MVE] Add intrinsics for the VQDMLAH family. · 928776de

Simon Tatham authored Mar 11, 2020

Summary:
These are complicated integer multiply+add instructions with extra
saturation, taking the high half of a double-width product, and
optional rounding. There's no sensible way to represent that in
standard IR, so I've converted the clang builtins directly to
target-specific intrinsics.

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: miyuki

Subscribers: kristof.beyls, hiraditya, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76123

928776de

[ARM,MVE] Add intrinsics and isel for MVE integer VMLA. · 28c5d97b

Simon Tatham authored Mar 11, 2020

Summary:
These instructions compute multiply+add in integers, with one of the
operands being a splat of a scalar. (VMLA and VMLAS differ in whether
the splat operand is a multiplier or the addend.)

I've represented these in IR using existing standard IR operations for
the unpredicated forms. The predicated forms are done with target-
specific intrinsics, as usual.

When operating on n-bit vector lanes, only the bottom n bits of the
i32 scalar operand are used. So we have to tell that to isel lowering,
to allow it to remove a pointless sign- or zero-extension instruction
on that input register. That's done in `PerformIntrinsicCombine`, but
first I had to enable `PerformIntrinsicCombine` for MVE targets
(previously all the intrinsics it handled were for NEON), and make it
a method of `ARMTargetLowering` so that it can get at
`SimplifyDemandedBits`.
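
The point about the bottom n bits can be checked with a scalar sketch for 16-bit lanes. The function names and the operand roles are assumptions based on the description above (scalar as multiplier for VMLA, scalar as addend for VMLAS), not the intrinsic definitions.

```cpp
#include <cstdint>

// One 16-bit lane of a multiply+add where the scalar is the multiplier.
inline uint16_t vmla_lane_u16(uint16_t acc, uint16_t v, uint32_t scalar) {
  return static_cast<uint16_t>(uint32_t(acc) + uint32_t(v) * scalar);
}

// One 16-bit lane of a multiply+add where the scalar is the addend.
inline uint16_t vmlas_lane_u16(uint16_t v1, uint16_t v2, uint32_t scalar) {
  return static_cast<uint16_t>(uint32_t(v1) * uint32_t(v2) + scalar);
}
```

Because the result is truncated to 16 bits, the upper half of the 32-bit scalar never influences it, which is what lets a preceding sign- or zero-extension be dropped.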

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76122

28c5d97b

Fix interaction with gold plugin · 8d019cda

serge-sans-paille authored Mar 18, 2020

Correctly set RelocationModel, thanks @modocache for spotting this.

Related to differential revision: https://reviews.llvm.org/D75579

8d019cda

[InstCombine][X86] Tests for variable but in-range per-element shift amounts (PR40391) · cda2b076
Simon Pilgrim authored Mar 18, 2020
```
These shifts are masked to be in-range, so we should be able to replace them with generic shifts.
```
cda2b076