- Oct 22, 2021
-
-
Craig Topper authored
Instead of returning a bool to indicate success and a separate SDValue, return the SDValue and have the callers check if it is null. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112331
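A minimal sketch of the interface shape this moves to, using a stand-in type rather than the real llvm::SDValue (illustrative only):
```
#include <cassert>

// Stand-in for llvm::SDValue, which is null-testable via operator bool.
struct SDValue {
  void *Node = nullptr;
  explicit operator bool() const { return Node != nullptr; }
};

// Old shape: bool return plus an SDValue output parameter.
//   bool tryCombine(SDValue In, SDValue &Out);
// New shape: return the SDValue; a null value signals failure.
SDValue tryCombine(SDValue In) {
  if (!In)
    return SDValue(); // no combine performed
  return In;
}

int main() {
  SDValue Res = tryCombine(SDValue{});
  assert(!Res && "null input should not combine");
}
```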
-
Nikita Popov authored
This follows up on D111023 by exporting the generic "load value from constant at given offset as given type" and using it in the store to load forwarding code. We now need to make sure that the load size is smaller than the store size, previously this was implicitly ensured by ConstantFoldLoadThroughBitcast(). Differential Revision: https://reviews.llvm.org/D112260
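A minimal standalone model of the new guard (illustrative only, not LLVM's API): the forwarded load must be fully covered by the stored constant bytes.
```
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Forward a load from a known stored constant, mirroring the rule that
// the load must not read past the end of the stored value.
std::optional<uint32_t> forwardLoadU32(const std::vector<uint8_t> &StoredBytes,
                                       size_t Offset) {
  const size_t LoadSize = sizeof(uint32_t);
  // The forwarded load must lie entirely within the stored bytes.
  if (Offset + LoadSize > StoredBytes.size())
    return std::nullopt;
  uint32_t V;
  std::memcpy(&V, StoredBytes.data() + Offset, LoadSize);
  return V;
}
```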
-
Nikita Popov authored
Make use of the getGEPIndicesForOffset() helper for creating GEPs. This handles arrays as well, uses correct GEP index types and reduces code duplication. Differential Revision: https://reviews.llvm.org/D112263
-
Craig Topper authored
[LegalizeTypes][RISCV][PowerPC] Expand CTLZ/CTTZ/CTPOP instead of promoting if they'll be expanded later. Expanding these requires multiple constants. If we promote during type legalization when they'll end up getting expanded in LegalizeDAG, we'll use larger constants. These constants may be harder to materialize. For example, 64-bit constants on 64-bit RISCV are very expensive. This is similar to what has already been done to BSWAP and BITREVERSE. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112268
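For context, the classic bit-manipulation expansion of CTPOP shows how many wide mask constants such an expansion needs; at i64 each mask is a full 64-bit immediate that RISCV may take several instructions to materialize. This standalone C++ analogue is illustrative, not the DAG expansion itself:
```
#include <cstdint>

// Classic CTPOP expansion: every step needs a wide mask constant.
uint64_t popcount64(uint64_t X) {
  X = X - ((X >> 1) & 0x5555555555555555ULL);
  X = (X & 0x3333333333333333ULL) + ((X >> 2) & 0x3333333333333333ULL);
  X = (X + (X >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
  return (X * 0x0101010101010101ULL) >> 56;
}
```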
-
Kazu Hirata authored
-
Jonas Paulsson authored
This pseudo is expanded very late (AsmPrinter) and therefore has to have a correct size value, or the branch relaxation pass may make a wrong decision. Review: Ulrich Weigand
-
Bradley Smith authored
This will allow us to reuse the existing interleaved load logic in lowerInterleavedLoad that exists for neon types, but for SVE fixed types. The goal eventually will be to replace the existing ld<n> intrinsics with these, once a migration path has been sorted out. Differential Revision: https://reviews.llvm.org/D112078
-
Zarko Todorovski authored
-
Roman Lebedev authored
[X86] `X86TTIImpl::getInterleavedMemoryOpCost()`: scale interleaving cost by the fraction of live members. By definition, an interleaved load of stride N means: load N*VF elements, and shuffle them into N VF-sized vectors, with the 0'th vector containing elements `[0, VF)*stride + 0`, and the 1'th vector containing elements `[0, VF)*stride + 1`. Example: https://godbolt.org/z/df561Me5E (i64 stride 4 vf 2 => cost 6) A not-fully-interleaved load is when not all of these vectors are demanded. So at worst, we could just pretend that everything is demanded, and discard the non-demanded vectors. What this means is that the cost for a not-fully-interleaved group should be no greater than the cost for the same fully-interleaved group, but perhaps somewhat less. Examples: https://godbolt.org/z/a78dK5Geq (i64 stride 4 (indices 012u) vf 2 => cost 4) https://godbolt.org/z/G91ceo8dM (i64 stride 4 (indices 01uu) vf 2 => cost 2) https://godbolt.org/z/5joYob9rx (i64 stride 4 (indices 0uuu) vf 2 => cost 1) Right now, for such not-fully-interleaved loads we just use the costs for fully-interleaved loads. But at least **in general**, that is obviously overly pessimistic, because **in general**, not all the shuffles needed to perform the full interleaving will end up being live. So what this does is naively scale the interleaving cost by the fraction of the live members. I believe this should still result in the right ballpark cost estimate, although it may be an over- or under-estimate. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112307
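A minimal sketch of the naive scaling described above, as standalone C++ (the real change operates on InstructionCost inside `X86TTIImpl::getInterleavedMemoryOpCost()`; this is an illustration, not the actual code):
```
#include <cassert>

// Scale the fully-interleaved cost by the fraction of demanded (live)
// members of the interleave group.
unsigned scaleInterleavedCost(unsigned FullCost, unsigned LiveMembers,
                              unsigned TotalMembers) {
  assert(TotalMembers != 0 && LiveMembers <= TotalMembers);
  return FullCost * LiveMembers / TotalMembers;
}
// E.g. full cost 6 for a stride-4 group: 3 live members -> 4, 1 live -> 1.
```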
-
Jay Foad authored
This doesn't have any effect on codegen now, but it might do in the future if we shrink instructions before post-RA scheduling, which is sensitive to live vs dead defs. Differential Revision: https://reviews.llvm.org/D112305
-
Simon Pilgrim authored
-
Simon Pilgrim authored
parseFunctionName allowed a default null pointer, despite the pointer being dereferenced immediately for use as a reference, and despite all callers taking the address of an existing reference. Fixes a static analyzer warning about a potential null dereference.
-
Michał Górny authored
Optimize the iterator comparison logic to compare Current.data() pointers. Use std::tie for assignments from std::pair. Replace the custom class with a function returning iterator_range. Differential Revision: https://reviews.llvm.org/D110535
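The two idioms mentioned, shown in isolation as generic C++ (stand-in names; not the patched LLVM code):
```
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Minimal stand-in for llvm::iterator_range: a begin/end pair that can be
// returned from a function and used directly in a range-for.
template <typename It> struct SimpleRange {
  It B, E;
  It begin() const { return B; }
  It end() const { return E; }
};

std::pair<std::string, unsigned> nextField() { return {"name", 7}; }

int main() {
  // std::tie assigns both members of a returned pair in one statement.
  std::string Data;
  unsigned Offset;
  std::tie(Data, Offset) = nextField();

  std::vector<int> V{1, 2, 3};
  for (int X : SimpleRange<std::vector<int>::iterator>{V.begin(), V.end()})
    (void)X;
}
```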
-
Florian Hahn authored
IRBuilder has been updated to support preserving metadata in a more general manner. This patch adds `LLVMAddMetadataToInst` and deprecates `LLVMSetInstDebugLocation` in favor of the more general function. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D93454
-
Fraser Cormack authored
This patch fixes a codegen bug, the test for which was introduced in D112223. When merging VSETVLIInfo across blocks, if the 'exit' VSETVLIInfo produced by a block is found to be compatible with the VSETVLIInfo computed as the intersection of the 'exit' VSETVLIInfo produced by the block's predecessors, that block's 'exit' info is discarded and the intersected value is taken in its place. However, we have one authority on what constitutes VSETVLIInfo compatibility and we are using it in two different contexts. Compatibility is used in one context to elide VSETVLIs between straight-line vector instructions. But compatibility when evaluated between two blocks' exit infos ignores any info produced *inside* each respective block before the exit points. As such it does not guarantee that a block will not produce a VSETVLI which is incompatible with the 'previous' block. Therefore, we must ensure that any merging of VSETVLIInfo is performed using some notion of "strict" compatibility. I've defined this as a full vtype match, but this is perhaps too pessimistic. Given that test coverage in this regard is lacking -- the only change is in the failing test -- I think this is a good starting point. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D112228
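An illustrative model of the "strict" compatibility described (the field names are assumptions for the sketch, not the pass's actual members):
```
#include <cstdint>

// Simplified stand-in for the vtype state tracked per block.
struct VTypeState {
  uint8_t SEW;  // element width selector
  uint8_t LMUL; // register group multiplier
  bool TailAgnostic;
  bool MaskAgnostic;
};

// Strict compatibility: every vtype field must match exactly, so merging
// exit infos can never hide an incompatibility produced inside a block.
bool strictlyCompatible(const VTypeState &A, const VTypeState &B) {
  return A.SEW == B.SEW && A.LMUL == B.LMUL &&
         A.TailAgnostic == B.TailAgnostic &&
         A.MaskAgnostic == B.MaskAgnostic;
}
```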
-
Chen Zheng authored
-
Chen Zheng authored
This is to improve compile time. Differential Revision: https://reviews.llvm.org/D112196 Reviewed By: jsji
-
Chuanqi Xu authored
When I was playing with Coroutines, I found that it is possible to generate the following IR:
```
%struct = alloca ...
%sub.element = getelementptr %struct, i64 0, i64 index ; index is not %zero
lifetime.marker.start(%sub.element)
% use of %sub.element
lifetime.marker.end(%sub.element)
store %struct to xxx ; %struct is escaping!
<suspend points>
```
Then the AllocaUseVisitor would collect the lifetime marker for %sub.element and treat it as a lifetime marker of the alloca! So it judges that the alloca could be put on the stack instead of the frame by judging the lifetime markers only. The root cause of the bug is that AllocaUseVisitor collects the wrong lifetime markers. This patch fixes this. Reviewed By: lxfind Differential Revision: https://reviews.llvm.org/D112216
-
Vitaly Buka authored
Transformations may strip the attribute from the argument, e.g. for unused arguments, which will result in a shadow offset mismatch between caller and callee. Stripping noundef for used arguments can be a problem, as TLS is not going to be set by the caller. However, that is not the goal of this patch, and I am not aware if that's even possible. Differential Revision: https://reviews.llvm.org/D112197
-
Stanislav Mekhanoshin authored
In a kernel which does not have calls or AGPR usage we can allocate the whole vector register budget for VGPRs and have no AGPRs as long as VGPRs stay addressable (i.e. below 256). Differential Revision: https://reviews.llvm.org/D111764
-
Luís Ferreira authored
This patch is a refactor to implement prepend afterwards. Since this changes a lot of files and to conform with guidelines, I will separate this from the implementation of prepend. Related to the discussion in https://reviews.llvm.org/D111414 , so please read it for more context. Reviewed By: #libc_abi, dblaikie, ldionne Differential Revision: https://reviews.llvm.org/D111947
-
Jack Anderson authored
Some dwarf loaders in LLVM are hard-coded to only accept 4-byte and 8-byte address sizes. This patch generalizes acceptance into `DWARFContext::isAddressSizeSupported` and provides a common way to generate rejection errors. The MSP430 target has been given new tests to cover dwarf loading cases that previously failed due to 2-byte addresses. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D111953
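A hedged sketch of what such a predicate looks like; the exact set of accepted sizes here is an assumption, so check `DWARFContext::isAddressSizeSupported` for the authoritative list:
```
#include <cstdint>

// Illustrative stand-in: accept the address sizes common across DWARF
// producers, including the 2-byte addresses used by MSP430.
bool isAddressSizeSupported(uint8_t AddressSize) {
  switch (AddressSize) {
  case 2:
  case 4:
  case 8:
    return true;
  default:
    return false;
  }
}
```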
-
Craig Topper authored
There is no need to return a bool and have an SDValue output parameter. Just return the SDValue and let the caller check if it is null. I have another patch to add more callers of these so I thought I'd clean up the interface first. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112267
-
Craig Topper authored
By expanding early it allows the shifts to be custom lowered in LegalizeVectorOps. Then a DAG combine is able to run on them before LegalizeDAG handles the BUILD_VECTORS for the masks used. v16i8 shift lowering on X86 requires a mask to be applied to a v8i16 shift. The BITREVERSE expansion applied an AND mask before SHL ops and after SRL ops. This was done to share the same mask constant for both shifts. It looks like this patch allows DAG combine to remove the AND mask added after v16i8 SHL by X86 lowering. This maintains the mask sharing that BITREVERSE was trying to achieve. Prior to this patch it looks like we kept the mask after the SHL instead which required an extra constant pool or a PANDN to invert it. This is dependent on D112248 because RISCV will end up scalarizing the BSWAP portion of the BITREVERSE expansion if we don't disable BSWAP scalarization in LegalizeVectorOps first. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112254
-
- Oct 21, 2021
-
-
Craig Topper authored
It's better to do the ands, shifts, ors in the vector domain than to scalarize it and do those operations on each element. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112248
-
Jessica Paquette authored
G_ICMP is selected to an arithmetic overflow op (ADDS/SUBS/etc) with a dead destination + a CSINC instruction. We have a fold which allows us to combine 32-bit adds with G_ICMP. The problem with G_ICMP is that we model it as always having a 32-bit destination even though it can be a 64-bit operation. So, we were missing some opportunities for 64-bit folds. This patch teaches the fold to recognize 64-bit G_ICMPs + refactors some of the code surrounding CSINC accordingly. (Later down the line, I think we should probably change the way we handle G_ICMP in general.) Differential Revision: https://reviews.llvm.org/D111088
-
Yonghong Song authored
If a typedef type has __attribute__((btf_decl_tag("str"))) with bpf target, emit BTF_KIND_DECL_TAG for that type in the BTF. Differential Revision: https://reviews.llvm.org/D112259
-
Nikita Popov authored
As discussed in D112016, our current requirement of speculatability for ephemeral values is overly strict: what we really care about is that the instruction will be DCEd once the assume is dropped. For that it is sufficient that the instruction is side-effect free and not a terminator. In particular, this allows non-dereferenceable loads to be ephemeral values. Differential Revision: https://reviews.llvm.org/D112179
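An illustrative model of the relaxed test (simplified stand-in types, not the actual AssumptionCache/ephemeral-value walk):
```
// Stand-in for the handful of instruction properties the test needs.
struct Instr {
  bool MayHaveSideEffects;
  bool IsTerminator;
};

// Relaxed ephemerality: the instruction no longer needs to be speculatable,
// only deletable by DCE once its sole user (the assume) is dropped.
bool canBeEphemeral(const Instr &I) {
  return !I.MayHaveSideEffects && !I.IsTerminator;
}
```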
-
Craig Topper authored
Differential Revision: https://reviews.llvm.org/D112233
-
Arthur Eubanks authored
This reverts commit baea663a. Causes crashes, e.g. https://lab.llvm.org/buildbot/#/builders/77/builds/10715.
-
Sanjay Patel authored
shuf (bo X, Y), (bo X, W) --> bo (shuf X), (shuf Y, W) This is motivated by an example in D111800 (although that patch avoids the problem for that particular example). The pattern is shown in reduced form with: https://llvm.org/PR52178 https://alive2.llvm.org/ce/z/d8zB4D There is no difference on the PhaseOrdering test from D111800 because the aarch64 cost model says that the shuffle cost is 3 while the fadd cost is 2. Differential Revision: https://reviews.llvm.org/D111901
-
Philip Reames authored
This change restructures the cache used in IPT to point not to the first special instruction, but to the first instruction which *could* be special. That is, the cached reference is always equal to the first special, or comes before it in the block. This avoids expensive block scans when we are removing special instructions from the beginning of the block. At the moment, this case is not heavily used, though it does trigger in GVN when doing CSE of calls. The main motivation was a change I'm no longer planning to move forward with, but the cache optimization seemed worthwhile as a minor perf win at low cost. Differential Revision: https://reviews.llvm.org/D111768
-
Yonghong Song authored
Clang patch ([1]) added support for btf_decl_tag attributes with typedef types. This patch added llvm support including dwarf generation. For example, for the typedef
```
typedef unsigned * __u __attribute__((btf_decl_tag("tag1")));
__u u;
```
the following shows the llvm-dwarfdump result:
```
0x00000033: DW_TAG_typedef
              DW_AT_type      (0x00000048 "unsigned int *")
              DW_AT_name      ("__u")
              DW_AT_decl_file ("/home/yhs/work/tests/llvm/btf_tag/t.c")
              DW_AT_decl_line (1)

0x0000003e:   DW_TAG_LLVM_annotation
                DW_AT_name        ("btf_decl_tag")
                DW_AT_const_value ("tag1")

0x00000047:   NULL
```
[1] https://reviews.llvm.org/D110127 Differential Revision: https://reviews.llvm.org/D110129
-
Sanjay Patel authored
This updates the recent D112108 / b92412fb to handle the flipped logic ('or') sibling: https://alive2.llvm.org/ce/z/Y2L6Ch
-
Anirudh Prasad authored
- This patch provides the initial implementation for lowering a call on z/OS according to the XPLINK64 calling convention
- A series of changes have been made to SystemZCallingConv.td to account for these additional XPLINK64 changes including adding a new helper function to shadow the stack along with allocation of a register wherever appropriate
- For the cases of copying a f64 to a gr64 and a f128 / 128-bit vector type to a gr64, a `CCBitConvertToType` has been added and has been bitcasted appropriately in the lowering phase
- Support for the ADA register (R5) will be provided in a later patch.

Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D111662
-
Sanjay Patel authored
(i8 X ^ 128) & (i8 X s>> 7) --> usubsat X, 128 I haven't found a generalization of this identity: https://alive2.llvm.org/ce/z/_sriEQ Note: I was actually looking at the first form of the pattern in that link, but that's part of a long chain of potential missed transforms in codegen and IR....that I hope ends here! The predicates for when this is profitable are a bit tricky. This version of the patch excludes multi-use but includes custom lowering (as opposed to legal only). On x86 for example, we have custom lowering for some vector types, and that uses umax and sub. So to enable that fold, we need to add use checks to avoid regressions. Even with legal-only lowering, we could see code with extra reg move instructions for extra uses, so that constraint would have to be eased very carefully to avoid penalties. Differential Revision: https://reviews.llvm.org/D112085
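The identity is easy to check exhaustively for i8; a standalone C++ verification (assuming the usual arithmetic right shift on signed values):
```
#include <cassert>
#include <cstdint>

// usubsat(X, 128) for i8: unsigned saturating subtract.
uint8_t usubsat128(uint8_t X) { return X >= 128 ? uint8_t(X - 128) : 0; }

int main() {
  for (unsigned V = 0; V < 256; ++V) {
    uint8_t X = uint8_t(V);
    // s>> 7 smears the sign bit: all-ones iff X >= 128 (as unsigned).
    uint8_t Mask = uint8_t(int8_t(X) >> 7);
    uint8_t Folded = uint8_t(X ^ 128) & Mask;
    assert(Folded == usubsat128(X));
  }
}
```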
-
Alexey Bataev authored
Vectorization of PHIs and stores is very similar; it might be beneficial to try to revectorize stores (like PHIs) if the total number of stores with the same/alternate opcode is less than the vector size but the number of stores with the same type is larger than the vector size. Differential Revision: https://reviews.llvm.org/D109831
-
Kerry McLaughlin authored
When splitting a masked load, `GetDependentSplitDestVTs` is used to get the MemVTs of the high and low parts. If the masked load is extended, this may return VTs with different element types which are used to create the high & low masked load instructions. This patch changes `GetDependentSplitDestVTs` to ensure we return VTs with the same element type. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D111996
-
YunQiang Su authored
If the clang driver gets a 64-bit r6 target triple like `mipsisa64r6` and an additional option forces switching to generation of 32-bit code, it loses the r6 ABI and generates 32-bit r2-r5 ABI code.
```
$ clang -target mipsisa64r6-linux-gnu -mabi=32
```
This patch fixes the problem.
- Add an optional `SubArchType` argument to the `Triple::setArch()` method.
- Implement generation of mips r6 target triples in the `Triple::getArchName()` method.

Differential Revision: https://reviews.llvm.org/D110514.diff
-
Dawid Jurczak authored
This patch simplifies for loops in LIR following the LLVM guidelines: https://llvm.org/docs/CodingStandards.html#use-range-based-for-loops-wherever-possible. Differential Revision: https://reviews.llvm.org/D112077
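The flavor of rewrite the guideline asks for, shown on a generic example rather than the LIR code itself:
```
#include <vector>

void example(std::vector<int> &Blocks) {
  // Before: index-based loop.
  for (size_t I = 0, E = Blocks.size(); I != E; ++I)
    Blocks[I] += 1;

  // After: range-based loop, per the LLVM coding standards.
  for (int &B : Blocks)
    B += 1;
}
```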
-