- Oct 22, 2021
-
-
Craig Topper authored
Instead of returning a bool to indicate success and a separate SDValue, return the SDValue and have the callers check if it is null. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112331
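A minimal sketch of the calling convention this describes (the helper name is hypothetical, not the function touched by D112331): a null `SDValue` signals failure, replacing the bool-plus-output-parameter pair.
```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Hypothetical helper illustrating the pattern: return a null SDValue on
// failure instead of returning bool with an SDValue out-parameter.
static SDValue tryFoldFirstOperand(SDNode *N) {
  if (N->getNumOperands() == 0)
    return SDValue(); // default-constructed SDValue is null: failure
  return N->getOperand(0);
}

// Caller: SDValue's explicit operator bool tests for null.
static SDValue useIt(SDNode *N) {
  if (SDValue V = tryFoldFirstOperand(N))
    return V;
  return SDValue();
}
```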
-
Quinn Pham authored
[NFC] This patch fixes a URL in a testcase due to the renaming of the branch.
-
Nikita Popov authored
This follows up on D111023 by exporting the generic "load value from constant at given offset as given type" helper and using it in the store-to-load forwarding code. We now need to make sure that the load size is smaller than the store size; previously this was implicitly ensured by ConstantFoldLoadThroughBitcast(). Differential Revision: https://reviews.llvm.org/D112260
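A sketch of what the forwarding check might look like, assuming the exported helper is `ConstantFoldLoadFromConst`; the surrounding function is illustrative, not the actual forwarding code.
```cpp
#include "llvm/Analysis/ConstantFolding.h"
#include "llvm/IR/DataLayout.h"
using namespace llvm;

// Illustrative: forward a load of LoadTy from a stored constant at Offset,
// but only when the load does not read past the end of the store.
static Constant *forwardStoredConstant(Constant *StoredVal, Type *LoadTy,
                                       const APInt &Offset,
                                       const DataLayout &DL) {
  // The load must be no larger than the store; otherwise part of the
  // loaded value is not covered by the stored constant.
  if (DL.getTypeStoreSize(LoadTy).getFixedSize() >
      DL.getTypeStoreSize(StoredVal->getType()).getFixedSize())
    return nullptr;
  return ConstantFoldLoadFromConst(StoredVal, LoadTy, Offset, DL);
}
```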
-
Nikita Popov authored
Make use of the getGEPIndicesForOffset() helper for creating GEPs. This handles arrays as well, uses correct GEP index types and reduces code duplication. Differential Revision: https://reviews.llvm.org/D112263
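A sketch of GEP construction with the helper mentioned here; the function and variable names are illustrative.
```cpp
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// Illustrative: build an inbounds GEP addressing byte Offset inside Base,
// letting the DataLayout helper pick struct/array indices with the correct
// index types instead of hand-rolling the walk.
static Value *emitGEPAtOffset(IRBuilder<> &B, Value *Base, Type *SrcElemTy,
                              APInt Offset, const DataLayout &DL) {
  Type *ElemTy = SrcElemTy; // updated by the helper to the type reached
  SmallVector<APInt> Indices = DL.getGEPIndicesForOffset(ElemTy, Offset);
  SmallVector<Value *> IdxValues;
  for (const APInt &Idx : Indices)
    IdxValues.push_back(B.getInt(Idx));
  // Offset now holds any remainder the indices could not cover.
  return B.CreateInBoundsGEP(SrcElemTy, Base, IdxValues);
}
```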
-
Craig Topper authored
[LegalizeTypes][RISCV][PowerPC] Expand CTLZ/CTTZ/CTPOP instead of promoting if they'll be expanded later. Expanding these requires multiple constants. If we promote during type legalization when they'll end up getting expanded in LegalizeDAG, we'll use larger constants. These constants may be harder to materialize. For example, 64-bit constants on 64-bit RISCV are very expensive. This is similar to what has already been done to BSWAP and BITREVERSE. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112268
-
Steven Wan authored
On AIX, the plugins are linked with `-Wl,-G`, which produces shared objects enabled for use with the run-time linker. This patch sets the run-time linker at the main executable link step to allow symbols from the plugins' shared objects to be properly bound. Reviewed By: daltenty Differential Revision: https://reviews.llvm.org/D112275
-
Craig Topper authored
These tests have nearly identical content; the only difference is that the rv64 test has a signext attribute on some parameters. That attribute should be harmless on rv32. Merge them into a single test file with 2 RUN lines. Differential Revision: https://reviews.llvm.org/D112242
-
Kazu Hirata authored
-
Jonas Paulsson authored
This pseudo is expanded very late (AsmPrinter) and therefore has to have a correct size value, or the branch relaxation pass may make a wrong decision. Review: Ulrich Weigand
-
Piotr Sobczak authored
-
Bradley Smith authored
This will allow us to reuse existing interleaved load logic in lowerInterleavedLoad that exists for neon types, but for SVE fixed types. The goal eventually will be to replace the existing ld<n> intrinsics with these, once a migration path has been sorted out. Differential Revision: https://reviews.llvm.org/D112078
-
Zarko Todorovski authored
-
Roman Lebedev authored
[X86] `X86TTIImpl::getInterleavedMemoryOpCost()`: scale interleaving cost by the fraction of live members

By definition, an interleaved load of stride N means: load N*VF elements and shuffle them into N VF-sized vectors, with the 0'th vector containing elements `[0, VF)*stride + 0` and the 1'th vector containing elements `[0, VF)*stride + 1`. Example: https://godbolt.org/z/df561Me5E (i64 stride 4 vf 2 => cost 6)

A not-fully-interleaved load is one where not all of these vectors are demanded. At worst, we could just pretend that everything is demanded and discard the non-demanded vectors. So the cost for a not-fully-interleaved group should be no greater than the cost for the same fully-interleaved group, and perhaps somewhat less. Examples:
https://godbolt.org/z/a78dK5Geq (i64 stride 4 (indices 012u) vf 2 => cost 4)
https://godbolt.org/z/G91ceo8dM (i64 stride 4 (indices 01uu) vf 2 => cost 2)
https://godbolt.org/z/5joYob9rx (i64 stride 4 (indices 0uuu) vf 2 => cost 1)

Right now, for such not-fully-interleaved loads we just use the costs for fully-interleaved loads. But at least in general, that is obviously overly pessimistic, because in general not all the shuffles needed to perform the full interleaving will end up being live. So what this does is naively scale the interleaving cost by the fraction of live members. I believe this should still result in the right ballpark cost estimate, although it may be an over- or under-estimate. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112307
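A scalar sketch of the scaling rule (names are illustrative; the actual change lives in `X86TTIImpl::getInterleavedMemoryOpCost()`). Applied to the godbolt examples above this gives only a ballpark figure, as the message itself notes.
```cpp
#include <cassert>

// Illustrative: scale the fully-interleaved group cost by the fraction of
// demanded (live) members, multiplying first and rounding the division up
// so integer arithmetic does not truncate the estimate toward zero.
unsigned scaledInterleavedCost(unsigned FullGroupCost, unsigned NumLiveMembers,
                               unsigned Stride) {
  assert(NumLiveMembers <= Stride && "more live members than the stride");
  return (FullGroupCost * NumLiveMembers + Stride - 1) / Stride;
}
```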
-
Sylvestre Ledru authored
and fix some typos Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D112299
-
Simon Pilgrim authored
Pre-commit for D111530
-
Jay Foad authored
This doesn't have any effect on codegen now, but it might do in the future if we shrink instructions before post-RA scheduling, which is sensitive to live vs dead defs. Differential Revision: https://reviews.llvm.org/D112305
-
Florian Hahn authored
This patch adds more complex test cases with redundant stores of an existing memset, with other stores in between. It also makes a few of the existing tests more robust.
-
Roman Lebedev authored
This test is quite fragile WRT improvements to the interleaved load cost modelling. Let's bump the stride way up so that this is no longer a concern.
-
Roman Lebedev authored
This reverts commit 8ae83a1b.
-
Simon Pilgrim authored
-
Roman Lebedev authored
The math here is:

    cost of 1 load = cost of n loads / n
    cost of live loads = num live loads * cost of 1 load
                       = num live loads * (cost of n loads / n)
                       = cost of n loads * (num live loads / n)

But all the variables here are integers, and integer division rounds down, while this calculation clearly expects float semantics. Instead, multiply upfront and then perform round-up division. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112302
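A concrete illustration of the truncation problem described above; the numbers are made up for the demonstration.
```cpp
#include <cstdio>

int main() {
  unsigned CostOfNLoads = 6, N = 4, NumLive = 3;
  // Dividing first truncates: 6 / 4 = 1, then 1 * 3 = 3.
  unsigned DivideFirst = (CostOfNLoads / N) * NumLive;
  // Multiplying upfront, then rounding the division up: ceil(18 / 4) = 5.
  unsigned MultiplyFirst = (CostOfNLoads * NumLive + N - 1) / N;
  std::printf("divide-first = %u, multiply-first = %u\n", DivideFirst,
              MultiplyFirst);
  return 0;
}
```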
-
Roman Lebedev authored
-
Simon Pilgrim authored
parseFunctionName allowed a defaulted null pointer, despite the pointer being dereferenced immediately for use as a reference, and despite all callers passing the address of an existing reference. Fixes a static analyzer warning about potential null dereferences.
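A sketch of the shape of the fix; the signatures are illustrative, not the actual parseFunctionName.
```cpp
#include <string>

// Before (sketch): the defaulted null pointer could never actually be
// used, because the body dereferenced it unconditionally.
//   void parseFunctionName(std::string *Name = nullptr);
//
// After (sketch): take a reference, which cannot be null and matches how
// every caller already passed the address of an existing object.
static void parseFunctionName(std::string &Name) {
  Name = "example"; // placeholder for the real parsing logic
}
```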
-
Michał Górny authored
Optimize the iterator comparison logic to compare Current.data() pointers. Use std::tie for assignments from std::pair. Replace the custom class with a function returning iterator_range. Differential Revision: https://reviews.llvm.org/D110535
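Sketches of the two idioms mentioned, under illustrative names:
```cpp
#include "llvm/ADT/iterator_range.h"
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// std::tie unpacks a std::pair into existing variables in one statement,
// replacing two separate member assignments.
static std::pair<std::string, unsigned> nextChunk() { return {"data", 4}; }

static void consume() {
  std::string Data;
  unsigned Size;
  std::tie(Data, Size) = nextChunk();
}

// A function returning iterator_range replaces a dedicated range class
// that only existed to expose begin()/end().
static llvm::iterator_range<std::vector<int>::iterator>
values(std::vector<int> &V) {
  return llvm::make_range(V.begin(), V.end());
}
```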
-
Florian Hahn authored
IRBuilder has been updated to support preserving metadata in a more general manner. This patch adds `LLVMAddMetadataToInst` and deprecates `LLVMSetInstDebugLocation` in favor of the more general function. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D93454
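A sketch of migrating a caller, assuming the new entry point has the `(LLVMBuilderRef, LLVMValueRef)` signature added alongside this patch:
```cpp
#include "llvm-c/Core.h"

// Attach the builder's pending metadata (including the current debug
// location) to a newly created instruction.
static void attachBuilderMetadata(LLVMBuilderRef Builder, LLVMValueRef Inst) {
  // Previously: LLVMSetInstDebugLocation(Builder, Inst);
  LLVMAddMetadataToInst(Builder, Inst);
}
```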
-
Fraser Cormack authored
This patch fixes a codegen bug, the test for which was introduced in D112223. When merging VSETVLIInfo across blocks, if the 'exit' VSETVLIInfo produced by a block is found to be compatible with the VSETVLIInfo computed as the intersection of the 'exit' VSETVLIInfo produced by the block's predecessors, that block's 'exit' info is discarded and the intersected value is taken in its place. However, we have one authority on what constitutes VSETVLIInfo compatibility, and we are using it in two different contexts. Compatibility is used in one context to elide VSETVLIs between straight-line vector instructions. But compatibility, when evaluated between two blocks' exit infos, ignores any info produced *inside* each respective block before the exit points. As such, it does not guarantee that a block will not produce a VSETVLI which is incompatible with the 'previous' block. We must therefore ensure that any merging of VSETVLIInfo is performed using some notion of "strict" compatibility. I've defined this as a full vtype match, but this is perhaps too pessimistic. Given that test coverage in this regard is lacking -- the only change is in the failing test -- I think this is a good starting point. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D112228
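A much-simplified sketch of the merge decision; the real VSETVLIInfo tracks more state, and this only models the "strict compatibility = full vtype match" rule the message proposes.
```cpp
// Simplified stand-in for RISCV's VSETVLIInfo; the real class tracks VL as
// well. "Strict" compatibility is modeled here as a full vtype match.
struct ExitInfo {
  unsigned VType = 0;
  bool Known = false;
};

static bool strictlyCompatible(const ExitInfo &A, const ExitInfo &B) {
  return A.Known && B.Known && A.VType == B.VType;
}

// Only replace a block's own exit info with the predecessors' intersection
// when the two strictly match; otherwise keep the block's own exit info, so
// VSETVLIs emitted inside the block cannot contradict the merged state.
static ExitInfo mergeExitInfo(const ExitInfo &BlockExit,
                              const ExitInfo &PredIntersection) {
  return strictlyCompatible(BlockExit, PredIntersection) ? PredIntersection
                                                         : BlockExit;
}
```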
-
Chen Zheng authored
-
Chen Zheng authored
This is to improve compile time. Differential Revision: https://reviews.llvm.org/D112196 Reviewed By: jsji
-
Chuanqi Xu authored
While playing with Coroutines, I found that it is possible to generate the following IR:
```
%struct = alloca ...
%sub.element = getelementptr %struct, i64 0, i64 index ; index is not %zero
lifetime.marker.start(%sub.element)
% use of %sub.element
lifetime.marker.end(%sub.element)
store %struct to xxx ; %struct is escaping!
<suspend points>
```
The AllocaUseVisitor would then collect the lifetime markers for %sub.element and treat them as the lifetime markers of the alloca! So, judging by the lifetime markers alone, it concludes that the alloca could be put on the stack instead of the frame. The root cause of the bug is that AllocaUseVisitor collects the wrong lifetime markers; this patch fixes that. Reviewed By: lxfind Differential Revision: https://reviews.llvm.org/D112216
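A sketch of the kind of check that avoids collecting a sub-element's markers for the whole alloca; the helper name is illustrative.
```cpp
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"
using namespace llvm;

// Illustrative: accept a lifetime marker as covering the alloca only when
// its pointer operand reaches the alloca itself at offset zero, not a GEP
// to some inner sub-element.
static bool isWholeAllocaLifetimeMarker(const IntrinsicInst &II,
                                        const AllocaInst &AI,
                                        const DataLayout &DL) {
  if (II.getIntrinsicID() != Intrinsic::lifetime_start &&
      II.getIntrinsicID() != Intrinsic::lifetime_end)
    return false;
  const Value *Ptr = II.getArgOperand(1); // operand 0 is the size
  APInt Offset(DL.getIndexTypeSizeInBits(Ptr->getType()), 0);
  const Value *Base = Ptr->stripAndAccumulateConstantOffsets(
      DL, Offset, /*AllowNonInbounds=*/true);
  return Base == &AI && Offset == 0;
}
```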
-
LLVM GN Syncbot authored
-
Vitaly Buka authored
Transformations may strip the attribute from an argument, e.g. for unused arguments, which will result in a shadow offset mismatch between caller and callee. Stripping noundef for used arguments can be a problem, as TLS is not going to be set by the caller. However, this is not the goal of this patch, and I am not aware whether that is even possible. Differential Revision: https://reviews.llvm.org/D112197
-
Stanislav Mekhanoshin authored
In a kernel which does not have calls or AGPR usage we can allocate the whole vector register budget for VGPRs and have no AGPRs as long as VGPRs stay addressable (i.e. below 256). Differential Revision: https://reviews.llvm.org/D111764
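A toy model of the budget decision described; the numbers and especially the fallback split are illustrative, not the actual AMDGPU heuristic.
```cpp
#include <algorithm>

// If a kernel makes no calls and uses no AGPRs, give the entire vector
// register budget to VGPRs, capped at 256 so VGPRs stay addressable.
static unsigned allocatableVGPRs(unsigned VectorRegBudget, bool HasCalls,
                                 bool UsesAGPRs) {
  if (!HasCalls && !UsesAGPRs)
    return std::min(VectorRegBudget, 256u);
  // Otherwise the budget must be shared with AGPRs; an even split is a
  // placeholder here, not the real target logic.
  return VectorRegBudget / 2;
}
```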
-
Nico Weber authored
That way, the headers in llvm/utils/gn/secondary/compiler-rt/include are copied when running `ninja compiler-rt`. (Previously, they were only copied when running `check-hwasan` or when building the compiler-rt/include target.) (Since they should be copied only once, depend on the target in the host toolchain. I think default_toolchain should work just as well, it just needs to be a single fixed toolchain; check-hwasan depends through host_toolchain, so let's use that here too.) Prevents errors like `testing/fuzzed_data_provider.h:8:10: fatal error: 'fuzzer/FuzzedDataProvider.h' file not found` when building with locally-built clang. (For now, you still have to explicitly build the 'compiler-rt' target. Maybe we should make the clang target depend on that in the GN build?) Differential Revision: https://reviews.llvm.org/D112238
-
Luís Ferreira authored
This patch is a refactor to allow implementing prepend afterwards. Since this changes a lot of files, and to conform with guidelines, I will separate it from the implementation of prepend. Related to the discussion in https://reviews.llvm.org/D111414, so please read it for more context. Reviewed By: #libc_abi, dblaikie, ldionne Differential Revision: https://reviews.llvm.org/D111947
-
Jack Anderson authored
Some dwarf loaders in LLVM are hard-coded to only accept 4-byte and 8-byte address sizes. This patch generalizes acceptance into `DWARFContext::isAddressSizeSupported` and provides a common way to generate rejection errors. The MSP430 target has been given new tests to cover dwarf loading cases that previously failed due to 2-byte addresses. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D111953
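A sketch of the centralized predicate; the exact supported set is whatever `DWARFContext::isAddressSizeSupported` encodes, with 2-byte being the case MSP430 needs.
```cpp
#include <cstdint>

// Illustrative version of a shared address-size check: loaders consult one
// predicate instead of hard-coding 4 and 8 at each call site.
static bool isAddressSizeSupported(uint64_t AddressSize) {
  return AddressSize == 2 || AddressSize == 4 || AddressSize == 8;
}
```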
-
Tom Stellard authored
Does anyone still use these? I want to make some changes to the sphinx html generation and I don't want to have to implement the changes in two places. Reviewed By: sylvestre.ledru, #libc, ldionne Differential Revision: https://reviews.llvm.org/D112030
-
Craig Topper authored
There is no need to return a bool and have an SDValue output parameter. Just return the SDValue and let the caller check if it is null. I have another patch to add more callers of these so I thought I'd clean up the interface first. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112267
-
Craig Topper authored
By expanding early, it allows the shifts to be custom lowered in LegalizeVectorOps. Then a DAG combine is able to run on them before LegalizeDAG handles the BUILD_VECTORs for the masks used. v16i8 shift lowering on X86 requires a mask to be applied to a v8i16 shift. The BITREVERSE expansion applied an AND mask before SHL ops and after SRL ops. This was done to share the same mask constant for both shifts. It looks like this patch allows DAG combine to remove the AND mask added after the v16i8 SHL by X86 lowering. This maintains the mask sharing that BITREVERSE was trying to achieve. Prior to this patch, it looks like we kept the mask after the SHL instead, which required an extra constant pool entry or a PANDN to invert it. This is dependent on D112248 because RISCV will end up scalarizing the BSWAP portion of the BITREVERSE expansion if we don't disable BSWAP scalarization in LegalizeVectorOps first. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112254
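The mask sharing being preserved is easiest to see in the scalar form of the BITREVERSE expansion, sketched here for one byte: each step applies the same constant as an AND before the SHL and as an AND after the SRL.
```cpp
#include <cstdint>

// Scalar sketch of the BITREVERSE expansion. Each step reuses one mask
// constant both before the left shift and after the right shift, which is
// the sharing the vector expansion wants to keep.
static uint8_t bitreverse8(uint8_t V) {
  V = uint8_t(((V & 0x55) << 1) | ((V >> 1) & 0x55)); // swap adjacent bits
  V = uint8_t(((V & 0x33) << 2) | ((V >> 2) & 0x33)); // swap bit pairs
  V = uint8_t(((V & 0x0F) << 4) | ((V >> 4) & 0x0F)); // swap nibbles
  return V;
}
```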
-
Stanislav Mekhanoshin authored
-
- Oct 21, 2021
-
-
David Blaikie authored
Differential Revision: https://reviews.llvm.org/D112265
-