Commits · d85e347a28dc9a329d7029987e4e062428985b41 · Lorenzo Albano / LLVM bpEVL

Sep 20, 2021

[RISCV] Add a pass to recognize VLS strided loads/stores from gather/scatter. · d85e347a

Craig Topper authored Sep 20, 2021

For strided accesses the loop vectorizer seems to prefer creating a
vector induction variable with a start value of the form
<i32 0, i32 1, i32 2, ...>. This value will be incremented each
loop iteration by a splat constant equal to the length of the vector.
Within the loop, arithmetic using splat values will be done on this
vector induction variable to produce indices for a vector GEP.

This pass attempts to dig through the arithmetic back to the phi
to create a new scalar induction variable and a stride. We push
all of the arithmetic out of the loop by folding it into the start,
step, and stride values. Then we create a scalar GEP to use as the
base pointer for a strided load or store using the computed stride.
Loop strength reduce will run after this pass and can do some
cleanups to the scalar GEP and induction variable.
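
The folding described above can be illustrated with a small model. This is only a sketch under stated assumptions, not the pass's actual code: `VL`, `vector_indices`, and `fold_to_scalar` are hypothetical names, and only splat `mul` and `add` are modelled.

```python
# Hypothetical model of the index arithmetic; not the pass's actual code.
VL = 4  # assumed vector length

def vector_indices(iteration, ops):
    """Indices as the vectorizer computes them: vector IV plus splat arithmetic."""
    # Vector IV starts at <0, 1, 2, ...> and grows by splat(VL) each iteration.
    iv = [lane + iteration * VL for lane in range(VL)]
    for op, c in ops:
        iv = [x * c if op == "mul" else x + c for x in iv]
    return iv

def fold_to_scalar(ops):
    """Fold the splat arithmetic into scalar (start, step, stride) values."""
    start, step, stride = 0, VL, 1
    for op, c in ops:
        if op == "mul":
            start, step, stride = start * c, step * c, stride * c
        else:  # a splat "add" only shifts the start value
            start += c
    return start, step, stride

ops = [("mul", 3), ("add", 5)]          # index = iv * 3 + 5
start, step, stride = fold_to_scalar(ops)
for it in range(3):
    base = start + it * step            # scalar induction variable
    strided = [base + lane * stride for lane in range(VL)]
    assert strided == vector_indices(it, ops)
```

In this model the per-lane indices of the gather are reproduced exactly by one scalar base plus a constant stride, which is what makes a strided load or store possible.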

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D107790

d85e347a

[ELF] Don't fall back to .text for e_entry · d001ab82

Fangrui Song authored Sep 20, 2021

We have this rule to simulate GNU ld
(https://sourceware.org/binutils/docs/ld/Entry-Point.html),
but the behavior is questionable
(https://sourceware.org/pipermail/binutils/2021-September/117929.html).

gold doesn't fall back to .text.
Projects are unlikely to rely on this behavior (there is even a warning
for executable links), so let's just delete this fallback path.
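
As a rough illustration of the change, the sketch below models entry-point selection with and without the .text fallback. It is deliberately simplified and is not lld's code: `resolve_entry`, its parameters, and the dictionary-based symbol table are all hypothetical.

```python
def resolve_entry(symbols, sections, entry_name="_start", text_fallback=False):
    """Return the e_entry value for a (hypothetical) symbol table and section map."""
    if entry_name in symbols:
        return symbols[entry_name]
    if text_fallback and ".text" in sections:   # the path this commit deletes
        return sections[".text"]
    return 0                                    # no entry point found

sections = {".text": 0x1000}
print(hex(resolve_entry({}, sections, text_fallback=True)))   # old behavior
print(hex(resolve_entry({}, sections)))                       # new behavior
```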

Reviewed By: jhenderson, peter.smith

Differential Revision: https://reviews.llvm.org/D110014

d001ab82

[Verifier] Verify scoped noalias metadata · 8700f2bd

Nikita Popov authored Sep 16, 2021

Verify that !noalias, !alias.scope and llvm.experimental.noalias.scope
arguments have the format specified in
https://llvm.org/docs/LangRef.html#noalias-and-alias-scope-metadata.
I've fixed up a lot of broken metadata used by tests in advance.
In particular, using a single scope instead of the expected scope list
is a common mistake.

Differential Revision: https://reviews.llvm.org/D110026

8700f2bd

[lldb] Extract adding symbols for UUID/File/Frame (NFC) · a89bfc61

Jonas Devlieghere authored Sep 20, 2021

This moves the logic for adding symbols based on UUID, file and frame
into little helper functions. This is in preparation for D110011.

Differential revision: https://reviews.llvm.org/D110010

a89bfc61

[lldb] Fix whitespace in CommandObjectTarget (NFC) · fe4b8467

Jonas Devlieghere authored Sep 17, 2021

fe4b8467

[DSE] Add additional tests to cover review comments. · 963d3a22

Florian Hahn authored Sep 20, 2021

Adds additional tests following comments from D109844.

Also removes unused in.ptr arguments and places in the call tests that
used loads instead of a getval call.

963d3a22

[mlir][linalg] Add IndexOp support to fusion on tensors. · 7be28d82

Tobias Gysi authored Sep 20, 2021

This revision depends on https://reviews.llvm.org/D109761 and https://reviews.llvm.org/D109766.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D109774

7be28d82

[MLIR][SCF] Add for-to-while loop transformation pass · 644b55d5

Morten Borup Petersen authored Aug 20, 2021

This pass transforms SCF.ForOp operations to SCF.WhileOp. The for-loop condition is placed in the 'before' region of the while operation, and the induction variable increment plus the loop body in the 'after' region. The loop-carried values of the while op are the induction variable (IV) of the for-loop plus any iter_args specified for the for-loop.
Any 'yield' ops in the for-loop are rewritten to additionally yield the (incremented) induction variable.
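
The shape of the rewrite can be sketched in ordinary Python rather than MLIR. This is an analogy only, assuming a positive step; `run_for` and `run_while` are hypothetical helper names, and `acc` stands in for a single iter_arg.

```python
def run_for(lb, ub, step, init, body):
    """Reference semantics: a counted loop carrying one iter_arg."""
    acc = init
    for iv in range(lb, ub, step):
        acc = body(iv, acc)
    return acc

def run_while(lb, ub, step, init, body):
    """Same loop in while form: the IV becomes a loop-carried value."""
    iv, acc = lb, init                 # loop-carried values: IV + iter_args
    while True:
        # 'before' region: only the loop condition
        if not iv < ub:
            break
        # 'after' region: loop body, then the IV increment, then yield both
        acc = body(iv, acc)
        iv = iv + step
    return acc

add = lambda iv, acc: acc + iv
assert run_for(0, 10, 3, 0, add) == run_while(0, 10, 3, 0, add)
```

The point of the while form is that the increment is no longer implicit: it sits in the body like any other computation, and only the condition remains in the control part.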

This transformation is useful for passes where we want to consider structured control flow solely on the basis of a loop body and the computation of a loop condition. As an example, when doing high-level synthesis in CIRCT, the incrementation of an IV in a for-loop is "just another part" of a circuit datapath, and what we really care about is the distinction between our datapath and our control logic (the condition variable).

Differential Revision: https://reviews.llvm.org/D108454

644b55d5

[mlir][linalg] Fix typo (NFC). · 09100c75

Tobias Gysi authored Sep 20, 2021

09100c75

[SLP]Improve graph reordering. · bc69dd62

Alexey Bataev authored Aug 03, 2021

Reworked the reordering algorithm. Originally, the compiler just tried to
detect the most common order in the reorderable nodes (loads, stores,
extractelements, extractvalues) and then fully rebuilt the graph in
the best order. This was not efficient, since it required extra
memory and time for building/rebuilding the tree and doubled the use of
the scheduling budget, which could lead to missed vectorization due to
exhausted scheduling resources.

The patch provides a 2-way approach to the graph reordering problem. First,
all reordering is done in-place: it does not require deleting/rebuilding
the tree, it just rotates the scalars/orders/reuses masks in the graph
nodes.

The first step (top-to-bottom) rotates the whole graph, similarly to the previous
implementation. The compiler counts the most used orders of
the graph nodes with the same vectorization factor and then rotates the
subgraph with the given vectorization factor to the most used order, if
it is not empty. It then repeats the same procedure for the subgraphs with
smaller vectorization factors. We can do this because we still need
to reshuffle the smaller subgraph when building operands for the graph
nodes with a larger vectorization factor, so we can rotate just the subgraph,
not the whole graph.
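
The counting part of the first step might look roughly like the sketch below. It is a toy model, not the SLP vectorizer's data structures: nodes are plain dictionaries, `order[i]` is assumed to be the index of the scalar that should end up in position `i`, and `most_used_order` and `rotate_in_place` are hypothetical names.

```python
from collections import Counter

def most_used_order(nodes, vf):
    """Pick the most frequent non-empty order among nodes with the given VF."""
    counts = Counter(tuple(n["order"]) for n in nodes
                     if len(n["scalars"]) == vf and n["order"])
    return list(counts.most_common(1)[0][0]) if counts else None

def rotate_in_place(node, order):
    """Permute a node's scalars by `order` without rebuilding the node."""
    node["scalars"][:] = [node["scalars"][i] for i in order]

nodes = [
    {"scalars": ["b", "a", "d", "c"], "order": [1, 0, 3, 2]},
    {"scalars": ["y", "x", "w", "z"], "order": [1, 0, 3, 2]},
    {"scalars": ["p", "q", "r", "s"], "order": []},
    {"scalars": ["m", "n"], "order": [1, 0]},
]
best = most_used_order(nodes, 4)
rotate_in_place(nodes[0], best)
print(best, nodes[0]["scalars"])
```

Only nodes with the requested vectorization factor take part in the vote, which mirrors the per-factor procedure described above.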

The second step (bottom-to-top) scans through the leaves and tries to
detect the users of the leaves which can be reordered. If the leaves can
be reordered in the best fashion, they are reordered along with their user.
This removes double shuffles to the same ordering of the operands in
many cases and just reorders the user operations instead. Plus, it moves
the final shuffles closer to the top of the graph and in many cases
removes an extra shuffle, because the same procedure is repeated
and we can again merge some reordering masks and reorder user nodes
instead of the operands.

Also, the patch improves the cost model for gathering of loads, which improves
the x264 benchmark in some cases.

Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264,
+3% for 508.namd, and improves most other benchmarks.
Compile and link times are almost the same, though in some cases they
should be better (we're not doing an extra instruction scheduling
anymore), and we may vectorize more code for large basic blocks again
because of the saved scheduling budget.

Differential Revision: https://reviews.llvm.org/D105020

bc69dd62

[flang] Put intrinsic function table back into order · 5661317f

peter klausler authored Sep 15, 2021

Some intrinsic functions weren't findable because the table
wasn't strictly in order of names.
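
The failure mode is easy to reproduce in miniature. The sketch below assumes a binary-search-style lookup, which the commit message does not state explicitly; `find` and the sample table are hypothetical and far smaller than the real intrinsic table.

```python
from bisect import bisect_left

def find(table, name):
    """Binary search; only correct when `table` is sorted by name."""
    i = bisect_left(table, name)
    return i < len(table) and table[i] == name

unsorted = ["abs", "dconjg", "cos", "dimag", "sin"]   # "cos" is out of place
print(find(unsorted, "cos"))           # entry exists but is not found
print(find(sorted(unsorted), "cos"))   # found once the table is in order
```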

Also complete the missing generalization of the extension DCONJG
to accept any kind of complex argument, as was already done for
DREAL and DIMAG.

Differential Revision: https://reviews.llvm.org/D110002

5661317f

[X86] Always check the size of SourceTy before getting the next type · 22767339

Wang, Pengfei authored Sep 20, 2021

D109607 results in a regression in llvm-test-suite.
The cause is that we didn't check the size of SourceTy, so we
return the wrong SSE type when SourceTy is overlapped.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D110037

22767339

[X86] Add test to show the effect caused by D109607. NFC · 5b47256f

Wang, Pengfei authored Sep 19, 2021

5b47256f

[OpenCL] Supports atomics in C++ for OpenCL 2021 · 228dd20c

Justas Janickas authored Sep 07, 2021

Atomics in C++ for OpenCL 2021 are now handled the same way as in
OpenCL C 3.0. This is a header-only change.

Differential Revision: https://reviews.llvm.org/D109424

228dd20c

[clangd] Bail-out when an empty compile flag is encountered · 444a5f30

Kadir Cetinkaya authored Sep 16, 2021

Fixes https://github.com/clangd/clangd/issues/865

Differential Revision: https://reviews.llvm.org/D109894

444a5f30

[mlir][linalg] Fusion on tensors. · 6db928b8

Tobias Gysi authored Sep 20, 2021

Add a new version of fusion on tensors that supports the following scenarios:
- support input and output operand fusion
- fuse a producer result passed in via tile loop iteration arguments (update the tile loop iteration arguments)
- supports only linalg operations on tensors
- supports only scf::for
- cannot add an output to the tile loop nest

The LinalgTileAndFuseOnTensors pass tiles the root operation and fuses its producers.

Reviewed By: nicolasvasilache, mravishankar

Differential Revision: https://reviews.llvm.org/D109766

6db928b8

[analyzer] Move docs of SmartPtr to correct subcategory · 5dee5011

Deep Majumder authored Sep 20, 2021

The docs of alpha.cplusplus.SmartPtr were incorrectly placed under
alpha.deadcode. Moved them under alpha.cplusplus.

Differential Revision: https://reviews.llvm.org/D110032

5dee5011

[Analysis] Add support for vscale in computeKnownBitsFromOperator · f988f680

David Sherwood authored Sep 16, 2021

In ValueTracking.cpp we use a function called
computeKnownBitsFromOperator to determine the known bits of a value.
For the vscale intrinsic, if the function contains the vscale_range
attribute, we can use the maximum and minimum values of vscale to
determine some known zero and one bits. This should help to improve
code quality by allowing certain optimisations to take place.
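
To make the idea concrete, here is a small sketch of how a range on vscale translates into known bits. It is an illustration under assumptions, not the code in ValueTracking.cpp: `vscale_known_bits` is a hypothetical helper, masks are plain integers, and `vmax` is assumed to be greater than zero.

```python
def vscale_known_bits(vmin, vmax, bit_width=64):
    """Return (known_zero, known_one) masks for vscale in [vmin, vmax]."""
    mask = (1 << bit_width) - 1
    if vmin == vmax:                       # exact value known at compile time
        return (~vmin) & mask, vmin
    high = vmax.bit_length()               # first bit position that must be zero
    known_zero = (mask >> high) << high    # every bit at or above `high`
    return known_zero, 0

zero, one = vscale_known_bits(1, 16, bit_width=8)
print(f"{zero:08b} {one:08b}")
```

With a maximum of 16, no value in the range can set bit 5 or above, so those bits are known zero; when minimum and maximum coincide, every bit is known.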

Tests added here:

  Transforms/InstCombine/icmp-vscale.ll

Differential Revision: https://reviews.llvm.org/D109883

f988f680

[AMDGPU] Regenerate checks · 680592b5

Jay Foad authored Sep 20, 2021

680592b5

[JITLink] Adopt forEachRelocation() helper in ELF RISCV backend (NFC) · e8d81d80

Stefan Gränitz authored Sep 20, 2021

Following D109516, this patch re-uses the new helper function for ELF relocation traversal in the RISCV backend.

Reviewed By: StephenFan

Differential Revision: https://reviews.llvm.org/D109522

e8d81d80

[JITLink] Adopt forEachRelocation() helper in ELF x86-64 backend (NFC) · 68914dc9

Stefan Gränitz authored Sep 20, 2021

Following D109516, this patch re-uses the new helper function for ELF relocation traversal in the x86-64 backend.

Reviewed By: StephenFan

Differential Revision: https://reviews.llvm.org/D109520

68914dc9

Thread safety analysis: Drop special block handling · 6de19ea4

Aaron Puchert authored Sep 20, 2021

Previous changes like D101202 and D104261 have eliminated the special
status that break and continue once had, since now we're making
decisions purely based on the structure of the CFG without regard for
the underlying source code constructs.

This means we don't gain anything from deferring handling for these
blocks. Dropping it moves some diagnostics, though arguably into a
better place. We're working around a "quirk" in the CFG that perhaps
wasn't visible before: while loops have an empty "transition block"
where continue statements and the regular loop exit meet, before
continuing to the loop entry. To get a source location for that, we
slightly extend our handling for empty blocks. The source location for
the transition ends up being the loop entry then, but formally this
isn't a back edge. We pretend it is anyway. (This is safe: we can always
treat edges as back edges, it just means we allow less and don't modify
the lock set. The other way around it wouldn't be safe.)

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D106715

6de19ea4

[lldb] [DynamicRegisterInfo] Unset value_regs/invalidate_regs before Finalize() · ec50d351

Michał Górny authored Sep 18, 2021

Set value_regs and invalidate_regs in RegisterInfo pushed onto m_regs
to nullptr, to ensure that the temporaries passed there are not
accidentally used.

Differential Revision: https://reviews.llvm.org/D109879

ec50d351

[lldb] [test] Add unittest for DynamicRegisterInfo::Finalize() · 4737dcbc

Michał Górny authored Sep 16, 2021

Differential Revision: https://reviews.llvm.org/D109906

4737dcbc

[Clang] [Fix] Clang build fails when build directory contains space character · fae57a6a

Brain Swift authored Sep 20, 2021

Clang build fails when build directory contains space character.

Error messages:

[ 95%] Linking CXX executable ../../../../bin/clang
clang: error: no such file or directory: 'Space/Net/llvm/Build/tools/clang/tools/driver/Info.plist'
make[2]: *** [bin/clang-14] Error 1
make[1]: *** [tools/clang/tools/driver/CMakeFiles/clang.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....

The path name is actually:
  'Dev Space/Net/llvm/Build/tools/clang/tools/driver/Info.plist'

Bugzilla issue - https://bugs.llvm.org/show_bug.cgi?id=51884
Reporter and patch author - Brain Swift <bsp2bsp-llvm@yahoo.com>

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D109979

fae57a6a

[ARM] MVE reverse shuffles. · 3f90df22

David Green authored Sep 20, 2021

The vectorizer can sometimes make reverse shuffles from indices that
count down. In MVE, we don't have a 128-bit rev instruction, but we can
select this to a VREV64 with some lane movs to swap the two halves.

Ideally this would use VMOVD's, but only gets as far as VMOVS's at the
moment.
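
The decomposition can be checked with a small list-based model. This is only a sketch of the lane movement, not generated code; `vrev64` and `reverse_128` are hypothetical helpers operating on Python lists that stand in for a 128-bit vector.

```python
def vrev64(vec, lanes_per_dword):
    """Reverse the elements within each 64-bit doubleword."""
    out = []
    for i in range(0, len(vec), lanes_per_dword):
        out.extend(reversed(vec[i:i + lanes_per_dword]))
    return out

def reverse_128(vec):
    """Full 128-bit reverse = VREV64, then swap the two 64-bit halves."""
    half = len(vec) // 2
    r = vrev64(vec, half)
    return r[half:] + r[:half]

print(reverse_128([0, 1, 2, 3]))               # four 32-bit lanes
print(reverse_128([0, 1, 2, 3, 4, 5, 6, 7]))   # eight 16-bit lanes
```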

Differential Revision: https://reviews.llvm.org/D69510

3f90df22

[update_mir_test_checks.py] Use -NEXT FileCheck directives · 817e23d4

Alex Richardson authored Sep 20, 2021

Previously the script emitted output using plain CHECK directives. This
can result in a test passing even if there are some instructions between
CHECK directives that should have been removed. It also makes debugging
tests that have the output in a different order more difficult since
FileCheck can match with a later line and then complain about the "wrong"
directive not being found.
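
The difference between the two directive kinds can be shown with a toy matcher. This is a heavily simplified model, not FileCheck itself: `filecheck` is a hypothetical function, patterns are plain substrings, and the MIR lines are made up for illustration.

```python
def filecheck(lines, directives):
    """Tiny model: 'CHECK' may skip lines, 'CHECK-NEXT' must match the very next line."""
    pos = 0                                   # index of the next unmatched input line
    for kind, pattern in directives:
        if kind == "CHECK-NEXT":
            if pos >= len(lines) or pattern not in lines[pos]:
                return False
            pos += 1
        else:                                 # plain CHECK: search forward
            for i in range(pos, len(lines)):
                if pattern in lines[i]:
                    pos = i + 1
                    break
            else:
                return False
    return True

output = ["%0 = COPY $x0", "%1 = G_ADD %0, %0", "$x0 = COPY %1"]  # G_ADD is a stray
plain = [("CHECK", "%0 = COPY"), ("CHECK", "$x0 = COPY")]
strict = [("CHECK", "%0 = COPY"), ("CHECK-NEXT", "$x0 = COPY")]
print(filecheck(output, plain), filecheck(output, strict))
```

In this model the plain directives silently skip the stray instruction, while the -NEXT variant reports the mismatch at the line where it occurs.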

This will cause quite large diffs when updating existing tests, but I'm not sure we need an opt-in flag here.

Depends on D109765 (pre-commit tests)

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D109767

817e23d4

pre-commit test for D109767 · 7b68c072

Alex Richardson authored Sep 20, 2021

Differential Revision: https://reviews.llvm.org/D109765

7b68c072

Fix CLANG_ENABLE_STATIC_ANALYZER=OFF building all analyzer source · 6d7b3d6b

Alex Richardson authored Sep 20, 2021

Since https://reviews.llvm.org/D87118, the StaticAnalyzer directory is
added unconditionally. In theory this should not cause the static analyzer
sources to be built unless they are referenced by another target. However,
the clang-cpp target (defined in clang/tools/clang-shlib) uses the
CLANG_STATIC_LIBS global property to determine which libraries need to
be included. To solve this issue, this patch avoids adding libraries to
that property if EXCLUDE_FROM_ALL is set.

In case something like this comes up again: `cmake --graphviz=targets.dot`
is quite useful to see why a target is included as part of `ninja all`.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D109611

6d7b3d6b

MachOObjectFile - checkOverlappingElement - use const-ref to avoid unnecessary copies. NFCI. · 7fc12b82

Simon Pilgrim authored Sep 17, 2021

Reported by MSVC static analyzer.

7fc12b82

[X86] X86TargetTransformInfo - remove unnecessary if-else after early exit. NFCI. · 4ab7c0d3

Simon Pilgrim authored Sep 17, 2021

(style) Break the if-else chain as they all return.

4ab7c0d3

[MCA] InstructionTables::execute() - use const-ref iterator in for-range loop. NFCI. · ea17b15f

Simon Pilgrim authored Sep 17, 2021

Avoid unnecessary copies, reported by MSVC static analyzer.

ea17b15f

[mlir][openacc] Make use of the second counter extension in DataOp translation · d6929aaa

Valentin Clement authored Jul 21, 2021

Make use of the runtime extension for the second reference counter used in
structured data regions. This extension is implemented in D106510 and D106509.

Differential Revision: https://reviews.llvm.org/D106517

d6929aaa

[lldb] [gdb-remote] Always send PID when detaching w/ multiprocess · b1099120

Michał Górny authored Sep 19, 2021

Always send the PID in the detach packet when multiprocess extensions are
enabled. This is required by qemu's GDB server, as a plain 'D' packet
results in an error and the emulated system is not resumed.
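
For reference, the two packet forms can be sketched as follows. This is a standalone illustration of GDB remote protocol framing, not lldb's implementation; `rsp_packet` and `detach_packet` are hypothetical helper names.

```python
def rsp_packet(payload: str) -> str:
    """Frame a payload as $payload#checksum (checksum = sum of bytes mod 256)."""
    checksum = sum(payload.encode("ascii")) % 256
    return f"${payload}#{checksum:02x}"

def detach_packet(pid: int, multiprocess: bool) -> str:
    # With multiprocess extensions, always name the process: 'D;<pid in hex>'.
    payload = f"D;{pid:x}" if multiprocess else "D"
    return rsp_packet(payload)

print(detach_packet(0x1a2b, multiprocess=False))
print(detach_packet(0x1a2b, multiprocess=True))
```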

Differential Revision: https://reviews.llvm.org/D110033

b1099120

[GlobalISel] Improve elimination of dead instructions in legalizer · e4c46ddd

Petar Avramovic authored Sep 20, 2021

Add eraseInstr(s) utility functions. Before deleting an instruction,
they collect its use instructions. After the deletion, they delete the
use instructions that became trivially dead.
This patch clears all dead instructions in existing legalizer MIR tests.
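
The clean-up strategy can be modelled on a tiny def-use graph. This sketch is not the GlobalISel utility: `erase_instr` is a hypothetical function, instructions are plain strings, and side effects (which would keep a real instruction alive) are ignored.

```python
def erase_instr(instr, defs, uses):
    """Erase `instr`, then erase operand-defining instructions left without users.

    defs: instr -> list of instructions defining its operands
    uses: instr -> set of instructions using its result
    """
    worklist, erased = [instr], []
    while worklist:
        cur = worklist.pop()
        if cur in erased:
            continue
        erased.append(cur)
        for d in defs.get(cur, []):
            uses[d].discard(cur)
            if not uses[d]:                  # trivially dead (side effects ignored here)
                worklist.append(d)
    return erased

# c = add a, b; d = trunc c; e also uses a
defs = {"d": ["c"], "c": ["a", "b"], "e": ["a"]}
uses = {"c": {"d"}, "a": {"c", "e"}, "b": {"c"}}
print(erase_instr("d", defs, uses))
```

Erasing `d` leaves `c` without users, which in turn frees `b`; `a` survives because `e` still uses it.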

Differential Revision: https://reviews.llvm.org/D109154

e4c46ddd

[NewPM] Make InlinerPass (aka 'inline') a parameterized pass · c8cb7f61

Bjorn Pettersson authored Sep 16, 2021

In default pipelines the ModuleInlinerWrapperPass is adding the
InlinerPass to the pipeline twice, once due to MandatoryFirst (passing
true in the ctor) and then a second time with false as argument.

To make it possible to bisect and reduce opt test cases for this
part of the pipeline we need to be able to choose between the two
different variants of the InlinerPass when running opt. This patch is
changing 'inline' to a CGSCC_PASS_WITH_PARAMS in the PassRegistry,
making it possible to run opt with both -passes=cgscc(inline) and
-passes=cgscc(inline<only-mandatory>).

Reviewed By: aeubanks, mtrofin

Differential Revision: https://reviews.llvm.org/D109877

c8cb7f61

[clang][NFC] Remove dead code · eb3af1e7

Andy Wingo authored Aug 04, 2021

Remove code that has no effect in SemaType.cpp:processTypeAttrs.

Differential Revision: https://reviews.llvm.org/D108360

eb3af1e7

Add myself as a code owner for SYCL support · 15feaaa3

Alexey Bader authored Sep 20, 2021

15feaaa3

[OpenCL] Supports optional writing to 3d images in C++ for OpenCL 2021 · ca3bebd8

Justas Janickas authored Sep 06, 2021

Adds support for a feature macro __opencl_c_3d_image_writes in
C++ for OpenCL 2021 enabling a respective optional core feature
from OpenCL 3.0.

This change aims to achieve compatibility between C++ for OpenCL
2021 and OpenCL 3.0.

Differential Revision: https://reviews.llvm.org/D109328

ca3bebd8

AArch64: use ldp/stp for 128-bit atomic load/store in v8.4 onwards · 13aa102e

Tim Northover authored Sep 15, 2021

v8.4 says that normal loads/stores of 128 bits are single-copy atomic if
they're properly aligned (which all LLVM atomics are) so we no longer need to
do a full RMW operation to guarantee we got a clean read.

13aa102e