Commits · af1c5312d76000bf134d8b81cdb7343607c6ee64 · Lorenzo Albano / LLVM bpEVL

Sep 21, 2021

[InstCombine] add tests for mask-shift with trunc; NFC · af1c5312
Sanjay Patel authored Sep 20, 2021

[AMDGPU][MC][GFX10] Enabled dlc for FLAT and GLOBAL atomics · b8e7f532
Dmitry Preobrazhensky authored Sep 21, 2021
```
Differential Revision: https://reviews.llvm.org/D109614
```

[IR] Add the constructor of ShuffleVector for one-input-vector. · 043733d6

hyeongyu kim authored Sep 21, 2021

One of the two inputs of a ShuffleVector is often a placeholder.
Previously, the placeholder was sometimes undef and sometimes poison.
I added these constructors to create the placeholder consistently.

Switching existing callers to the newly added constructor will be done in a separate patch.
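To make the placeholder idea concrete, here is a small standalone C++ model of shufflevector semantics. It is not LLVM's `ShuffleVector` API: lanes are `std::optional<int>` with `std::nullopt` standing in for poison, and the hypothetical one-argument `shuffle` overload always fills the unused second operand with poison, mirroring the consistency the new constructors aim for.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical model: std::nullopt represents a poison lane.
using Lane = std::optional<int>;
using Vec = std::vector<Lane>;

// Two-input shuffle: mask index M < N picks V1[M], N <= M < 2N picks
// V2[M - N], and a negative index produces a poison lane.
Vec shuffle(const Vec &V1, const Vec &V2, const std::vector<int> &Mask) {
  Vec Result;
  const int N = static_cast<int>(V1.size());
  for (int M : Mask) {
    if (M < 0)
      Result.push_back(std::nullopt);
    else if (M < N)
      Result.push_back(V1[M]);
    else
      Result.push_back(V2[M - N]);
  }
  return Result;
}

// One-input form: the second operand is always a poison placeholder of the
// same length, so callers never choose between undef and poison.
Vec shuffle(const Vec &V1, const std::vector<int> &Mask) {
  return shuffle(V1, Vec(V1.size(), std::nullopt), Mask);
}
```

With the one-input overload, a mask such as `{3, 2, -1, 0}` reverses and drops lanes of a single vector without the caller ever naming a second operand.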

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D110146


[llvm] Pass LLVM_CHECK_ENABLED_PROJECTS through in cross builds · e9ea03c6
Nico Weber authored Sep 21, 2021


[SystemZ] Emit EXRL target instructions before text section is ended. · a48b43f9

Jonas Paulsson authored Sep 09, 2021

SystemZ adds the EXRL target instructions at the end of each file. This must
be done before debug info emission, since that may end the text section;
therefore this is now done in emitConstantPools() (instead of in
emitEndOfAsmFile).

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D109513


[VectorCombine] Add tests which require DT to use info from assumes. · ea27dd74
Florian Hahn authored Sep 21, 2021


[AArch64] Improve schedule modelling on the Cortex-A55 · 9e4d7267

Nicholas Guy authored Sep 08, 2021

Enables the FuseAddress feature in the Cortex-A55 scheduling model.

Differential Revision: https://reviews.llvm.org/D109323


[InstCombine] foldConstantInsEltIntoShuffle - bail if we fail to find constant element (PR51824) · fc8f1e44

Simon Pilgrim authored Sep 21, 2021

If getAggregateElement() returns null for any element, early out, as otherwise we will assert when creating a new constant vector.

Fixes PR51824 and OSS-Fuzz: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=38057


[CodeGen] SelectionDAGBuilder - Use const-ref iterator in for-range loops. NFCI. · 20b58855
Simon Pilgrim authored Sep 21, 2021
```
Avoid unnecessary copies, reported by MSVC static analyzer.
```
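The copies these NFCI commits avoid are easy to see in isolation. A minimal standalone sketch (plain C++, no LLVM types; the copy-counting `Tracked` struct is hypothetical) contrasting a by-value loop variable with a const reference:

```cpp
#include <cassert>
#include <vector>

// Counts copy constructions to make hidden copies visible.
struct Tracked {
  static int Copies;
  int Value = 0;
  explicit Tracked(int V) : Value(V) {}
  Tracked(const Tracked &Other) : Value(Other.Value) { ++Copies; }
};
int Tracked::Copies = 0;

// By-value loop variable: every iteration copy-constructs an element.
int sumByValue(const std::vector<Tracked> &Elems) {
  int Sum = 0;
  for (auto E : Elems)
    Sum += E.Value;
  return Sum;
}

// Const-reference loop variable: binds to each element, no copies.
int sumByConstRef(const std::vector<Tracked> &Elems) {
  int Sum = 0;
  for (const auto &E : Elems)
    Sum += E.Value;
  return Sum;
}
```

For cheap types the difference is negligible, but for types like SmallVector or APInt each hidden copy can allocate, which is why static analyzers flag the by-value form.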
RewriteStatepointsForGC - Use const-ref iterator in for-range loops. NFCI. · f5d23d36
Simon Pilgrim authored Sep 21, 2021
```
Avoid unnecessary copies, reported by MSVC static analyzer.
```
[CodeGen] SDDbgValue::getSDNodes() - use const-ref to avoid unnecessary copies. NFCI. · 0f83456c
Simon Pilgrim authored Sep 21, 2021
```
Reported by MSVC static analyzer.
```

tsan: simplify thread context setting · 9d7b7350

Dmitry Vyukov authored Sep 21, 2021

Currently we set thr->tctx after the OnStarted callback, taking the
thread registry mutex again and searching for the context.
But OnStarted already runs under the thread registry mutex
and has access to the context, so set it in OnStarted.
This makes the code simpler and faster.

Depends on D110132.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D110133


tsan: rearrange thread state callbacks (NFC) · 908256b0

Dmitry Vyukov authored Sep 21, 2021

Thread state functions are split into 2 parts:
tsan entry function (e.g. ThreadStart) and thread registry
state change callback (e.g. OnStart). Currently these
pairs of functions are located far from each other and
in reverse order. This makes it hard to read and follow the logic.
Reorder the code so that OnFoo directly follows ThreadFoo.
No other code changes.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D110132


tsan: fix debug format strings · 6fe35ef4

Dmitry Vyukov authored Sep 21, 2021

Some of the DPrintf calls currently produce -Wformat warnings when enabled.
Fix these format strings.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D110131


[AMDGPU] Prefer fmac over fma when selecting FMA_W_CHAIN · 598bebea

Jay Foad authored Sep 20, 2021

FMA_W_CHAIN is used when lowering fdiv f32. Prefer to select it as fmac
if there are no source modifiers, just like we do for other mad/mac and
fma/fmac cases.

Differential Revision: https://reviews.llvm.org/D110074


[AMDGPU] Prefer v_fmac over v_fma only when no source modifiers are used · 86dcb592

Jay Foad authored Sep 20, 2021

v_fmac with source modifiers forces VOP3 encoding, but it is strictly
better to use the VOP3-only v_fma instead, because $dst and $src2 are
not tied so it gives the register allocator more freedom and avoids a
copy in some cases.

This is the same strategy we already use for v_mad vs v_mac and
v_fma_legacy vs v_fmac_legacy.
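The selection rule reads as a small decision function. This is a hypothetical sketch, not the actual AMDGPU TableGen patterns or ISel code: the `FMAOperands` struct and `selectFMAOpcode` function are invented for illustration, and the f32 opcode names are assumed just to name the two instructions discussed above.

```cpp
#include <cassert>
#include <string>

// Hypothetical summary of the operands relevant to the choice.
struct FMAOperands {
  bool HasSourceModifiers = false; // e.g. neg/abs on any source
};

// v_fmac ties $dst to $src2 and only has a compact (VOP2) encoding when no
// source modifiers are used. With modifiers both forms need VOP3, so the
// untied v_fma is preferred: it gives the register allocator more freedom.
std::string selectFMAOpcode(const FMAOperands &Ops) {
  if (Ops.HasSourceModifiers)
    return "v_fma_f32"; // VOP3-only, $dst and $src2 not tied
  return "v_fmac_f32";  // VOP2, $dst tied to $src2
}
```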

Differential Revision: https://reviews.llvm.org/D110070


[AArch64] Regenerate test lines in sve-implicit-zero-filling.ll · e8362928
David Green authored Sep 21, 2021

[SCEV] Use isAvailableAtLoopEntry in the asserts · cd166fb2
Max Kazantsev authored Sep 21, 2021
```
This is what is supposed to be there.
```

GlobalISel/Utils: Refactor constant splat match functions · 8bc71856

Petar Avramovic authored Sep 21, 2021

Add a generic helper function that matches a constant splat. It has an option to
match a constant splat with undef (some elements can be undef, but not all).
Add a util function and matcher for G_FCONSTANT splat.
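The undef-tolerant matching rule can be sketched without GlobalISel. In this hypothetical model (not the actual GlobalISel/Utils helpers), a vector is a list of `std::optional<int64_t>` where `std::nullopt` marks an undef element; `matchConstantSplat` is an invented name.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// std::nullopt stands for an undef element.
using Element = std::optional<int64_t>;

// Returns the splat value if every defined element equals the same
// constant. With AllowUndef, undef elements are skipped, but at least one
// element must be defined: an all-undef vector is not a constant splat.
std::optional<int64_t> matchConstantSplat(const std::vector<Element> &Elts,
                                          bool AllowUndef) {
  std::optional<int64_t> Splat;
  for (const Element &E : Elts) {
    if (!E) {
      if (!AllowUndef)
        return std::nullopt;
      continue;
    }
    if (Splat && *Splat != *E)
      return std::nullopt;
    Splat = E;
  }
  return Splat; // nullopt if empty or if all elements were undef
}
```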

Differential Revision: https://reviews.llvm.org/D104410


[SCEV] Add some asserts on availability of arguments of isLoopEntryGuardedByCond · 4d5d7254

Max Kazantsev authored Sep 21, 2021

The logic in howManyLessThans is fishy. It first checks the invariance of
RHS, and then uses OrigRHS as the argument for isLoopEntryGuardedByCond, which
is, strictly speaking, a different thing. We are seeing a very rare intermittent
failure of availability checks, and it looks like this precondition is
sometimes broken. Until we figure out what's going on, add asserts
that all values that may be passed to isLoopEntryGuardedByCond
are available at loop entry.

If either of these asserts fails (OrigRHS is the most likely suspect), it
means that the logic here is flawed.


[LowerConstantIntrinsics] Fix heap-use-after-free bug in worklist · 7b4cc09b

David Stenberg authored Sep 21, 2021

This fixes PR51730, a heap-use-after-free bug in
replaceConditionalBranchesOnConstant().

With the attached reproducer we were left with a function looking
something like this after replaceAndRecursivelySimplify():

  [...]

  cont2.i:
    br i1 %.not1.i, label %handler.type_mismatch3.i, label %cont4.i

  handler.type_mismatch3.i:
    %3 = phi i1 [ %2, %cont2.thread.i ], [ false, %cont2.i ]
    unreachable

  cont4.i:
    unreachable

  [...]

with both the branch instruction and PHI node being in the worklist. As
a result of replacing the branch instruction with an unconditional
branch, the PHI node in %handler.type_mismatch3.i would be removed. This
then resulted in a heap-use-after-free bug due to accessing that removed
PHI node in the next worklist iteration.

This is solved by using a value handle worklist. I am a bit unsure whether
this is the most idiomatic solution. Another option would have been to
build a worklist containing only the relevant branch instructions,
but it seemed a bit cleaner to keep all worklist filtering in the loop
that does the rewrites.
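The value-handle fix can be illustrated outside LLVM. In this sketch `std::weak_ptr` plays the role of an LLVM value handle (it is a stand-in, not LLVM's API), and the `Inst` struct and the rule that rewriting "br" deletes "phi" are hypothetical, chosen to mirror the scenario above.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an IR instruction.
struct Inst {
  std::string Name;
};

// Owning container, playing the role of the function body.
using Body = std::vector<std::shared_ptr<Inst>>;

// Walks a worklist of non-owning handles. Rewriting "br" erases "phi" from
// the body, mimicking how replacing a branch can delete a PHI node that is
// still queued. Expired handles are skipped instead of being dereferenced.
std::vector<std::string> processWorklist(Body &B) {
  std::vector<std::weak_ptr<Inst>> Worklist(B.begin(), B.end());
  std::vector<std::string> Visited;
  for (const std::weak_ptr<Inst> &Handle : Worklist) {
    std::shared_ptr<Inst> I = Handle.lock();
    if (!I)
      continue; // already deleted; a raw pointer here would dangle
    Visited.push_back(I->Name);
    if (I->Name == "br") {
      for (auto It = B.begin(); It != B.end(); ++It) {
        if ((*It)->Name == "phi") {
          B.erase(It); // last owner gone: the "phi" object is destroyed
          break;
        }
      }
    }
  }
  return Visited;
}
```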

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D109221


[OpenCL] Test case for C++ for OpenCL 2021 in OpenCL C header test · 57b8b5c1

Justas Janickas authored Sep 09, 2021

RUN line representing C++ for OpenCL 2021 added to the test. This
should have been done as part of earlier commit fb321c2e but
was missed during rebasing.

Differential Revision: https://reviews.llvm.org/D109492


[MLIR] NFC. gpu.launch op argument const folder cleanup · 5c77ed03

Uday Bondhugula authored Sep 21, 2021

NFC updates to gpu.launch op argument const folder.

Differential Revision: https://reviews.llvm.org/D110136


[flang][docs] Document plugin limitations · 7e7484a8

Andrzej Warzynski authored Sep 16, 2021

This was extracted from the discussion on
https://reviews.llvm.org/D108283.

Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>

Differential Revision: https://reviews.llvm.org/D109871


Add CMAKE_BUILD_TYPE to the list of BOOTSTRAP_DEFAULT_PASSTHROUGH variables · eccd477c

Sylvestre Ledru authored Sep 21, 2021

When building clang in stage2 with -DCMAKE_BUILD_TYPE=RelWithDebInfo set,
the developer can expect the stage2 clang to be built using the same mode,
especially as performance is much worse in debug mode
(principle of least astonishment).

Differential Revision: https://reviews.llvm.org/D53014


[PowerPC] NFC: Remove unused tblgen template args · b23d22f7

Cullen Rhodes authored Sep 21, 2021

Identified in D109359.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D109715


[MLIR][SCF] Add for-to-while loop transformation pass · 032cb165

Morten Borup Petersen authored Sep 20, 2021

This pass transforms SCF.ForOp operations to SCF.WhileOp. The for-loop condition is placed in the 'before' region of the while operation, and the induction variable increment + the loop body in the 'after' region. The loop-carried values of the while op are the induction variable (IV) of the for-loop + any iter_args specified for the for-loop.
Any 'yield' ops in the for-loop are rewritten to additionally yield the (incremented) induction variable.

This transformation is useful for passes where we want to consider structured control flow solely on the basis of a loop body and the computation of a loop condition. As an example, when doing high-level synthesis in CIRCT, the increment of an IV in a for-loop is "just another part" of a circuit datapath, and what we really care about is the distinction between our datapath and our control logic (the condition variable).
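The shape of the rewrite can be shown with an ordinary C++ analogue rather than MLIR IR. This is only a sketch of the control structure, assuming a single iter_arg (the accumulator); `sumFor` and `sumWhile` are hypothetical names.

```cpp
#include <cassert>

// A for-loop with one iter_arg (Acc) that sums the induction variable.
int sumFor(int LB, int UB, int Step) {
  int Acc = 0;
  for (int IV = LB; IV < UB; IV += Step)
    Acc += IV;
  return Acc;
}

// The same loop in while form. The loop-carried values are the IV plus the
// iter_arg; the "before" part only computes the condition, and the "after"
// part runs the original body and then increments the IV.
int sumWhile(int LB, int UB, int Step) {
  int IV = LB;
  int Acc = 0;
  while (true) {
    // "before" region: compute the loop condition.
    bool Cond = IV < UB;
    if (!Cond)
      break;
    // "after" region: original body, then the IV increment.
    Acc += IV;
    IV += Step;
  }
  return Acc;
}
```

In the while form the IV increment is just another datapath operation in the "after" part, which is the property the CIRCT use case relies on.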

Differential Revision: https://reviews.llvm.org/D108454


[lldb] Speculative fix to TestGuiExpandThreadsTree · 791b6ebc

Pavel Labath authored Sep 21, 2021

This test relies on being able to unwind from an arbitrary place inside
libc. While I am not sure this is the cause of the observed flakiness,
it is known that we are not able to unwind correctly from some places in
(linux) libc.

This patch adds additional synchronization to ensure that the inferior
is in the main function (instead of pthread guts) when lldb tries to
unwind it. At the very least, it should make the test runs more
predictable/repeatable.


[MLIR] Add mergeLocalIds and mergeSymbolIds · 0d12c991

Kunwar Shaanjeet Singh Grover authored Sep 21, 2021

This patch adds mergeLocalIds and mergeSymbolIds as public functions
for FlatAffineConstraints and FlatAffineValueConstraints respectively.

mergeLocalIds is also required to support divisions in intersection,
subtraction, equality checks, and complement for PresburgerSet.

This patch is part of a series of patches aimed at generalizing affine
dependence analysis.

Reviewed By: bondhugula

Differential Revision: https://reviews.llvm.org/D110045


[clangd] Deduplicate inlay hints · d87d1aa0

Nathan Ridge authored Sep 20, 2021

Duplicates can sometimes appear due to, e.g., explicit template
instantiations.
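One common way to deduplicate is to sort and then drop adjacent equal entries. The sketch below assumes a simplified, hypothetical `InlayHint` (position plus label); clangd's real hint type carries more fields, and the patch may differ in detail.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical, simplified inlay hint: a position plus the label to show.
struct InlayHint {
  unsigned Line = 0;
  unsigned Column = 0;
  std::string Label;
  friend bool operator<(const InlayHint &A, const InlayHint &B) {
    return std::tie(A.Line, A.Column, A.Label) <
           std::tie(B.Line, B.Column, B.Label);
  }
  friend bool operator==(const InlayHint &A, const InlayHint &B) {
    return std::tie(A.Line, A.Column, A.Label) ==
           std::tie(B.Line, B.Column, B.Label);
  }
};

// Sort, then drop adjacent duplicates (e.g. the same hint produced once per
// explicit template instantiation of the same code).
void deduplicateHints(std::vector<InlayHint> &Hints) {
  std::sort(Hints.begin(), Hints.end());
  Hints.erase(std::unique(Hints.begin(), Hints.end()), Hints.end());
}
```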

Differential Revision: https://reviews.llvm.org/D110051


[GlobalISel][Legalizer] Use ArtifactValueFinder first for unmerge combines before trying others. · cc65e08f

Amara Emerson authored Sep 14, 2021

This is motivated by a pathological compile time issue during unmerge combining.

We should be able to use the AVF to do simplification. However AMDGPU
has a lot of codegen changes which I'm not sure how to evaluate.

Differential Revision: https://reviews.llvm.org/D109748


[DSE][NFC] Rename Later->Killing, Earlier->Dead · 129cf336

Evgeniy Brevnov authored Jul 28, 2021

The first (and biggest) change is to use "Killing/Dead" in place of "Later/Earlier" as the base for names in DSE. For example, [Maybe]DeadLoc is a location killed by the KillingI instruction. I believe such names are more descriptive and easier to understand than the current ones.

Second, there are inconsistencies in naming where different names are used for the same thing. Fixed that too.

Third, reordered the parameters of isPartialOverwrite, tryToMergePartialOverlappingStores, and isOverwrite to make them consistent with each other. This greatly reduces the potential for mistakes.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D106947


[GlobalISel][Legalizer] Don't use eraseFromParentAndMarkDBGValuesForRemoval() for some artifacts. · 7091a7f7

Amara Emerson authored Sep 14, 2021

For artifacts excluding G_TRUNC/G_SEXT, which have IR counterparts, we don't
seem to have debug users of defs. However, in the legalizer we're always calling
MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval() which is expensive.
In some rare cases, this contributes significantly to unreasonably long compile
times when we have lots of artifact combiner activity.

To verify this, I added asserts to that function when it actually replaced a debug
use operand with undef for these artifacts. On CTMark with both -O0 and -Os and
debug info enabled, I didn't see a single case where it triggered.

In my measurements I saw around a 0.5% geomean compile-time improvement on -g -O0
for AArch64 with this change.

Differential Revision: https://reviews.llvm.org/D109750


[SCEV] Generalize implication when signedness of FoundPred doesn't matter · 2c7d5fbc

Max Kazantsev authored Sep 21, 2021

The implication logic for two values that are both negative or both non-negative
says that it doesn't matter whether their predicate is signed or unsigned,
but it only flips unsigned into signed for further inference. This patch adds
support for flipping a signed predicate into unsigned as well.
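The fact this implication relies on can be checked directly: when two values have the same sign, signed and unsigned less-than agree. A small standalone C++ check on 32-bit integers (the helper names are hypothetical, not SCEV's API):

```cpp
#include <cassert>
#include <cstdint>

// True if A and B are both negative or both non-negative.
bool sameSign(int32_t A, int32_t B) { return (A < 0) == (B < 0); }

bool signedLess(int32_t A, int32_t B) { return A < B; }

// Compares the same bit patterns as unsigned 32-bit values.
bool unsignedLess(int32_t A, int32_t B) {
  return static_cast<uint32_t>(A) < static_cast<uint32_t>(B);
}
```

When the signs differ, the two predicates disagree: -1 is less than 0 as a signed value, but its bit pattern 0xFFFFFFFF is the largest unsigned value, which is why the flip is only valid under the same-sign condition.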

Differential Revision: https://reviews.llvm.org/D109959
Reviewed By: nikic


BPF: make 32bit register spill with 64bit alignment · ea72b031

Yonghong Song authored Jul 31, 2021

In llvm, for non-alu32 mode, the stack alignment is 64bit, so there is only
one 64bit spill per 64bit slot. For alu32 mode, the stack alignment
is 32bit, so it is possible to have two 32bit spills per
64bit slot.

Currently, bpf kernel verifier does not preserve register states
for 32bit spills. That is, one 32bit register may hold a constant
value or a bounded range before spill. After reload from the
stack, the information is lost and sometimes this may cause
verifier failure. For 64bit register spill, the verifier
indeed tries to preserve the register state for reloading.

The current verifier can be modestly changed to handle one
32bit spill per 64bit stack slot with state-preserving reload.
Handling two 32bit spills per 64bit stack slot will require
substantial changes.

This patch changes the stack alignment for alu32 to be 64bit.
This way, for any 64bit slot in alu32 mode, only one
32bit or 64bit register value can be saved. Together
with the previously mentioned verifier enhancement, 32bit
spills can be handled with state preservation.

Note that llvm stack slot coalescing
seems to only do adjacent packing, which may leave some holes
in the stack. For example,
   stack slot 8   <== 8 bytes
   stack slot 4   <== 8 bytes with 4 byte hole
   stack slot 8   <== 8 bytes
   stack slot 4   <== 4 bytes
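The alignment change can be illustrated with a toy frame layout. This sketch assumes a simplified allocator that places spills in order and aligns each slot to `SlotAlign` bytes; it is not LLVM's frame lowering, and `layoutSpills` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Round Offset up to the next multiple of Align (Align is a power of two).
uint64_t alignTo(uint64_t Offset, uint64_t Align) {
  return (Offset + Align - 1) & ~(Align - 1);
}

// Assigns each spill (given by its size in bytes) an offset, in order,
// aligning every slot to SlotAlign bytes. Returns the chosen offsets.
std::vector<uint64_t> layoutSpills(const std::vector<uint64_t> &Sizes,
                                   uint64_t SlotAlign) {
  std::vector<uint64_t> Offsets;
  uint64_t Next = 0;
  for (uint64_t Size : Sizes) {
    Next = alignTo(Next, SlotAlign);
    Offsets.push_back(Next);
    Next += Size;
  }
  return Offsets;
}
```

With 4-byte alignment, two 4-byte spills land at offsets 0 and 4 and share one 8-byte slot; with 8-byte alignment they land at 0 and 8, one spill per 8-byte slot, which is the property the verifier change needs.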

Differential Revision: https://reviews.llvm.org/D109073


[OpAsmParser] Add a parseCommaSeparatedList helper and beef up Delimiter. · 58abc8c3

Chris Lattner authored Sep 20, 2021

Lots of custom ops have hand-rolled comma-delimited parsing loops, as does
the MLIR parser itself. This provides a standard interface for doing this
that is less error prone and needs less boilerplate.

While here, extend Delimiter to support <> and {} delimited sequences as
well (I have a use for <> in CIRCT specifically).
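A callback-based list helper removes the hand-rolled loop from each caller. The sketch below is a hypothetical, self-contained analogue, not MLIR's `OpAsmParser` interface: `Parser`, `Delimiter`, and the callback signature are assumptions, and the parser only understands punctuation and identifiers.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical delimiter kinds, including the <> and {} forms.
enum class Delimiter { None, Paren, Square, LessGreater, Braces };

class Parser {
public:
  explicit Parser(std::string Input) : Src(std::move(Input)) {}

  // Parses `elt (',' elt)*` inside the chosen delimiters; each element is
  // parsed by the callback. Returns false on any syntax error. An empty
  // list is accepted only when delimiters are present, e.g. "{}".
  bool parseCommaSeparatedList(Delimiter D,
                               const std::function<bool()> &ParseElement) {
    char Open = 0, Close = 0;
    switch (D) {
    case Delimiter::None: break;
    case Delimiter::Paren: Open = '('; Close = ')'; break;
    case Delimiter::Square: Open = '['; Close = ']'; break;
    case Delimiter::LessGreater: Open = '<'; Close = '>'; break;
    case Delimiter::Braces: Open = '{'; Close = '}'; break;
    }
    if (Open && !consume(Open))
      return false;
    if (Close && consume(Close))
      return true;
    do {
      if (!ParseElement())
        return false;
    } while (consume(','));
    return !Close || consume(Close);
  }

  // Parses an identifier made of letters, digits and '_'.
  bool parseIdentifier(std::string &Out) {
    skipSpaces();
    std::size_t Start = Pos;
    while (Pos < Src.size() &&
           (std::isalnum(static_cast<unsigned char>(Src[Pos])) ||
            Src[Pos] == '_'))
      ++Pos;
    Out = Src.substr(Start, Pos - Start);
    return !Out.empty();
  }

private:
  std::string Src;
  std::size_t Pos = 0;

  void skipSpaces() {
    while (Pos < Src.size() &&
           std::isspace(static_cast<unsigned char>(Src[Pos])))
      ++Pos;
  }

  bool consume(char C) {
    skipSpaces();
    if (Pos < Src.size() && Src[Pos] == C) {
      ++Pos;
      return true;
    }
    return false;
  }
};
```

Each caller only supplies the per-element parser, so delimiter handling, separator handling, and error reporting live in one place.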

Differential Revision: https://reviews.llvm.org/D110122


[SimplifyCFG] Redirect switch cases that lead to UB into an unreachable block · 073b254c

Max Kazantsev authored Sep 21, 2021

When following a case of a switch instruction is guaranteed to lead to
UB, we can safely break these edges and redirect those cases into a newly
created unreachable block. As a result, the CFG will become simpler and we can
remove some of the Phi inputs to make further analyses easier.
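The redirection can be modelled on a toy switch. In this hypothetical sketch (not SimplifyCFG's code), a switch is a map from case value to destination block name, and the caller already knows which blocks are guaranteed to reach UB; the default destination is not modelled.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical model of a switch: case value -> destination block name.
using SwitchCases = std::map<int, std::string>;

// Redirects every case whose destination is known to lead to UB into one
// shared "unreachable" block. Returns the blocks that lost at least one
// incoming edge from the switch (their PHI inputs from it can be dropped).
std::set<std::string> redirectUBCases(SwitchCases &Cases,
                                      const std::set<std::string> &UBBlocks) {
  std::set<std::string> LostEdge;
  for (auto &Case : Cases) {
    if (UBBlocks.count(Case.second)) {
      LostEdge.insert(Case.second);
      Case.second = "unreachable";
    }
  }
  return LostEdge;
}
```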

Patch by Dmitry Bakunevich!

Differential Revision: https://reviews.llvm.org/D109428
Reviewed By: lebedev.ri


[Polly] Don't generate inter-iteration noalias metadata. · cad9f98a

Michael Kruse authored Sep 20, 2021

This metadata was intended to mark all accesses within an iteration as pairwise non-aliasing, in this case because every memory location of a base pointer is touched (read or written) at most once. This is typical for 'sweeps' over all data. The stated motivation from D30606 is to ensure that unrolled iterations are considered non-aliasing.

The implementation had multiple issues:

* The structure of the noalias metadata was malformed. D110026 added a check in the verifier for this metadata, and the tests have been failing since then.

* This is not true for the outer loops of the BLIS matrix multiplication, where it was being inserted. Each element of A, B, C is accessed multiple times, as often as the loop not used as an index iterates.

* Scopes were added to SecondLevelOtherAliasScopeList (used for the !noalias scope list) on the fly whenever another SCEV was seen. This meant that previously visited instructions would not be updated with alias scopes that are only seen later, missing those SCEVs they should not be aliasing with.

* Since the !noalias scope list would ideally consist of all other SCEVs for this base pointer, we might quickly run into scalability issues. Especially after unrolling, there would probably be at least one SCEV per instruction and unroll instance.

* The inter-iteration noalias base pointer was not removed after leaving the loop marked with it, effectively marking everything after it as noalias as well.

A solution I considered was to mark each instruction as non-aliasing with its own scope. The instruction itself would obviously alias itself, but such a construction might also be considered invalid. Duplicating the instruction (e.g. due to speculation) would mark the instruction non-aliasing with its clone. I don't want to go into this territory, especially since the original motivation of determining unrolled instances as noalias based on SCEV is what scev-aa does as well.

This effectively reverts D30606 and D35761.


[NFC] Rename Context->CtxI in SCEV for uniformity reasons · a06db78f
Max Kazantsev authored Sep 21, 2021

[llvm] Use make_early_inc_range (NFC) · 85b4b21c
Kazu Hirata authored Sep 20, 2021
