Commits · af1c5312d76000bf134d8b81cdb7343607c6ee64 · Lorenzo Albano / LLVM bpEVL

Sep 21, 2021

[InstCombine] add tests for mask-shift with trunc; NFC · af1c5312
Sanjay Patel authored Sep 20, 2021

af1c5312
[VectorCombine] Add tests which require DT to use info from assumes. · ea27dd74
Florian Hahn authored Sep 21, 2021

ea27dd74

[InstCombine] foldConstantInsEltIntoShuffle - bail if we fail to find constant element (PR51824) · fc8f1e44

Simon Pilgrim authored Sep 21, 2021

If getAggregateElement() returns null for any element, early out as otherwise we will assert when creating a new constant vector

Fixes PR51824 + ; OSS-Fuzz: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=38057

fc8f1e44

[LowerConstantIntrinsics] Fix heap-use-after-free bug in worklist · 7b4cc09b

David Stenberg authored Sep 21, 2021

This fixes PR51730, a heap-use-after-free bug in
replaceConditionalBranchesOnConstant().

With the attached reproducer we were left with a function looking
something like this after replaceAndRecursivelySimplify():

  [...]

  cont2.i:
    br i1 %.not1.i, label %handler.type_mismatch3.i, label %cont4.i

  handler.type_mismatch3.i:
    %3 = phi i1 [ %2, %cont2.thread.i ], [ false, %cont2.i ]
    unreachable

  cont4.i:
    unreachable

  [...]

with both the branch instruction and PHI node being in the worklist. As
a result of replacing the branch instruction with an unconditional
branch, the PHI node in %handler.type_mismatch3.i would be removed. This
then resulted in a heap-use-after-free bug due to accessing that removed
PHI node in the next worklist iteration.

This is solved by using a value handle worklist. I am a unsure if this
is the most idiomatic solution. Another solution could have been to
produce a worklist just containing the interesting branch instructions,
but I thought that it perhaps was a bit cleaner to keep all worklist
filtering in the loop that does the rewrites.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D109221

7b4cc09b

[SCEV] Generalize implication when signedness of FoundPred doesn't matter · 2c7d5fbc

Max Kazantsev authored Sep 21, 2021

The implication logic for two values that are both negative or non-negative
says that it doesn't matter whether their predicate is signed and unsigned,
but only flips unsigned into signed for further inference. This patch adds
support for flipping a signed predicate into unsigned as well.

Differential Revision: https://reviews.llvm.org/D109959
Reviewed By: nikic

2c7d5fbc

[SimplifyCFG] Redirect switch cases that lead to UB into an unreachable block · 073b254c

Max Kazantsev authored Sep 21, 2021

When following a case of a switch instruction is guaranteed to lead to
UB, we can safely break these edges and redirect those cases into a newly
created unreachable block. As result, CFG will become simpler and we can
remove some of Phi inputs to make further analyzes easier.

Patch by Dmitry Bakunevich!

Differential Revision: https://reviews.llvm.org/D109428
Reviewed By: lebedev.ri

073b254c

[InstCombine] Eliminate vector reverse if all inputs/outputs to an instruction are reverses · f417d9d8
Usman Nadeem authored Sep 20, 2021
```
Differential Revision: https://reviews.llvm.org/D109808

Change-Id: I1a10d2bc33acbe0ea353c6cb3d077851391fe73e
```
f417d9d8

Sep 20, 2021

[IR] Add helper to convert offset to GEP indices · dd022656

Nikita Popov authored Sep 19, 2021

We implement logic to convert a byte offset into a sequence of GEP
indices for that offset in a number of places. This patch adds a
DataLayout::getGEPIndicesForOffset() method, which implements the
core logic. I've updated SROA, ConstantFolding and InstCombine to
use it, and there's a few more places where it looks relevant.

Differential Revision: https://reviews.llvm.org/D110043

dd022656

[DSE] Add additional tests to cover review comments. · 963d3a22

Florian Hahn authored Sep 20, 2021

Adds additional tests following comments from D109844.

Also removes unusued in.ptr arguments and places in the call tests that
used loads instead of a getval call.

963d3a22

[SLP]Improve graph reordering. · bc69dd62

Alexey Bataev authored Aug 03, 2021

Reworked reordering algorithm. Originally, the compiler just tried to
detect the most common order in the reordarable nodes (loads, stores,
extractelements,extractvalues) and then fully rebuilding the graph in
the best order. This was not effecient, since it required an extra
memory and time for building/rebuilding tree, double the use of the
scheduling budget, which could lead to missing vectorization due to
exausted scheduling resources.

Patch provide 2-way approach for graph reodering problem. At first, all
reordering is done in-place, it doe not required tree
deleting/rebuilding, it just rotates the scalars/orders/reuses masks in
the graph node.

The first step (top-to bottom) rotates the whole graph, similarly to the previous
implementation. Compiler counts the number of the most used orders of
the graph nodes with the same vectorization factor and then rotates the
subgraph with the given vectorization factor to the most used order, if
it is not empty. Then repeats the same procedure for the subgraphs with
the smaller vectorization factor. We can do this because we still need
to reshuffle smaller subgraph when buildiong operands for the graph
nodes with lasrger vectorization factor, we can rotate just subgraph,
not the whole graph.

The second step (bottom-to-top) scans through the leaves and tries to
detect the users of the leaves which can be reordered. If the leaves can
be reorder in the best fashion, they are reordered and their user too.
It allows to remove double shuffles to the same ordering of the operands in
many cases and just reorder the user operations instead. Plus, it moves
the final shuffles closer to the top of the graph and in many cases
allows to remove extra shuffle because the same procedure is repeated
again and we can again merge some reordering masks and reorder user nodes
instead of the operands.

Also, patch improves cost model for gathering of loads, which improves
x264 benchmark in some cases.

Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264,
+3% for 508.namd, improves most of other benchmarks.
The compile and link time are almost the same, though in some cases it
should be better (we're not doing an extra instruction scheduling
anymore) + we may vectorize more code for the large basic blocks again
because of saving scheduling budget.

Differential Revision: https://reviews.llvm.org/D105020

bc69dd62

[Analysis] Add support for vscale in computeKnownBitsFromOperator · f988f680

David Sherwood authored Sep 16, 2021

In ValueTracking.cpp we use a function called
computeKnownBitsFromOperator to determine the known bits of a value.
For the vscale intrinsic if the function contains the vscale_range
attribute we can use the maximum and minimum values of vscale to
determine some known zero and one bits. This should help to improve
code quality by allowing certain optimisations to take place.

Tests added here:

  Transforms/InstCombine/icmp-vscale.ll

Differential Revision: https://reviews.llvm.org/D109883

f988f680

[NFC] Add assert and test showing that revert of D109596 wasn't justified · e9d34c54

Max Kazantsev authored Sep 20, 2021

All transforms of IndVars have prerequisite requirement of LCSSA and LoopSimplify
form and rely on it. Added test that shows that this actually stands.

e9d34c54

Revert "Revert "[IndVars] Replace PHIs if loop exits on 1st iteration"" · 471217cf

Max Kazantsev authored Sep 20, 2021

This reverts commit 6fec6552.

The patch was reverted on incorrect claim that this patch may break LCSSA form
when the loop is not in a simplify form. All IndVars' transform insure that
the loop is in simplify and LCSSA form, so if it wasn't broken before this
transform, it will also not be broken after it.

471217cf

[SCEV] Support negative values in signed/unsigned predicate reasoning · def15c5f

Max Kazantsev authored Sep 20, 2021

There is a piece of logic that uses the fact that signed and unsigned
versions of the same predicate are equivalent when both values are
non-negative. It's also true when both of them are negative.

Differential Revision: https://reviews.llvm.org/D109957
Reviewed By: nikic

def15c5f

Sep 19, 2021

[DebugInfo][LSR] Emit shorter expressions from scev-based salvaging · 5ba80203

Chris Jackson authored Sep 19, 2021

The scev-based salvaging for LSR can sometimes produce unnecessarily
verbose expressions. This patch adds logic to detect when the value to
be recovered and the induction variable differ by only a constant
offset. Then, the expression to derive the current iteration count can
be omitted from the dbg.value in favour of the offset.

Reviewed by: aprantl

Differential Revision: https://reviews.llvm.org/D109044

5ba80203

[InstCombine] add/adjust tests for min/max intrinsics; NFC · 9555d1ed
Sanjay Patel authored Sep 19, 2021
```
If we transform these, we have to propagate no-wrap/undef carefully.
```
9555d1ed

Sep 18, 2021

[Tests] Fix noalias metadata in one more test · abe21da6

Nikita Popov authored Sep 18, 2021

Missed this one in 80110aaf. This
is another test mixing up alias scopes and alias scope lists.

abe21da6

[Tests] Fix incorrect noalias metadata · 80110aaf

Nikita Popov authored Sep 16, 2021

Mostly this fixes cases where !noalias or !alias.scope were passed
a scope rather than a scope list. In some cases I opted to drop
the metadata entirely instead, because it is not really relevant
to the test.

80110aaf

Precommit tests for D109807 "[InstCombine] Narrow type of logical operation chains when possible" · d841c72e
Usman Nadeem authored Sep 18, 2021
```
Change-Id: Iae9bf18619e4926301a866c7e2bd38ced524804e
```
d841c72e

[Attributor] Change AAExecutionDomain to check intrinsic edges · 27905eeb

Joseph Huber authored Sep 15, 2021

The AAExecutionDomain instance checks if a BB is executed by the main
thread only. Currently, this only checks the `__kmpc_kernel_init` call
for generic regions to indicate the path taken by the main thread. In
the new runtime, we want to be able to detect basic blocks even in SPMD
mode. For this we enable it to check thread-ID intrinsics being compared
to zero as well.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109849

27905eeb

[OpenMP] Add NoSync attributes to alloc / free shared RTL calls · fec2927e

Joseph Huber authored Sep 17, 2021

This patch adds the `nosync` attribute to the `__kmpc_alloc_shared` and
`__kmpc_free_shared` runtime library calls. This allows code analysis to
know that these functins dont contain any barriers. This will help
optimizations reason about the CFG of blocks containing these calls.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109995

fec2927e

[AArch64][SVE][InstCombine] Fold redundant zip1/2(uzp1/2) operations · 757384ab

Usman Nadeem authored Sep 12, 2021

    zip1(uzp1(A, B), uzp2(A, B)) --> A
    zip2(uzp1(A, B), uzp2(A, B)) --> B

Differential Revision: https://reviews.llvm.org/D109666

Change-Id: I4a6578db2fcef9ff71ad0e77b9fe08354e6dbfcd

757384ab

Sep 17, 2021

[InstCombine] add tests for min/max intrinsics with offset operand; NFC · 6da35036
Sanjay Patel authored Sep 17, 2021

6da35036
[NFC] Precommit tests for D109954 · d01e0c8c
Dávid Bolvanský authored Sep 17, 2021

d01e0c8c

[CSSPGO] Tweakes to lower pseudo probe runtime overhead · c5fafc1e

Hongtao Yu authored Sep 17, 2021

A couple tweaks to

1. allow more thinlto importing by excluding probe intrinsics from IR size in module summary

2. Allow general default attributes (nofree nosync nounwind) for pseudo probe intrinsic. Without those attributes, pseudo probes will be basically treated as unknown calls which will in turn block their containing functions from annotated with those attributes.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D109976

c5fafc1e

[SLP][NFC]Add a test for reorder of alt shuffle operands. · 2b0b1d53
Alexey Bataev authored Sep 17, 2021

2b0b1d53
[InstCombine] allow splat vectors for narrowing masked fold · 41ff7612
Sanjay Patel authored Sep 17, 2021
```
Mostly cosmetic diffs, but the use of m_APInt matches splat constants.
```
41ff7612
[InstCombine] add vector tests for 'and' folds; NFC · 3a587ed2
Sanjay Patel authored Sep 17, 2021

3a587ed2
[Test] Add simple test where IndVars fails to remove checks on negative values · 690f7695
Max Kazantsev authored Sep 17, 2021

690f7695

[DSE] Add test cases with stores to objects before they escape. · bdafe312

Florian Hahn authored Sep 15, 2021

Test cases where stores to local objects can be removed because the
object does not escape before calls that may read/write to memory.

Includes test from PR50220.

bdafe312

[Test] One more missing opportunity on IndVars check removal · 74fa174f
Max Kazantsev authored Sep 17, 2021

74fa174f

[FuncSpec] Specialising on addresses of const global values. · 97cc678c

Sjoerd Meijer authored Sep 17, 2021

This introduces an option to allow specialising on the address of global
values. This option is off by default because it is likely not that profitable
to do so and needs more investigation. Before, we were specialising on addresses
and thus this changes the default behaviour.

Differential Revision: https://reviews.llvm.org/D109775

97cc678c

[GlobalOpt] Do not shrink global to bool for an unfavorable AS · 167ff528

Christudasan Devadasan authored Sep 15, 2021

Do not call `TryToShrinkGlobalToBoolean` for address spaces
that don't allow initializers. It inserts an initializer value
while shrinking to bool. Used the target hook introduced with
D109337 to skip this call for the restricted address spaces.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D109823

167ff528

Update LoopPredication test to fix buildbot failure. · fe950cba
Daniil Suchkov authored Sep 16, 2021
```
This patch updates tests added in 5f2b7879.
```
fe950cba

[LoopPredication] Report changes correctly when attempting loop exit predication · 0e362883

Daniil Suchkov authored Sep 14, 2021

To make the IR easier to analyze, this pass makes some minor transformations.
After that, even if it doesn't decide to optimize anything, it can't report that
it changed nothing and preserved all the analyses.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D109855

0e362883

NFC. Add tests exposing missing analysis invalidation in LoopPredication. · 5f2b7879
Daniil Suchkov authored Sep 14, 2021

5f2b7879
[LoopIdiomRecognize][Remarks] Track loop-strided store to/from blocks · 4b19e7df
Jon Roelofs authored Sep 16, 2021
```
Differential revision: https://reviews.llvm.org/D109929
```
4b19e7df

Sep 16, 2021

[SimplifyCFG] Add bonus when seeing vector ops to branch fold to common dest · d49cb5b3

Arthur Eubanks authored Aug 30, 2021

This makes some tests in vector-reductions-logical.ll more stable when
applying D108837.

The cost of branching is higher when vector ops are involved due to
potential SLP transformations.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D108935

d49cb5b3

[InstCombine] Added llvm.powi optimizations · a4a426c9

Dávid Bolvanský authored Sep 16, 2021

If power is even:
powi(-x, p) -> powi(x, p)
powi(fabs(x), p) -> powi(x, p)
powi(copysign(x, y), p) -> powi(x, p)

a4a426c9

[NFC] Added tests for llvm.powi optimizations · c0afb009
Dávid Bolvanský authored Sep 16, 2021

c0afb009