Commits · 88c965ba14cff1e04ccb4237966a9fefe902b1f4 · Lorenzo Albano / LLVM bpEVL

Jun 17, 2020

BreakCriticalEdges for callbr indirect dests · 88c965ba

Nick Desaulniers authored Jun 17, 2020

Summary:
llvm::SplitEdge was failing an assertion that the BasicBlock only had
one successor (for BasicBlocks terminated by CallBrInst, we typically
have multiple successors).  It was surprising that the earlier call to
SplitCriticalEdge did not handle the critical edge (there was an early
return).  Removing that triggered another assertion relating to creating
a BlockAddress for a BasicBlock that did not (yet) have a parent, which
is a simple order of operations issue in llvm::SplitCriticalEdge (a
freshly constructed BasicBlock must be inserted into a Function's basic
block list to have a parent).

Thanks to @nathanchance for the report.
Fixes: https://github.com/ClangBuiltLinux/linux/issues/1018

Reviewers: craig.topper, jyknight, void, fhahn, efriedma

Reviewed By: efriedma

Subscribers: eli.friedman, rnk, efriedma, fhahn, hiraditya, llvm-commits, nathanchance, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81607

88c965ba
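
As an illustration of the terms used in the commit above: an edge is critical when its source block has more than one successor and its destination has more than one predecessor, and a callbr-terminated block typically supplies the "more than one successor" half. The sketch below models this on a toy adjacency list. `ToyCFG`, `isCriticalEdge` and `splitEdge` are hypothetical names for illustration only; this is not LLVM's `SplitCriticalEdge` and it ignores BlockAddress handling entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy CFG: Succs[b] lists the successor block indices of block b.
// Hypothetical stand-in, not LLVM's BasicBlock/Function API.
using ToyCFG = std::vector<std::vector<std::size_t>>;

// Count how many edges in the graph end at block Dest.
std::size_t countPredecessorEdges(const ToyCFG &Succs, std::size_t Dest) {
  std::size_t N = 0;
  for (const auto &Out : Succs)
    for (std::size_t S : Out)
      if (S == Dest)
        ++N;
  return N;
}

// An edge Src -> Dest is critical when Src has more than one successor
// and Dest has more than one predecessor.
bool isCriticalEdge(const ToyCFG &Succs, std::size_t Src, std::size_t Dest) {
  assert(Src < Succs.size() && Dest < Succs.size() && "block index out of range");
  return Succs[Src].size() > 1 && countPredecessorEdges(Succs, Dest) > 1;
}

// Split Src -> Dest by routing it through a fresh block.
// Returns the index of the new block.
std::size_t splitEdge(ToyCFG &Succs, std::size_t Src, std::size_t Dest) {
  std::size_t NewBB = Succs.size();
  Succs.push_back({Dest}); // the new block only branches to Dest
  for (std::size_t &S : Succs[Src])
    if (S == Dest) {
      S = NewBB; // redirect the first matching edge
      break;
    }
  return NewBB;
}
```

With `ToyCFG G = {{1, 2}, {2}, {}}`, block 0 plays the role of the callbr block: the edge 0 -> 2 is critical, and after `splitEdge(G, 0, 2)` neither of the two resulting edges is.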

[CGP] Reset the debug location when promoting zext(s). · 1cbaf847

Davide Italiano authored Jun 17, 2020

When the zext gets promoted, it used to retain the original location,
which pessimizes the debugging experience, causing an unexpected
jump when stepping at -Og.

Fixes https://bugs.llvm.org/show_bug.cgi?id=46120 (which also
contains a full C repro).

Differential Revision:  https://reviews.llvm.org/D81437

1cbaf847

Revert "[InlineCost] InlineCostAnnotationWriterPass introduced" · ea844c75
Kirill Naumov authored Jun 17, 2020
```
This reverts commit 37e06e8f.
```
ea844c75
Revert "[InlineCost] PrinterPass prints constants to which instructions are simplified" · dcf2a9f2
Kirill Naumov authored Jun 17, 2020
```
This reverts commit 52b0db22.
```
dcf2a9f2
Revert "[InlineCost] GetElementPtr with constant operands" · 39a4505e
Kirill Naumov authored Jun 17, 2020
```
This reverts commit 34fba68d.
```
39a4505e

[InlineCost] GetElementPtr with constant operands · 34fba68d

Kirill Naumov authored Jun 02, 2020

If the GEP instruction contains only constants as its arguments,
then it should be recognized as a constant. For now, a flag
("disable-gep-const-evaluation", off by default) was also added to
turn off this simplification if it causes any regressions. Once I
have gathered data on the effectiveness of this simplification,
the flag will be deleted.

Reviewers: apilipenko, davidxl, mtrofin

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D81026

34fba68d

[InlineCost] PrinterPass prints constants to which instructions are simplified · 52b0db22

Kirill Naumov authored Jun 02, 2020

This patch enables printing of constants to see which instructions were
constant-folded. Needed for tests and better visual analysis of the
inliner's work.

Reviewers: apilipenko, mtrofin, davidxl, fedor.sergeev

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D81024

52b0db22

[InlineCost] InlineCostAnnotationWriterPass introduced · 37e06e8f

Kirill Naumov authored Jun 11, 2020

This class makes it possible to see the inliner's decisions, for better
verification of optimizations and for tests. To use it, pass the flag
-passes="print<inline-cost>".

Reviewers: apilipenko, mtrofin, davidxl, fedor.sergeev

Reviewed By: mtrofin

Differential revision: https://reviews.llvm.org/D81743

37e06e8f

[SCCP] Add a few more additional sext tests (NFC). · 6aae8ef1
Florian Hahn authored Jun 17, 2020

6aae8ef1
Recommit "[LV] Emit @llvm.get.active.lane.mask for tail-folded loops" · e345d547
Sjoerd Meijer authored Jun 17, 2020
```
Fixed ARM regression test.

Please see the original commit message rG47650451738c for details.
```
e345d547
[SCCP] Precommit some sext tests (NFC). · b1130c4f
Florian Hahn authored Jun 12, 2020

b1130c4f

Return "[InstCombine] Simplify compare of Phi with constant inputs against a constant" · 5bf0858c

Sam Parker authored Jun 17, 2020

I originally reverted the patch because it was causing performance
issues, but now I think it's just enabling simplify-cfg to do
something that I don't want instead :)

Sorry for the noise.

This reverts commit 3e39760f.

5bf0858c

[IR] Don't copy profile metadata in createCallMatchingInvoke() · 16ad6eeb

Hans Wennborg authored Jun 17, 2020

The invoke instruction can have profile metadata with branch_weights,
which does not make sense for a call instruction and will be
rejected by the verifier.

Differential revision: https://reviews.llvm.org/D81996

16ad6eeb

Revert "[LV] Emit @llvm.get.active.mask for tail-folded loops" · d4e183f6
Sjoerd Meijer authored Jun 17, 2020
```
This reverts commit 47650451
while I investigate the build bot failures.
```
d4e183f6

[LV] Emit @llvm.get.active.mask for tail-folded loops · 47650451

Sjoerd Meijer authored Jun 10, 2020

This emits the new IR intrinsic @llvm.get.active.lane.mask for tail-folded vectorised
loops if the intrinsic is supported by the backend, which is checked by
querying the TargetTransformInfo hook emitGetActiveLaneMask.

This intrinsic creates a mask representing active and inactive vector lanes,
which is used by the masked load/store instructions that are created for
tail-folded loops. The semantics of @llvm.get.active.lane.mask are described in
LangRef:

https://llvm.org/docs/LangRef.html#llvm-get-active-lane-mask-intrinsics

This intrinsic is also used to provide a hint to the backend. That is, the
second argument of the intrinsic represents the back-edge taken count of the
loop. For MVE, for example, we use that to set up tail-predication, which is a
new form of predication in MVE for vector loops that implicitly predicates the
last vector loop iteration by implicitly setting active/inactive lanes, i.e.
the tail loop is predicated. In order to set up a tail-predicated vector loop,
we need to know the number of data elements processed by the vector loop, which
corresponds to the trip count of the scalar loop, which we can now reconstruct
using @llvm.get.active.lane.mask.

Differential Revision: https://reviews.llvm.org/D79100

47650451
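
A scalar model may help in reading the commit above. It assumes, following the commit text, that the second argument is the back-edge taken count (trip count minus one), so a lane is active when its element index does not exceed that count. The function names are hypothetical and LangRef remains the authoritative definition of the intrinsic.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Scalar model of the lane mask for one vector iteration.
// Assumption (from the commit text): the second argument is the back-edge
// taken count (BTC = trip count - 1), so lane I is active when Base + I <= BTC.
std::vector<bool> activeLaneMask(std::uint64_t Base, std::uint64_t BTC, unsigned VF) {
  assert(VF > 0 && "vector factor must be non-zero");
  std::vector<bool> Mask(VF);
  for (unsigned I = 0; I < VF; ++I)
    Mask[I] = (Base + I) <= BTC; // this toy ignores overflow of Base + I
  return Mask;
}

// Total number of elements touched by a tail-folded loop: the active lanes
// summed over all vector iterations.
std::uint64_t countProcessedElements(std::uint64_t TripCount, unsigned VF) {
  assert(TripCount > 0 && VF > 0);
  std::uint64_t BTC = TripCount - 1;
  std::uint64_t Total = 0;
  for (std::uint64_t Base = 0; Base <= BTC; Base += VF)
    for (bool Active : activeLaneMask(Base, BTC, VF))
      Total += Active ? 1 : 0;
  return Total;
}
```

For a scalar trip count of 10 and a vector factor of 4, the last vector iteration starts at element 8 and only its first two lanes are active; summing the active lanes over all iterations gives back the scalar trip count, which is the quantity the commit says the backend needs.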

[MemDep] Also remove load instructions from NonLocalDesCache. · e4b58ea8

Florian Hahn authored Jun 17, 2020

Currently load instructions are added to the cache for invariant pointer
group dependencies, but only pointer values are removed. That
leads to dangling AssertingVHs in the test case below, where we delete a
load from an invariant pointer group. We should also remove the load
entries from the cache.

Fixes PR46054.

Reviewers: efriedma, hfinkel, asbirlea

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D81726

e4b58ea8

[Test] Add missing opportunity for replacement of select with Phi · 9465dd5d
Max Kazantsev authored Jun 17, 2020

9465dd5d

Jun 16, 2020

[CGP] Add `--match-full-lines` to make sure we don't have a dbg attachment. · 6a5641ef
Davide Italiano authored Jun 16, 2020

6a5641ef

[Matrix] Add align info to some more loads/stores (NFC). · 08f62ff8

Florian Hahn authored Jun 16, 2020

Some tests were missing alignment info. Subsequent changes properly
preserve the set alignment. Set it properly beforehand, to avoid
unnecessary test changes.

08f62ff8

[TLI] Add four C++17 delete variants. · 6bc2b042

Hiroshi Yamauchi authored Jun 10, 2020

Summary:
delete(void*, unsigned int, align_val_t)
delete(void*, unsigned long, align_val_t)
delete[](void*, unsigned int, align_val_t)
delete[](void*, unsigned long, align_val_t)

Differential Revision: https://reviews.llvm.org/D81853

6bc2b042
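
For readers unfamiliar with these overloads, the sketch below calls the sized, aligned deallocation functions directly. It assumes a C++17 compiler; the two size types in the commit's list presumably correspond to `size_t` on 32-bit and 64-bit targets. `CacheLine` and `roundTripSizedAlignedDelete` are hypothetical names, and the preprocessor guard falls back to the unsized aligned form on toolchains that do not enable sized deallocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// An over-aligned type, so that the align_val_t overloads are the relevant ones.
struct alignas(64) CacheLine {
  unsigned char Bytes[64];
};

static bool isAligned(const void *P, std::size_t Align) {
  return reinterpret_cast<std::uintptr_t>(P) % Align == 0;
}

// Allocate with the aligned operator new / new[] and release the memory with
// the matching sized, aligned operator delete / delete[].
bool roundTripSizedAlignedDelete() {
  constexpr std::size_t Size = sizeof(CacheLine);
  constexpr std::align_val_t Align{alignof(CacheLine)};

  void *One = ::operator new(Size, Align);
  void *Many = ::operator new[](4 * Size, Align);
  assert(One && Many); // the throwing forms never return null
  bool Ok = isAligned(One, alignof(CacheLine)) &&
            isAligned(Many, alignof(CacheLine));

#if defined(__cpp_sized_deallocation)
  ::operator delete(One, Size, Align);        // delete(void*, size_t, align_val_t)
  ::operator delete[](Many, 4 * Size, Align); // delete[](void*, size_t, align_val_t)
#else
  ::operator delete(One, Align);              // unsized fallback
  ::operator delete[](Many, Align);
#endif
  return Ok;
}
```

Calling `roundTripSizedAlignedDelete()` returns true when both allocations came back suitably aligned.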

[VectorCombine] scalarize compares with insertelement operand(s) · ed67f5e7

Sanjay Patel authored Jun 16, 2020

Generalize scalarization (recently enhanced with D80885)
to allow compares as well as binops.
Similar to binops, we are avoiding scalarization of a loaded
value because that could avoid a register transfer in codegen.
This requires 1 extra predicate that I am aware of: we do not
want to scalarize the condition value of a vector select. That
might also invert a transform that we do in instcombine that
prefers a vector condition operand for a vector select.

I think this is the final step in solving PR37463:
https://bugs.llvm.org/show_bug.cgi?id=37463

Differential Revision: https://reviews.llvm.org/D81661

ed67f5e7
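
The equivalence behind this kind of scalarization can be checked on plain arrays. The sketch below is a toy model with hypothetical names, not the pass itself: it compares "insert a scalar into a constant vector, then compare" against "compare the constants, compare the scalar, then insert the scalar result", using a signed less-than as the predicate.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<int, 4>;
using Mask4 = std::array<bool, 4>;

// Elementwise signed less-than, standing in for a vector compare.
Mask4 vectorLess(const Vec4 &A, const Vec4 &B) {
  Mask4 R{};
  for (std::size_t I = 0; I < 4; ++I)
    R[I] = A[I] < B[I];
  return R;
}

// Before: insert X into a constant vector, then compare the whole vector.
Mask4 compareAfterInsert(Vec4 VecC, int X, std::size_t Idx, const Vec4 &VecC2) {
  assert(Idx < 4 && "lane index out of range");
  VecC[Idx] = X;
  return vectorLess(VecC, VecC2);
}

// After: compare the two constant vectors (foldable ahead of time), compare X
// against the matching lane as a scalar, then insert that scalar result.
Mask4 insertAfterCompare(const Vec4 &VecC, int X, std::size_t Idx, const Vec4 &VecC2) {
  assert(Idx < 4 && "lane index out of range");
  Mask4 R = vectorLess(VecC, VecC2);
  R[Idx] = X < VecC2[Idx];
  return R;
}
```

Both functions produce the same mask for every `X` and `Idx`; the second form leaves only one scalar compare that depends on a runtime value.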

[Matrix] Specify missing alignment in tests (NFC). · e02c9649

Florian Hahn authored Jun 16, 2020

Some tests were missing alignment info. Subsequent changes properly
preserve the set alignment. Set it properly beforehand, to avoid
unnecessary test changes.

It also updates cases where an alignment of 16 was specified, instead of
the vector element type alignment.

e02c9649

Revert "[AssumeBundles] add cannonicalisation to the assume builder" · d7deef12
Tyker authored Jun 16, 2020
```
This reverts commit 90c50cad.
```
d7deef12

[AssumeBundles] add cannonicalisation to the assume builder · 90c50cad

Tyker authored Jun 15, 2020

Summary:
This significantly reduces the number of assumes generated without affecting too much
the information that is preserved. This significantly improves the compile-time cost
of enable-knowledge-retention.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, asbirlea, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79650

90c50cad

Revert "[IR] Clean up dead instructions after simplifying a conditional branch" · 6fdd5a28
Jay Foad authored Jun 16, 2020
```
This reverts commit 69bdfb07.

Reverting to investigate https://bugs.llvm.org/show_bug.cgi?id=46343
```
6fdd5a28

[OpenMPOpt] initial tests for ICV tracking. Only nthreads is used. · 73bfb4fd

sstefan1 authored Jun 16, 2020

Summary: A couple of tests to showcase what will be done and what to expect with ICV tracking.

Reviewers: jdoerfert, JonChesterfield

Subscribers: yaxunl, guansong, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81114

73bfb4fd

Jun 15, 2020

[CodeGenPrepare] Reset the debug location when promoting trunc(s) · c2dccf9d

Davide Italiano authored Jun 15, 2020

The promotion machinery in CGP moves instructions retaining
debug locations. When the transformation is local, this is mostly
correct, but when instructions are moved cross-BBs, this is not
always true and causes jumpiness in line tables. This is the first
of a series of commits. sext(s) and zext(s) need to be treated
similarly.

Differential Revision:  https://reviews.llvm.org/D81879

c2dccf9d

[IR] Add nocapture & nosync to matrix intrinsics. · 1d33c09f

Florian Hahn authored Jun 15, 2020

As suggested in D81472, the load/store intrinsics' pointer arguments can
be marked as nocapture and all matrix intrinsics as nosync.

This also re-flows the intrinsic definitions, to make them a little more
concise.

1d33c09f

[DSE,MSSA] Port partial store merging. · 120c0592

Florian Hahn authored Jun 15, 2020

Port partial constant store merging logic to MemorySSA backed DSE. The
heavy lifting is done by the existing helper function. It is used in a
context where we already ensured that the later instruction can
eliminate the earlier one, if it is a complete overwrite.

120c0592
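
As a rough picture of what merging partial constant stores means, the sketch below folds a later, narrower constant store into an earlier, wider one so that a single store of the merged constant suffices. It is a toy under stated assumptions (little-endian layout, sizes in bytes, the later store fully contained in the earlier one); `mergeConstantStores` is a hypothetical name, not the helper the commit refers to.

```cpp
#include <cassert>
#include <cstdint>

// Merge a later, narrower constant store into an earlier, wider constant store.
// Assumptions: little-endian byte order, the earlier store is at most 8 bytes,
// and the later store lies entirely inside the earlier one at OffsetBytes.
std::uint64_t mergeConstantStores(std::uint64_t Earlier, unsigned EarlierBytes,
                                  std::uint64_t Later, unsigned LaterBytes,
                                  unsigned OffsetBytes) {
  assert(EarlierBytes <= 8 && LaterBytes >= 1 && LaterBytes < 8);
  assert(OffsetBytes + LaterBytes <= EarlierBytes && "later store must be contained");
  std::uint64_t LaterMask = (std::uint64_t{1} << (8 * LaterBytes)) - 1;
  unsigned Shift = 8 * OffsetBytes;
  std::uint64_t Cleared = Earlier & ~(LaterMask << Shift); // drop overwritten bytes
  return Cleared | ((Later & LaterMask) << Shift);         // splice in the new bytes
}
```

For example, a 4-byte store of `0x11223344` followed by a 1-byte store of `0xAB` at byte offset 1 merges into a single 4-byte store of `0x1122AB44`.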

[DSE,MSSA] Delete instructions after printing it. · 8c61f13a
Florian Hahn authored Jun 15, 2020
```
Also enables a now-passing test case, that exposed a crash caused by the
wrong order.
```
8c61f13a
[DSE,MSSA] Add additional merging test cases (NFC). · 979720a9
Florian Hahn authored Jun 15, 2020
```
Additional tests added ahead of partial overlapping store merging.
```
979720a9

[Test] Add an example of unprofitable PR Phi insertion · 9e4f6748

Max Kazantsev authored Jun 15, 2020

This test demonstrates weird behavior of SimplifyCFG: it seems that a
bigger block size leads to a worse optimization choice.

9e4f6748

[CostModel] getCFInstrCost in getUserCost. · 2596da31

Sam Parker authored Jun 15, 2020

Have BasicTTI call the base implementation so that both agree on the
default behaviour, with the default being a cost of '1'. This has
required an X86 specific implementation as it seems to be very
reliant on those instructions being free. Changes are also made to
AMDGPU so that their implementations distinguish between cost kinds,
so that the unrolling isn't affected. PowerPC also has its own
implementation to prevent changes to the reg-usage vectorizer test.

The cost model test changes now reflect that ret instructions are not
generally free.

Differential Revision: https://reviews.llvm.org/D79164

2596da31

Revert "Return "[InstCombine] Simplify compare of Phi with constant inputs against a constant"" · 3e39760f
Sam Parker authored Jun 15, 2020
```
This reverts commit 23291b98.

This caused performance regressions.
```
3e39760f
[Test] Update test with check script, add two more motivating cases · 344eaf78
Max Kazantsev authored Jun 15, 2020

344eaf78

[NewPM] Avoid redundant CGSCC run for updated SCC · b559535a

Wenlei He authored May 26, 2020

Summary:
When an SCC gets split due to inlining, we have two mechanisms for reprocessing the updated SCC: the first is UR.UpdatedC,
which repeatedly reruns the new, current SCC; the second is a worklist for all newly split SCCs. We can avoid rerunning
the same SCC when it is set to be processed by both mechanisms *back to back*. In pathological cases, such redundant
reruns could cause exponential size growth due to inlining along cycles, even when there's no SCC mutation and hence
convergence is not a problem.

Note that it's ok to have an SCC updated and rerun immediately, and also in the worklist, if we have actually moved an SCC
to be topologically "below" the current one due to merging. In that case, we will need to revisit the current SCC after
those moved SCCs. For that reason, the redundancy avoidance here only targets back-to-back reruns of the same SCC - the
case described by the now-removed FIXME comment.

Reviewers: chandlerc, wmi

Subscribers: llvm-commits, hoy

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80589

b559535a

Jun 14, 2020

[LAA] Do not set CanDoRT to false for AS that do not need RT checks. · 6176f044

Florian Hahn authored Jun 13, 2020

Alternative approach to D80570.

canCheckPtrAtRT already contains checks to figure out for which alias
sets runtime checks are needed. But it currently sets CanDoRT to false
for alias sets for which we cannot do RT checks but also do not need
any.

If we know that we do not need RT checks based on the number of
reads/writes in the alias set, we can skip processing the AS.

This patch also adds an assertion to ensure that DepCands does not
contain more than one write from the alias set.

Reviewers: Ayal, anemet, hfinkel, dmgreen

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D80622

6176f044
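
The read/write counting argument in the commit above can be modelled in a few lines: two accesses can only conflict if at least one of them is a write, so an alias set with no writes, or with a single write and no reads, has no pair worth checking. The sketch is a toy; `AliasSetSummary`, `conflictingPairs` and `needsRuntimeChecks` are hypothetical names, and the exact condition used by canCheckPtrAtRT is the one in the patch itself.

```cpp
#include <cassert>

// Hypothetical summary of one alias set: the number of distinct pointers
// that are read and written. Not LLVM's AliasSet class.
struct AliasSetSummary {
  unsigned NumReads;
  unsigned NumWrites;
};

// Number of pointer pairs that could conflict: write/write pairs plus
// write/read pairs. Read/read pairs never need a check.
unsigned conflictingPairs(const AliasSetSummary &AS) {
  unsigned WriteWrite =
      AS.NumWrites == 0 ? 0 : AS.NumWrites * (AS.NumWrites - 1) / 2;
  unsigned WriteRead = AS.NumWrites * AS.NumReads;
  return WriteWrite + WriteRead;
}

// Runtime checks are only worth generating when some pair could conflict.
bool needsRuntimeChecks(const AliasSetSummary &AS) {
  bool Needed = conflictingPairs(AS) > 0;
  // Cross-check against the rule of thumb: two writes, or a write and a read.
  assert(Needed == (AS.NumWrites >= 2 || (AS.NumWrites == 1 && AS.NumReads >= 1)));
  return Needed;
}
```

Under this model, an alias set with three reads and no writes, or with one write and nothing else, needs no runtime checks and can be skipped.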

[LoopUnroll] Allow loops with multiple exiting blocks where loop latch · 5225cd43

Whitney Tsang authored Jun 14, 2020

is not necessarily one of them.

Summary: Currently LoopUnrollPass already allows loops with multiple
exiting blocks, but only when the loop latch is one of the
exiting blocks.
When the loop latch is not an exiting block, only a single exiting
block is supported.
When possible, the single loop latch or the single exiting block
terminator is optimized to an unconditional branch in the unrolled loop.

This patch allows loops with multiple exiting blocks even if the loop
latch is not one of them. However, the optimization of the exiting block
terminator to an unconditional branch is not done when there exists more
than one exiting block.

Reviewer: dmgreen, Meinersbur, etiotto, fhahn, efriedma, bmahjour

Reviewed By: efriedma

Subscribers: hiraditya, zzheng, llvm-commits

Tag: LLVM

Differential Revision: https://reviews.llvm.org/D81053

5225cd43
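
At the source level, a loop of the shape discussed above might look like the following sketch: two conditions leave the loop, and the increment that jumps back to the top does not. `findFirst` is a hypothetical example, and the CFG that actually reaches the unroller depends on the frontend and on earlier passes such as loop rotation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the index of the first element equal to Key, or Data.size() if
// there is none.
std::size_t findFirst(const std::vector<int> &Data, int Key) {
  std::size_t I = 0;
  for (;;) {
    if (I == Data.size())
      break; // exiting block 1: ran off the end
    if (Data[I] == Key)
      break; // exiting block 2: found the key
    ++I;     // latch: unconditionally branches back to the header
  }
  assert(I <= Data.size());
  return I;
}
```

Here the latch is not an exiting block and there are two exiting blocks, which is the combination the commit describes as newly allowed.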

[InstCombine] reassociate FP diff of sums into sum of diffs · b5fb2695

Sanjay Patel authored Jun 14, 2020

(a[0] + a[1] + a[2] + a[3]) - (b[0] + b[1] + b[2] + b[3]) -->
(a[0] - b[0]) + (a[1] - b[1]) + (a[2] - b[2]) + (a[3] - b[3])

This should be the last step in solving PR43953:
https://bugs.llvm.org/show_bug.cgi?id=43953

We started emitting reduction intrinsics with:
D80867/ rGe50059f6b6b3
So it's a relatively easy pattern match now to re-order those ops.
Also, I have not seen any complaints for the switch to intrinsics
yet, so I'll propose to remove the "experimental" tag from the
intrinsics soon.

Differential Revision: https://reviews.llvm.org/D81491

b5fb2695
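
The two forms in the commit above can be written out as ordinary functions. They agree exactly when every intermediate result is representable, as in the small example in the note below, but floating-point addition is not associative in general, so a reordering like this is presumably only applied when reassociation is permitted by fast-math flags. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original form: (a[0] + a[1] + ...) - (b[0] + b[1] + ...).
float diffOfSums(const std::vector<float> &A, const std::vector<float> &B) {
  assert(A.size() == B.size());
  float SumA = 0.0f, SumB = 0.0f;
  for (float V : A)
    SumA += V;
  for (float V : B)
    SumB += V;
  return SumA - SumB;
}

// Reassociated form: (a[0] - b[0]) + (a[1] - b[1]) + ...
float sumOfDiffs(const std::vector<float> &A, const std::vector<float> &B) {
  assert(A.size() == B.size());
  float Sum = 0.0f;
  for (std::size_t I = 0; I < A.size(); ++I)
    Sum += A[I] - B[I];
  return Sum;
}
```

With `A = {1, 2, 3, 4}` and `B = {0.5, 1.5, 2.5, 3.5}` both functions return exactly `2.0f`; with inputs of very different magnitudes the two orders can round differently.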

[InstCombine] allow undef elements when comparing vector constants for min/max bailout · aeb50448

Sanjay Patel authored Jun 14, 2020

This is a hacky, but low-risk fix to avoid the infinite loop in PR46271:
https://bugs.llvm.org/show_bug.cgi?id=46271

As discussed there, the problem is that FoldOpIntoSelect() can get into a conflict
with a transform that wants to pull a 'not' op through min/max via
SimplifyDemandedVectorElts(). We need to relax our matching of min/max to include
undefined elements in vector constants to avoid that. Alternatively, we could
improve or cripple the demanded elements analysis, but that could create even
more problems.

The likely better, safer alternative will be to create min/max intrinsics, so
we can remove all of the hacks related to min/max matching in instcombine.

Differential Revision: https://reviews.llvm.org/D81698

aeb50448