  1. Aug 10, 2021
    • [MemCpyOpt] Optimize MemoryDef insertion · 17db125b
      Nikita Popov authored
      When converting a store into a memset, we currently insert the new
      MemoryDef after the store MemoryDef, which requires all uses to be
      renamed to the new def using a whole block scan. Instead, we can
      insert the new MemoryDef before the store and not rename uses,
      because we know that the location is immediately overwritten, so
      all uses should still refer to the old MemoryDef. Those uses will
      get renamed when the old MemoryDef is actually dropped, which is
      efficient.
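
      As a hand-written sketch (not code from the patch; the function name and
      values are illustrative, and profitability is ignored), the situation
      looks roughly like this, with MemorySSA numbering written out in
      comments:

        define i8 @f(i8* %p) {
          ; 1 = MemoryDef(liveOnEntry)   ; def created for the store
          store i8 0, i8* %p
          ; MemoryUse(1)                 ; this use names def 1
          %v = load i8, i8* %p
          ret i8 %v
        }

      Previously the memset's new MemoryDef was inserted after def 1, which
      required renaming uses like the one above to the new def with a whole
      block scan. Inserting the new def before def 1 instead leaves the use
      naming def 1; it is renamed only when def 1 is dropped together with the
      dead store.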
      
      I expect something similar can be done for some of the other MSSA
      updates in MemCpyOpt. This is an alternative to D107513, at least
      for this particular case.
      
      Differential Revision: https://reviews.llvm.org/D107702
    • [InstCombine] avoid infinite loops from min/max canonicalization · b267d3ce
      Sanjay Patel authored
      The intrinsics have an extra chunk of known bits logic
      compared to the normal cmp+select idiom. That allows
      folding the icmp in each case to something better, but
      that then opposes the canonical form of min/max that
      we try to form for a select.
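
      For reference, here is the canonicalization in question on a
      hand-written example (the type and value names are arbitrary): the
      cmp+select idiom computes the same value as the intrinsic, and
      InstCombine turns the first form into the second:

        define i32 @smin_select(i32 %x, i32 %y) {
          %c = icmp slt i32 %x, %y
          %m = select i1 %c, i32 %x, i32 %y   ; canonicalized to @llvm.smin
          ret i32 %m
        }

        define i32 @smin_intrinsic(i32 %x, i32 %y) {
          %m = call i32 @llvm.smin.i32(i32 %x, i32 %y)
          ret i32 %m
        }

        declare i32 @llvm.smin.i32(i32, i32)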
      
      I'm carving out a narrow exception to preserve all
      existing regression tests while avoiding the inf-loop.
      It seems unlikely that this is the only bug like this
      left, but this should fix:
      https://llvm.org/PR51419
    • [SimplifyCFG] Remove recursion from FoldCondBranchOnPHI. NFCI. · a1783b54
      Carl Ritson authored
      Avoid stack overflow errors on systems with small stack sizes
      by removing recursion in FoldCondBranchOnPHI.
      
      This is a simple change, as the recursive call simply re-invoked the
      function on the same arguments.
      Ideally this would be compiled to a tail call, but there is
      no guarantee.
      
      Reviewed By: lebedev.ri
      
      Differential Revision: https://reviews.llvm.org/D107803
    • [InstCombine] Add more complex folds for extractelement + stepvector · ce394161
      David Sherwood authored
      I have updated cheapToScalarize to also consider the case when
      extracting lanes from a stepvector intrinsic. This required removing
      the existing 'bool IsConstantExtractIndex' and passing in the actual
      index as a Value instead. We do this because we need to know if the
      index is <= known minimum number of elements returned by the stepvector
      intrinsic. Effectively, when extracting lane X from a stepvector we
      know the value returned is also X.
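
      A small hand-written illustration of that last point (the element count
      and lane number are chosen arbitrarily): lane 2 is below the known
      minimum of 4 elements, so the extract can be folded to the constant 2.

        define i32 @extract_lane_2() {
          %step = call <vscale x 4 x i32> @llvm.experimental.stepvector.nxv4i32()
          %lane = extractelement <vscale x 4 x i32> %step, i64 2
          ret i32 %lane   ; simplifies to: ret i32 2
        }

        declare <vscale x 4 x i32> @llvm.experimental.stepvector.nxv4i32()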
      
      New tests added here:
      
        Transforms/InstCombine/vscale_extractelement.ll
      
      Differential Revision: https://reviews.llvm.org/D106358
  2. Aug 06, 2021
    • [InstCombine] reduce vector casting before icmp · 0369714b
      Sanjay Patel authored
      There may be some generalizations (see test comments) of these patterns,
      but this should handle the cases motivated by:
      https://llvm.org/PR51315
      https://llvm.org/PR51259
      
      The backend may want to transform differently, but at least for
      the x86 examples that I looked at, there does not appear to be
      any significant perf diff either way.
    • [CUDA, MemCpyOpt] Add a flag to force-enable memcpyopt and use it for CUDA. · 6a9cf21f
      Artem Belevich authored
      An attempt to enable MemCpyOpt unconditionally in D104801 uncovered the
      fact that there are users that do not expect LLVM to materialize the
      `memset` intrinsic.
      
      While other passes can do that, too, MemCpyOpt triggers it more frequently and
      breaks sanitizers and some downstream users.
      
      For now, introduce a flag to force-enable the pass, and opt in only CUDA
      compilation with the NVPTX back-end.
      
      Differential Revision: https://reviews.llvm.org/D106401
    • [MemCpyOpt] Teach memcpyopt to handle loads from constant memory. · d1cacd59
      Michael Liao authored
      - Loads from constant memory (either explicit loads or the sources of
        memory transfer intrinsics) won't alias any stores; see the sketch
        below.
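
      As an illustrative, hand-written example (the names are made up): the
      intervening store cannot clobber what the load read, because the load
      reads from a constant global.

        @lut = constant [4 x i8] zeroinitializer

        define void @copy_first_byte(i8* %dst, i8* %flag) {
          %src = getelementptr inbounds [4 x i8], [4 x i8]* @lut, i64 0, i64 0
          %v = load i8, i8* %src
          store i8 1, i8* %flag    ; cannot clobber @lut: it is constant memory
          store i8 %v, i8* %dst
          ret void
        }

      MemCpyOpt's clobber checks can therefore look past such stores whenever
      the source of a load or memory transfer intrinsic is constant memory.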
      
      Reviewed By: asbirlea, efriedma
      
      Differential Revision: https://reviews.llvm.org/D107605
    • [LoopVectorize] Improve vectorisation of some intrinsics by treating them as uniform · 3fd96e1b
      David Sherwood authored
      This patch adds more instructions to the Uniforms list, for example certain
      intrinsics that are uniform by definition or whose operands are loop invariant.
      This list includes:
      
        1. The intrinsics 'experimental.noalias.scope.decl' and 'sideeffect', which
        are always uniform by definition.
        2. The intrinsics 'lifetime.start', 'lifetime.end' and 'assume', when
        their input operands are loop invariant, are also uniform.
      
      Also, in VPRecipeBuilder::handleReplication we check if an instruction is
      uniform based purely on whether or not the instruction lives in the Uniforms
      list. However, there are certain cases where calls to some intrinsics can
      be effectively treated as uniform too. Therefore, we now also treat the
      following cases as uniform for scalable vectors:
      
        1. If the 'assume' intrinsic's operand is not loop invariant, then we
        are free to treat it as uniform anyway, since it is only a performance
        hint; we will still get the benefit for the first lane (see the sketch
        after this list).
        2. When the input pointers for 'lifetime.start' and 'lifetime.end' are loop
        variant then for scalable vectors we assume these still ultimately come
        from the broadcast of an alloca. We do not support scalable vectorisation
        of loops containing alloca instructions, hence the alloca itself would
        be invariant. If the pointer does not come from an alloca then the
        intrinsic itself has no effect.
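
      As a hand-written sketch of the first case (the loop and names are made
      up): the assume's condition depends on a value loaded inside the loop,
      so it is not loop invariant, yet when the loop is vectorized (the case
      above targets scalable vectors) the call can still be treated as
      uniform, keeping the assumption only for the first lane.

        declare void @llvm.assume(i1)

        define void @scale(float* %p, i64 %n) {
        entry:
          br label %loop

        loop:
          %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
          %addr = getelementptr inbounds float, float* %p, i64 %i
          %v = load float, float* %addr
          %pos = fcmp ogt float %v, 0.000000e+00
          call void @llvm.assume(i1 %pos)   ; operand is not loop invariant
          %s = fmul float %v, 2.000000e+00
          store float %s, float* %addr
          %i.next = add nuw nsw i64 %i, 1
          %done = icmp eq i64 %i.next, %n
          br i1 %done, label %exit, label %loop

        exit:
          ret void
        }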
      
      I have updated the assume test for fixed width, since we now treat it
      as uniform:
      
        Transforms/LoopVectorize/assume.ll
      
      I've also added new scalable vectorisation tests for other intrinsics:
      
        Transforms/LoopVectorize/scalable-assume.ll
        Transforms/LoopVectorize/scalable-lifetime.ll
        Transforms/LoopVectorize/scalable-noalias-scope-decl.ll
      
      Differential Revision: https://reviews.llvm.org/D107284
    • [FuncSpec] Return changed if function is changed by tryToReplaceWithConstant · 0fd03feb
      Chuanqi Xu authored
      The function may get changed by RunSCCPSolver before specialization. In
      other words, the pass may change the function even when no specialization
      happens. Add a test and a comment to reveal this.
      The pass may also report no change even if the function was changed by
      RunSCCPSolver before specialization, which looks like a potential bug.
      
      Test Plan: check-all
      
      Reviewed By: https://reviews.llvm.org/D107622
      
      Differential Revision: https://reviews.llvm.org/D107622