- Mar 15, 2021
-
Hongtao Yu authored
Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D98439
-
- Mar 14, 2021
-
Chenguang Wang authored
The current ArgPromotion implementation does not copy it: https://godbolt.org/z/zzTKof Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D93927
-
Simonas Kazlauskas authored
This is an alternative to D98120. Herein, instead of deleting the transformation entirely, we check that the underlying objects are both the same and therefore this transformation wouldn't incur a provenance change, if applied. https://alive2.llvm.org/ce/z/SYF_yv Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D98588
-
Luo, Yuanke authored
The load/store instructions will be transformed to AMX intrinsics in the AMX type lowering pass. Prohibiting the pointer cast makes that pass happy. Differential Revision: https://reviews.llvm.org/D98247
-
- Mar 13, 2021
-
Nikita Popov authored
This fixes a regression from the MemDep-based implementation: MemDep completely ignores lifetime.start intrinsics that aren't MustAlias -- this is probably unsound, but it does mean that the MemDep-based implementation successfully eliminated memcpy's from lifetime.start even if the memcpy happens at an offset, rather than the base address of the alloca. Add a special case for when the lifetime.start spans the whole alloca (which is pretty much the only kind of lifetime.start that frontends ever emit), as we don't need to figure out the exact aliasing relationship in that case: the whole alloca is dead prior to the call. If this doesn't cover all practically relevant cases, then it would be possible to make use of the recently added PartialAlias clobber offsets to make this more precise.
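A minimal LLVM IR sketch of the kind of pattern involved (my own example, not the patch's test case), assuming the usual semantics that an alloca's contents are undefined right after lifetime.start:

```llvm
declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture)
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg)

define void @copy_from_dead(i8* %dst) {
  %a = alloca [32 x i8]
  %base = getelementptr inbounds [32 x i8], [32 x i8]* %a, i64 0, i64 0
  ; lifetime.start spans the whole 32-byte alloca
  call void @llvm.lifetime.start.p0i8(i64 32, i8* %base)
  ; the memcpy reads at an offset into the alloca, whose contents are still
  ; undefined, so the copy can be eliminated
  %off = getelementptr inbounds [32 x i8], [32 x i8]* %a, i64 0, i64 8
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %off, i64 8, i1 false)
  ret void
}
```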
-
Sanjay Patel authored
The structure of this fold is suspect vs. most of instcombine because it creates instructions and tries to delete them immediately after. If we don't have the operand types for the icmps, then we are not behaving as assumed. And as shown in PR49475, we can inf-loop.
-
Roman Lebedev authored
The added test case crashes before this fix:
```
opt: /repositories/llvm-project/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp:5172: BasicBlock::iterator (anonymous namespace)::LSRInstance::AdjustInsertPositionForExpand(BasicBlock::iterator, const (anonymous namespace)::LSRFixup &, const (anonymous namespace)::LSRUse &, llvm::SCEVExpander &) const: Assertion `!isa<PHINode>(LowestIP) && !LowestIP->isEHPad() && !isa<DbgInfoIntrinsic>(LowestIP) && "Insertion point must be a normal instruction"' failed.
```
This is fully analogous to the previous commit, with the pointer constant replaced by something non-null. The comparison here can be strength-reduced, but the second operand of the comparison happens to be identical to the constant pointer in the `catch` case of the `landingpad`. While LSRInstance::CollectLoopInvariantFixupsAndFormulae() already gave up on uses in blocks ending up with EH pads, it didn't consider this case. Eventually, `LSRInstance::AdjustInsertPositionForExpand()` will be called, but the original insertion point it will get is the user instruction itself, and it doesn't want to deal with EH pads, and asserts as much. It would seem that this basically never happens in the wild, otherwise it would have been reported already, so it seems safe to take the cautious approach and just not deal with such users.
-
Nikita Popov authored
Rather than checking for simple equality, check for MustAlias, as we do in other transforms. This catches equivalent GEPs.
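An illustrative sketch (my own example, not from the patch) of two GEPs that are distinct Values but must-alias, which a plain equality check would miss:

```llvm
define void @equivalent_geps(i8* %p) {
  ; both GEPs compute %p + 8, but they are different Value*s, so a simple
  ; equality check does not treat them as the same location; AA reports MustAlias
  %g1 = getelementptr inbounds i8, i8* %p, i64 8
  %q = bitcast i8* %p to i64*
  %g2 = getelementptr inbounds i64, i64* %q, i64 1
  store i8 0, i8* %g1
  %g2.i8 = bitcast i64* %g2 to i8*
  store i8 1, i8* %g2.i8
  ret void
}
```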
-
Nikita Popov authored
If a memset destination is overwritten by a memcpy and the sizes are exactly the same, then the memset is simply dead. We can directly drop it, instead of replacing it with a memset of zero size, which is particularly ugly for the case of a dynamic size.
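A minimal sketch of the pattern described, with a dynamic size %n (hypothetical example, not the patch's test):

```llvm
declare void @llvm.memset.p0i8.i64(i8* nocapture writeonly, i8, i64, i1 immarg)
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg)

define void @f(i8* %dst, i8* %src, i64 %n) {
  ; the memset writes exactly the bytes the following memcpy overwrites,
  ; so it is dead and can simply be dropped
  call void @llvm.memset.p0i8.i64(i8* %dst, i8 0, i64 %n, i1 false)
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %src, i64 %n, i1 false)
  ret void
}
```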
-
- Mar 12, 2021
-
Wei Mi authored
".llvm." suffix". The recommit fixed a bug that symbols with "." at the beginning is not properly handled in the last commit. Original commit message: Currently IndirectCallPromotion simply strip everything after the first "." in LTO mode, in order to match the symbol name and the name with ".llvm." suffix in the value profile. However, if -funique-internal-linkage-names and thinlto are both enabled, the name may have both ".__uniq." suffix and ".llvm." suffix, and the current mechanism will strip them both, which is unexpected. The patch fixes the problem. Differential Revision: https://reviews.llvm.org/D98389
-
Nikita Popov authored
This removes some (but not all) uses of type-less CreateGEP() and CreateInBoundsGEP() APIs, which are incompatible with opaque pointers. There are still a number of tricky uses left, as well as many more variation APIs for CreateGEP.
-
Florian Hahn authored
This patch fixes a crash when trying to get a scalar value using VPTransformState::get() for uniform induction values or truncated induction values. IVs and truncated IVs can be uniform and the updated code accounts for that, fixing the crash. This should fix https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=31981
-
Sanjay Patel authored
The test is reduced from a C source example in: https://llvm.org/PR49541 It's possible that the test could be reduced further or the predicate generalized further, but it seems to require a few ingredients (including the "late" SimplifyCFG options on the RUN line) to fall into the infinite-loop trap.
-
Hans Wennborg authored
This broke the check-profile tests on Mac, see comment on the code review.
> This is no longer needed, we can add __llvm_profile_runtime directly
> to llvm.compiler.used or llvm.used to achieve the same effect.
>
> Differential Revision: https://reviews.llvm.org/D98325
This reverts commit c7712087.
Also reverting the dependent follow-up commit:
Revert "[InstrProfiling] Generate runtime hook for ELF platforms"
> When using -fprofile-list to selectively apply instrumentation only
> to certain files or functions, we may end up with a binary that doesn't
> have any counters in the case where no files were selected. However,
> because on Linux and Fuchsia, we pass -u__llvm_profile_runtime, the
> runtime would still be pulled in and incur some non-trivial overhead,
> especially in the case when the continuous or runtime counter relocation
> mode is being used. A better way would be to pull in the profile runtime
> only when needed by declaring the __llvm_profile_runtime symbol in the
> translation unit only when needed.
>
> This approach was already used prior to 9a041a75, but we changed it
> to always generate the __llvm_profile_runtime due to a TAPI limitation.
> Since TAPI is only used on Mach-O platforms, we could use the early
> emission of __llvm_profile_runtime there, and on other platforms we
> could change back to the earlier approach where the symbol is generated
> later only when needed. We can stop passing -u__llvm_profile_runtime to
> the linker on Linux and Fuchsia since the generated undefined symbol in
> each translation unit that needed it serves the same purpose.
>
> Differential Revision: https://reviews.llvm.org/D98061
This reverts commit 87fd09b2.
-
Serguei Katkov authored
As readnone functions they become movable and LICM can hoist them out of a loop. As a result, in LCSSA form a phi node of type token is created. Nothing is prepared for the first operand of a GCRelocate being a phi node; it is expected to be a token. The GVN test was also updated; it seems it does not do what is expected. A test for LICM is also added. This reverts commit f352463a.
-
Johannes Doerfert authored
Since D86233 we have `mustprogress` which, in combination with `readonly`, implies `willreturn`. The idea is that every side-effect has to be modeled as a "write". Consequently, `readonly` means there is no side-effect, and `mustprogress` guarantees that we cannot "loop" forever without side-effect. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D94125
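A hedged sketch of the idea (function name and shape are my own): a function that is both readonly and mustprogress can be inferred willreturn, since it has no side effects and cannot loop forever without producing any.

```llvm
; readonly: no side effects; mustprogress: no infinite side-effect-free loops.
; Together these allow deriving willreturn for @sum.
define i32 @sum(i32* %p, i32 %n) mustprogress readonly {
entry:
  br label %loop

loop:
  %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
  %acc = phi i32 [ 0, %entry ], [ %acc.next, %loop ]
  %gep = getelementptr inbounds i32, i32* %p, i32 %i
  %v = load i32, i32* %gep
  %acc.next = add i32 %acc, %v
  %i.next = add i32 %i, 1
  %done = icmp sge i32 %i.next, %n
  br i1 %done, label %exit, label %loop

exit:
  ret i32 %acc.next
}
```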
-
- Mar 11, 2021
-
Nikita Popov authored
Splitting this out as the change is non-trivial: The way this code handled pointer types doesn't really make sense, as GEPs can only apply an offset to the outermost pointer, but can't drill down into interior pointer types (which would require dereferencing memory). Instead give special treatment to the first (pointer) index. I've hardcoded it to zero as that's the only way the function is used right now, but handling non-zero indexes would be straightforward. The original goal here was to have an element type for CreateGEP.
-
Petr Hosek authored
When using -fprofile-list to selectively apply instrumentation only to certain files or functions, we may end up with a binary that doesn't have any counters in the case where no files were selected. However, because on Linux and Fuchsia, we pass -u__llvm_profile_runtime, the runtime would still be pulled in and incur some non-trivial overhead, especially in the case when the continuous or runtime counter relocation mode is being used. A better way would be to pull in the profile runtime only when needed by declaring the __llvm_profile_runtime symbol in the translation unit only when needed. This approach was already used prior to 9a041a75, but we changed it to always generate the __llvm_profile_runtime due to a TAPI limitation. Since TAPI is only used on Mach-O platforms, we could use the early emission of __llvm_profile_runtime there, and on other platforms we could change back to the earlier approach where the symbol is generated later only when needed. We can stop passing -u__llvm_profile_runtime to the linker on Linux and Fuchsia since the generated undefined symbol in each translation unit that needed it serves the same purpose. Differential Revision: https://reviews.llvm.org/D98061
-
Valery N Dmitriev authored
The associative reduction matcher in SLP begins with a select instruction, but when it reached a call to llvm.umax (or similar) via the def-use chain, the latter was also matched as the UMax kind. The routine's later code assumes the matched instruction to be a select and thus simply died on the first encountered cast that did not fit. Differential Revision: https://reviews.llvm.org/D98432
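A small sketch (my own, not the reproducer) of the two max forms involved: the matcher starts from the select-based pattern, and following the def-use chain it can reach the intrinsic form, which was also classified as a UMax reduction even though later code assumes a select.

```llvm
declare i32 @llvm.umax.i32(i32, i32)

define i32 @max3(i32 %a, i32 %b, i32 %c) {
  ; select-based unsigned max: the form the reduction matcher starts from
  %cmp = icmp ugt i32 %a, %b
  %max.ab = select i1 %cmp, i32 %a, i32 %b
  ; intrinsic form reached via the def-use chain; it was also matched as a
  ; UMax reduction kind, but the routine's later code expects a select
  %max.abc = call i32 @llvm.umax.i32(i32 %max.ab, i32 %c)
  ret i32 %max.abc
}
```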
-
Wenlei He authored
For CGSCC inline, we need to scale down a function's branch weights and entry counts when it's inlined at a callsite. This is done through updateCallProfile. Additionally, we also scale the weights for the inlined clone based on the call site count in updateCallerBFI. Neither is needed for inlining during sample profile loader as it's using a context profile that is separate from the inlinee's own profile. This change skips the inlinee profile scaling for sample loader inlining. Differential Revision: https://reviews.llvm.org/D98187
-
Hiroshi Yamauchi authored
1. PGOMemOPSizeOpt grabs only the first, up to five (by default) entries from the value profile metadata and preserves the remaining entries for the fallback memop call site. If there are more than five entries, the rest of the entries would get dropped. This is fine for PGOMemOPSizeOpt itself as it only promotes up to 3 (by default) values, but potentially not for other downstream passes that may use the value profile metadata. 2. PGOMemOPSizeOpt originally assumed that only values 0 through 8 are kept track of. When the range buckets were introduced, it was changed to skip the range buckets, but since it does not grab all entries (only five), if some range buckets exist in the first five entries, it could potentially cause fewer promotion opportunities (e.g. if 4 out of 5 were range buckets, it may be able to promote up to one non-range bucket, as opposed to 3.) Also, combined with 1, it means that wrong entries may be preserved, as it didn't correctly keep track of which entries were skipped. To fix this, PGOMemOPSizeOpt now grabs all the entries (up to the maximum number of value profile buckets), keeps track of which entries were skipped, and preserves all the remaining entries. Differential Revision: https://reviews.llvm.org/D97592
-
Stephen Tozer authored
Revert "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands" This reverts commit c0f3dfb9. Reverted due to an error on the clang-x64-windows-msvc buildbot.
-
Nikita Popov authored
Explicitly pass loaded type when creating loads, in preparation for the deprecation of these APIs. There are still a couple of uses left.
-
gbtozers authored
This patch improves salvageDebugInfoImpl by allowing it to salvage arithmetic operations with two or more non-const operands; this includes the GetElementPtr instruction, and most Binary Operator instructions. These salvages produce DIArgList locations and are only valid for dbg.values, as currently variadic DIExpressions must use DW_OP_stack_value. This functionality is also only added for salvageDebugInfoForDbgValues; other functions that directly call salvageDebugInfoImpl (such as in ISel or Coroutine frame building) can be updated in a later patch. Differential Revision: https://reviews.llvm.org/D91722
-
Nikita Popov authored
Relative to the previous implementation, this always uses aliasesUnknownInst() instead of aliasesPointer() to correctly handle atomics. The added test case was previously miscompiled. ----- Even when MemorySSA-based LICM is used, an AST is still populated for scalar promotion. As the AST has quadratic complexity, a lot of time is spent in this step despite the existing access count limit. This patch optimizes the identification of promotable stores. The idea here is pretty simple: We're only interested in must-alias mod sets of loop-invariant pointers. As such, only populate the AST with loop-invariant loads and stores (anything else is definitely not promotable) and then discard any sets which alias with any of the remaining, definitely non-promotable accesses. If we promoted something, check whether this has made some other accesses loop invariant and thus possible promotion candidates. This is much faster in practice, because we need to perform AA queries for O(NumPromotable^2 + NumPromotable*NumNonPromotable) instead of O(NumTotal^2), and NumPromotable tends to be small. Additionally, promotable accesses have loop-invariant pointers, for which AA is cheaper. This has a significant positive compile-time impact. We save ~1.8% geomean on CTMark at O3, with 6% on lencod in particular and 25% on individual files. Conceptually, this change is NFC, but may not be so in practice, because the AST is only an approximation, and can produce different results depending on the order in which accesses are added. However, there is at least no impact on the number of promotions (licm.NumPromoted) in the test-suite O3 configuration with this change. Differential Revision: https://reviews.llvm.org/D89264
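For context, a minimal sketch of a promotable store (hypothetical, not from the patch): the pointer is loop invariant and its must-alias set is only accessed inside the loop, so LICM can keep the value in a register and emit a single store after the loop.

```llvm
define void @accumulate(i32* noalias %sum, i32* %a, i32 %n) {
entry:
  br label %loop

loop:
  %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
  %gep = getelementptr inbounds i32, i32* %a, i32 %i
  %v = load i32, i32* %gep
  ; %sum is loop invariant; this load/store pair can be promoted to a phi,
  ; with one store of the final value in the exit block
  %cur = load i32, i32* %sum
  %new = add i32 %cur, %v
  store i32 %new, i32* %sum
  %i.next = add i32 %i, 1
  %done = icmp sge i32 %i.next, %n
  br i1 %done, label %exit, label %loop

exit:
  ret void
}
```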
-
Djordje Todorovic authored
By using the original-di check with debugify in combination with llvm/utils/llvm-original-di-preservation.py, it becomes a very user-friendly tool. An example of the HTML page with the issues related to debug info can be found at [0]. [0] https://djolertrk.github.io/di-checker-html-report-example/ Differential Revision: https://reviews.llvm.org/D82546
-
Petr Hosek authored
This is no longer needed, we can add __llvm_profile_runtime directly to llvm.compiler.used or llvm.used to achieve the same effect. Differential Revision: https://reviews.llvm.org/D98325
-
Ruiling Song authored
This is useful for debugging which pointers are updated during the remapping process. Differential Revision: https://reviews.llvm.org/D95775
-
- Mar 10, 2021
-
kuterd authored
This patch makes use of the context bridges introduced in D83299 to make AAValueConstantRange call-site specific. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D83744
-
Mauri Mustonen authored
Add support for widening select instructions in the VPlan native path by using a correct recipe when such instructions are encountered. This is already used by the inner loop vectorizer. Previously, select instructions were handled by the wrong recipe, resulting in unreachable instruction errors like this one: https://bugs.llvm.org/show_bug.cgi?id=48139. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D97136
-
Matteo Favaro authored
The isOverwrite function is making sure to identify if two stores are fully overlapping, and ideally we would like to identify all the instances of OW_Complete as they'll yield possibly killable stores. The current implementation is incapable of spotting instances where the earlier store is offset compared to the later store, but still fully overlapped. The limitation seems to lie in the computation of the base pointers with the GetPointerBaseWithConstantOffset API that often yields different base pointers even if the stores are guaranteed to partially overlap (e.g. the alias analysis is returning AliasResult::PartialAlias). The patch relies on the offsets computed and cached by BatchAAResults (available after D93529) to determine if the offset overlap is OW_Complete. Differential Revision: https://reviews.llvm.org/D97676
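A minimal sketch of the situation (my own example): the earlier store is at an offset from the later store's base pointer, yet is completely overwritten, which the offset-based check can now recognize as OW_Complete.

```llvm
define void @f([2 x i32]* %p) {
  ; earlier store writes bytes 4..7 of %p
  %hi = getelementptr inbounds [2 x i32], [2 x i32]* %p, i64 0, i64 1
  store i32 1, i32* %hi
  ; later store writes bytes 0..7 of %p, fully covering the earlier store,
  ; even though the two base pointers differ (PartialAlias with known offsets)
  %whole = bitcast [2 x i32]* %p to i64*
  store i64 0, i64* %whole
  ret void
}
```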
-
Sriraman Tallam authored
D96109 was recently submitted which contains the refactored implementation of -funique-internal-linkage-names by adding the unique suffixes in clang rather than as an LLVM pass. Deleting the former implementation in this change. Differential Revision: https://reviews.llvm.org/D98234
-
gbtozers authored
This patch refactors out the salvaging of GEP and BinOp instructions into separate functions, in preparation for further changes to the salvaging of these instructions coming in another patch; there should be no functional change as a result of this refactor. Differential Revision: https://reviews.llvm.org/D92851
-
Daniil Seredkin authored
[InstCombine][SimplifyLibCalls] An extra sqrtf was produced because of transformations in the optimizePow function. See: https://bugs.llvm.org/show_bug.cgi?id=47613 There was an extra sqrt call because shrinking emitted a new powf and at the same time optimizePow replaces the previous pow with sqrt, so we end up with two instructions in the InstCombine worklist, despite the fact that %powf is not used by anyone (it is alive because of errno). As a result we have two instructions: %powf = call fast float @powf(float %x, float 5.000000e-01) %sqrt = call fast double @sqrt(double %dx) %powf will be converted to %sqrtf on a later iteration. As a quick fix for that, I moved shrinking to the end of optimizePow so that pow is replaced with sqrt first, which avoids emitting a new shrunk powf. Differential Revision: https://reviews.llvm.org/D98235
-
Jianzhou Zhao authored
This is a part of https://reviews.llvm.org/D95835. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D98268
-
Dávid Bolvanský authored
Follow up for fhahn's D98284. Also fixes a case from PR47644. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D98346
-
Florian Hahn authored
Currently DSE misses cases where the size is a non-const IR value, even if the sizes match. For example, this means that llvm.memcpy/llvm.memset calls are not eliminated, even if they write the same number of bytes. This patch extends isOverwrite to try to get IR values for the number of bytes written from the analyzed instructions. If the values match, alias checks are performed and the result is returned. At the moment this only covers llvm.memcpy/llvm.memset. In the future, we may enable MemoryLocation to also track variable sizes, but this simple approach should allow us to cover the important cases in DSE. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D98284
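A hedged sketch of the kind of case now handled (my example, not the patch's test): two memcpys writing the same non-constant number of bytes to the same destination, where the earlier one is dead.

```llvm
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg)

define void @f(i8* noalias %dst, i8* noalias %a, i8* noalias %b, i64 %n) {
  ; both calls write exactly %n bytes starting at %dst; the first memcpy is
  ; dead even though %n is not a compile-time constant
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %a, i64 %n, i1 false)
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %b, i64 %n, i1 false)
  ret void
}
```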
-
Wei Mi authored
Now that the -funique-internal-linkage-name flag is available, we want to flip it on by default since it is beneficial to have separate sample profiles for different internal symbols with the same name. As a preparation, we want to avoid regressions caused by the flip. When we flip -funique-internal-linkage-name on, the profile is collected from a binary built without -funique-internal-linkage-name so it has no uniq suffix, but the IR in the optimized build contains the suffix. This kind of mismatch may introduce transient regression. To avoid such mismatch, we introduce a NameTable section flag indicating whether there is any name in the profile containing a uniq suffix. The compiler will decide whether to keep the uniq suffix during name canonicalization depending on the NameTable section flag. The flag is only available for the extbinary format. For other formats, by default the compiler will keep the uniq suffix, so they will only experience transient regression when -funique-internal-linkage-name is just flipped. Another type of regression is caused by places where we missed calling getCanonicalFnName. Those places are fixed. Differential Revision: https://reviews.llvm.org/D96932
-
Philip Reames authored
-