Commits · 671a87104b8132e45718a9584cbb7ed97916683f · Lorenzo Albano / LLVM bpEVL

Jun 19, 2021

[llvm][Inliner] Add an optional PriorityInlineOrder · 671a8710

Liqiang Tao authored Jun 19, 2021

This patch adds an optional PriorityInlineOrder, which uses a heap to order inlining.
Call sites with a smaller size have higher priority.
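
A minimal sketch of the ordering idea, not the LLVM implementation: call sites sit in a binary heap keyed by size, so the smallest one is visited first. The `CallSite` tuple, its `size` field and the class name `PriorityInlineOrderSketch` are hypothetical stand-ins for illustration.

```python
import heapq
from typing import List, NamedTuple


class CallSite(NamedTuple):
    """Hypothetical stand-in for a call site; `size` is the priority key."""
    size: int
    name: str


class PriorityInlineOrderSketch:
    """Toy work list that always yields the smallest remaining call site."""

    def __init__(self) -> None:
        self._heap: List[CallSite] = []

    def push(self, call_site: CallSite) -> None:
        # heapq keeps the smallest tuple (lowest size) at the front.
        heapq.heappush(self._heap, call_site)

    def pop(self) -> CallSite:
        return heapq.heappop(self._heap)

    def __len__(self) -> int:
        return len(self._heap)


def drain(order: PriorityInlineOrderSketch) -> List[str]:
    """Pop every call site and return the names in visiting order."""
    names = []
    while len(order):
        names.append(order.pop().name)
    return names
```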

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D104028

671a8710

[InstCombine] Don't transform code if DoTransform is false · 575ba6f4

Guozhi Wei authored Jun 18, 2021

Patch https://reviews.llvm.org/D72396 doesn't check DoTransform before transforming the code, and so generates a wrong result for the attached test case.

Differential Revision: https://reviews.llvm.org/D104567

575ba6f4

[InstrProfiling][ELF] Make __profd_ private if the function does not use value profiling · 3307240f

Fangrui Song authored Jun 18, 2021

On ELF, the D1003372 optimization can apply to more cases. There are two
prerequisites for making `__profd_` private:

* `__profc_` keeps `__profd_` live under compiler/linker GC
* `__profd_` is not referenced by code

The first is satisfied because all counters/data are in a section group (either
`comdat any` or `comdat noduplicates`). The second requires that the function
does not use value profiling.

Regarding the second point: `__profd_` may be referenced by other text sections
due to inlining. There will be a linker error if a prevailing text section
references the non-prevailing local symbol.

With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`)
clang is 4.2% smaller (1-169620032/177066968).
`stat -c %s **/*.o | awk '{s+=$1}END{print s}'` is 2.5% smaller.
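
The 4.2% figure can be re-derived from the two byte counts quoted above. The helper name `relative_reduction` is made up for this sketch.

```python
def relative_reduction(new_size: int, old_size: int) -> float:
    """Return the fractional size reduction, 1 - new/old."""
    return 1 - new_size / old_size


# Byte counts quoted in the commit message.
reduction = relative_reduction(169620032, 177066968)
print(f"{reduction:.1%}")  # prints 4.2%
```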

Reviewed By: davidxl, rnk

Differential Revision: https://reviews.llvm.org/D103717

3307240f

[CSSPGO] Undoing the concept of dangling pseudo probe · bd524955

Hongtao Yu authored Jun 17, 2021

As a follow-up to https://reviews.llvm.org/D104129, I'm cleaning up the dangling probe related code in both the compiler and llvm-profgen.

I'm seeing a 5% size win for the pseudo_probe section for SPEC2017 and 10% for Cinder. Certain benchmarks such as 602.gcc have a 20% size win. No obvious difference seen on build time for SPEC2017 and Cinder.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D104477

bd524955

Jun 18, 2021

[LoopUnroll] Simplify optimization remarks · 3308205a

Nikita Popov authored Jun 17, 2021

Remove dependence on ULO.TripCount/ULO.TripMultiple from ORE and
debug code. For debug code, print information about all exits.
For optimization remarks, only include the unroll count and the
type of unroll (complete, partial or runtime), but omit detailed
information about exit folding, now that more than one exit may
be folded.

Differential Revision: https://reviews.llvm.org/D104482

3308205a

[GCOVProfiling] don't profile Fn's w/ noprofile attribute · bef29928

Nick Desaulniers authored Jun 18, 2021

Similar to D104475, the Linux kernel would like to avoid compiler
generated code in certain functions. The no_profile function
attribute can be used in C to generate the noprofile fn attr in IR.
Respect that from GCOVProfiling.

Link: https://lore.kernel.org/lkml/CAKwvOdmPTi93n2L0_yQkrzLdmpxzrOR7zggSzonyaw2PGshApw@mail.gmail.com/

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D104257

bef29928

[DFSan] Cleanup code for platforms other than Linux x86_64. · 14407332

Andrew Browne authored Jun 17, 2021

These other platforms are unsupported and untested.
They could be re-added later based on MSan code.

Reviewed By: gbalats, stephan.yichao.zhao

Differential Revision: https://reviews.llvm.org/D104481

14407332

Revert D104028 "[llvm][Inliner] Add an optional PriorityInlineOrder" · 93183a41
Liqiang Tao authored Jun 18, 2021

93183a41

[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration (try 3) · de92287c

Max Kazantsev authored Jun 18, 2021

This patch handles one particular case of one-iteration loops for which SCEV
cannot straightforwardly prove BECount = 1. The idea of the optimization is to
symbolically execute conditional branches on the 1st iteration, moving in topological
order, and only visiting blocks that may be reached on the first iteration. If we find out
that we never reach the header via the latch, then the backedge can be broken.

This implementation uses InstSimplify. SCEV version was rejected due to high
compile time impact.
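
A toy model of the walk described above, under loud assumptions: the CFG encoding, the function name `backedge_may_be_taken` and the pre-simplified branch outcomes are all hypothetical. The actual patch works on LLVM IR and uses InstSimplify to decide the outcomes.

```python
from typing import Dict, List, Optional, Tuple

# Each block maps to its outgoing edges: (successor, taken_on_first_iteration).
# True/False mean the branch outcome is known on the first iteration;
# None means it could not be simplified. This encoding is a toy assumption.
Edges = Dict[str, List[Tuple[str, Optional[bool]]]]


def backedge_may_be_taken(
    topo_order: List[str], edges: Edges, header: str, latch: str
) -> bool:
    """Return False only if the latch->header edge is provably dead on iteration 1."""
    live_blocks = {header}
    for block in topo_order:
        if block not in live_blocks:
            continue  # unreachable on the first iteration, skip it
        for successor, taken in edges.get(block, []):
            if taken is False:
                continue  # edge known not to be taken
            if block == latch and successor == header:
                return True  # the backedge is (possibly) live
            live_blocks.add(successor)
    return False


# First iteration always goes header -> then -> exit, so the latch is never reached.
example_edges: Edges = {
    "header": [("then", True), ("else", False)],
    "then": [("exit", True)],
    "else": [("latch", True)],
    "latch": [("header", None), ("exit", None)],
}
example_order = ["header", "then", "else", "latch"]
```

When `backedge_may_be_taken` returns False for such a loop, the backedge is dead in this model and could be broken.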

Differential Revision: https://reviews.llvm.org/D102615
Reviewed By: nikic

de92287c

[Attributor] Fix UB behavior on uninitialized bool variables. · 3f5d53a5
Haojian Wu authored Jun 18, 2021
```
Found by ASAN.
```
3f5d53a5

[InstCombine] Fold (sext bool X) * (sext bool X) to zext (and X, X) · 6643e51d

Daniil Seredkin authored Jun 02, 2021

InstCombine didn't perform (sext bool X) * (sext bool X) --> zext (and X, X) which can result in just (zext X). The patch adds regression tests to check this transformation and adds a check for equality of mul's operands for that case.
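
The identity behind the fold can be checked exhaustively, since a bool has only two values. This is a sketch using plain integers to model fixed-width wrapping arithmetic; the helper names are made up.

```python
def sext_i1(x: bool, bits: int) -> int:
    """Sign-extend an i1 to `bits` bits: true becomes all ones, false becomes 0."""
    return (1 << bits) - 1 if x else 0


def zext_i1(x: bool, bits: int) -> int:
    """Zero-extend an i1 to `bits` bits: true becomes 1, false becomes 0."""
    return 1 if x else 0


def mul(a: int, b: int, bits: int) -> int:
    """Wrapping multiplication on `bits`-bit unsigned representations."""
    return (a * b) & ((1 << bits) - 1)


def fold_holds(bits: int) -> bool:
    """Check (sext X) * (sext X) == zext (and X, X) for both values of X."""
    return all(
        mul(sext_i1(x, bits), sext_i1(x, bits), bits) == zext_i1(x and x, bits)
        for x in (False, True)
    )
```

For true, the product is (-1) * (-1) = 1, which matches zext; for false, both sides are 0.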

Differential Revision: https://reviews.llvm.org/D104193

6643e51d

[llvm][Inliner] Add an optional PriorityInlineOrder · a740b707

Liqiang Tao authored Jun 18, 2021

This patch adds an optional PriorityInlineOrder, which uses the heap to order inlining.
The callsite which size is smaller would have a higher priority.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D104028

a740b707

[NFC] Assert non-zero factor before division · fa5eb22a
Max Kazantsev authored Jun 18, 2021
```
This is to ensure that zero denominator leads to controlled
assertion failure rather than UB.
```
fa5eb22a

[Attributor] Don't print the call-graph in a hard-coded file. · 7670938b

Haojian Wu authored Jun 18, 2021

This does not look like a practical pattern in our codebase (it could fail
in some sandbox environments).

Instead we print it to standard output, controlled by
-attributor-print-call-graph. This follows a pattern similar to attributor-print-dep.

7670938b

Revert "[InstCombine] Fold (sext bool X) * (sext bool X) to zext (and X, X)" · 6de741de
Daniil Seredkin authored Jun 18, 2021
```
This reverts commit 31053338.
```
6de741de

[InstCombine] Fold (sext bool X) * (sext bool X) to zext (and X, X) · 31053338

Daniil Seredkin authored Jun 02, 2021

InstCombine didn't perform (sext bool X) * (sext bool X) --> zext (and X, X) which can result in just (zext X). The patch adds regression tests to check this transformation and adds a check for equality of mul's operands for that case.

Differential Revision: https://reviews.llvm.org/D104193

31053338

Revert D103717 "[InstrProfiling] Make __profd_ unconditionally private for ELF" · 5798be84

Fangrui Song authored Jun 17, 2021

This reverts commit 76d0747e.

If a group has `__llvm_prf_vals` due to static value profiler counters
(`NS!=0`), we cannot make `__llvm_prf_data` private, because a prevailing text
section may reference `__llvm_prf_data` and will cause a `relocation refers to a
discarded section` linker error.

Note: while a `__profc_` group is non-prevailing, it may be referenced by a
prevailing text section due to inlining.

```
group section [   66] `.group' [__profc__ZN5clang20EmitClangDeclContextERN4llvm12RecordKeeperERNS0_11raw_ostreamE] contains 4 sections:
   [Index]    Name
   [   67]   __llvm_prf_cnts
   [   68]   __llvm_prf_vals
   [   69]   __llvm_prf_data
   [   70]   .rela__llvm_prf_data
```

5798be84

[Attributor][FIX] Arguments of unknown functions can be undef · 30c9d68a

Johannes Doerfert authored May 16, 2021

This should fix PR50683. The wrong assumption was that we
could always know what the callee is when we replace a call site
argument with undef. We wanted to know that to remove the `noundef`
that might be attached to the argument. Since no callee means we
did the propagation on the caller site, there is no need to remove
an attribute. It is only needed if we replace all uses and therefore
pass `undef` instead of the value that was passed in otherwise.

30c9d68a

[Attributor] Use a centralized value simplification interface · 666dc6f1

Johannes Doerfert authored Jun 17, 2021

To allow outside AAs that simplify values we need to ensure all value
simplification goes through the Attributor, not AAValueSimplify (or any
of the other AAs we have already like AAPotentialValues). This patch
also introduces an interface for the outside AAs to register
simplification callbacks for an IRPosition. To make this work as
expected we have to pass IRPositions instead of Values in
AAValueSimplify, which makes sense by itself.

666dc6f1

[Attributor] Introduce a helper to deal with constant type mismatches · d9194b6e

Johannes Doerfert authored May 09, 2021

If we simplify values we sometimes end up with type mismatches. If the
value is a constant we can often cast it though to still allow
propagation. The logic is now put into a helper and it replaces some
ad hoc things we did before.

This also introduces the AA namespace for abstract attribute related
functions and types.

Differential Revision: https://reviews.llvm.org/D103856

d9194b6e

[Attributor] Make sure Heap2Stack works properly on a GPU target · 9959eee0

Johannes Doerfert authored Jun 17, 2021

If the target stack is not accessible between different running
"threads" we have to make sure not to create allocas for mallocs
that might be used by multiple "threads". The "use check" is
sufficient to prevent this but if we apply the "free check" we have
to make sure the pointer is not communicated to others before
the free is reached.

Differential Revision: https://reviews.llvm.org/D98608

9959eee0

[OpenMP][NFC] Expose AAExecutionDomain and rename its getter · 9a23e673

Johannes Doerfert authored May 11, 2021

The initial use for AAExecutionDomain was to determine if a single
thread executes a block. While this is sometimes informative, most
of the time, and for other reasons, we actually want to know if it
is the "initial thread", that is, the thread that started execution on
the current device. The deduction needs to be adjusted in a follow-up,
as the methods we use right now look for the OpenMP thread
id, which is reset whenever a thread enters a parallel region. What
we basically want is to look for `llvm.nvvm.read.ptx.sreg.ntid.x` and
equivalent functions.

9a23e673

[Attributor][NFC] AAReachability is currently stateless, don't invalidate it · 8d7bace3

Johannes Doerfert authored May 06, 2021

We invalidated AAReachabilityImpl directly which is not helpful and
confusing as we still used it regardless. We now avoid invalidating it
(not needed anyway) and add checks for the state. This has by itself no
actual effect but prepares for later extensions.

8d7bace3

[dfsan] Replace dfs$ prefix with .dfsan suffix · c6b5a25e

George Balatsouras authored Jun 17, 2021

The current naming scheme adds the `dfs$` prefix to all
DFSan-instrumented functions.  This breaks mangling and prevents stack
trace printers and other tools from automatically demangling function
names.

This new naming scheme is mangling-compatible, with the `.dfsan`
suffix being a vendor-specific suffix:
https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling-structure

With this fix, demangling utils would work out-of-the-box.
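
A small sketch of why the suffix form stays mangling-compatible, assuming only what the Itanium ABI link above states: a mangled name starts with `_Z` and may be followed by a period and a vendor-specific suffix. The helper names are hypothetical and `_Z3foov` is just an example symbol.

```python
OLD_PREFIX = "dfs$"
NEW_SUFFIX = ".dfsan"


def old_name(symbol: str) -> str:
    """Old scheme: prefix the symbol, which hides the leading `_Z`."""
    return OLD_PREFIX + symbol


def new_name(symbol: str) -> str:
    """New scheme: append a vendor-specific suffix after a period."""
    return symbol + NEW_SUFFIX


def looks_itanium_mangled(name: str) -> bool:
    """Treat everything after the first '.' as a vendor suffix, then check for `_Z`."""
    return name.split(".", 1)[0].startswith("_Z")
```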

Reviewed By: stephan.yichao.zhao

Differential Revision: https://reviews.llvm.org/D104494

c6b5a25e

[Coroutine] Properly deal with byval and noalias parameters · 3522167e

Xun Li authored Jun 17, 2021

This patch is to address https://bugs.llvm.org/show_bug.cgi?id=48857.
Previous attempts can be found in D104007 and D101980.
A lot of discussions can be found in those two patches.
To summarize the bug:
When Clang emits IR for coroutines, the first thing it does is to make a copy of every argument to the local stack, so that uses of the arguments in the function will all refer to the local copies instead of the arguments directly.
However, in some cases we find that arguments are still directly used:
When Clang emits IR for a function that has pass-by-value arguments, sometimes it emits an argument with byval attribute. A byval attribute is considered to be local to the function (just like alloca) and hence it can be easily determined that it does not alias other values. If in the IR there exists a memcpy from a byval argument to a local alloca, and then from that local alloca to another alloca, MemCpyOpt will optimize out the first memcpy because byval argument's content will not change. This causes issues because after a coroutine suspension, the byval argument may die outside of the function, and later uses will lead to a use-after-free.
This is only a problem for arguments with either byval attribute or noalias attribute, because only these two kinds are considered local. Arguments without these two attributes will be considered to alias coro_suspend and hence we won't have this problem. So we need to be able to deal with these two attributes in coroutines properly.
For noalias arguments, since coro_suspend may potentially change the value of any argument outside of the function, we simply shouldn't mark any argument in a coroutine as noalias. This can be taken care of in CoroEarly pass.
For byval arguments, if such an argument needs to live across suspensions, we will have to copy their value content to the frame, not just the pointer.

Differential Revision: https://reviews.llvm.org/D104184

3522167e

Jun 17, 2021
- [NFC][SimpleLoopUnswitch] unswitchTrivialBranch(): add debug output explaining unswitching failure · 84eeb828
  Roman Lebedev authored Jun 17, 2021
```
It's not prohibitively verbose, and allows easier understanding
why certain unswitching ultimately wasn't performed.
```
  84eeb828
Jun 18, 2021

[Attributor] Derive AACallEdges attribute · eaf1b681

Kuter Dinel authored Jun 10, 2021

This attribute computes the optimistic live call edges using the attributor
liveness information. This attribute will be used for deriving an
inter-procedural function reachability attribute.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104059

eaf1b681

Jun 17, 2021

Revert "[DFSan] Cleanup code for platforms other than Linux x86_64." · 39295e92
Andrew Browne authored Jun 17, 2021
```
This reverts commit 8441b993.

Buildbot failures.
```
39295e92

[InstrProfiling] Make __profd_ unconditionally private for ELF · 76d0747e

Fangrui Song authored Jun 17, 2021

For ELF, all counters/data are in a section group (either `comdat any` or
`comdat noduplicates`), and the signature for `comdat any` is `__profc_`, so the
D1003372 optimization prerequisite (linker GC cannot discard data variables
while the text section is retained) is always satisfied. We can therefore make
`__profd_` unconditionally private.

Reviewed By: davidxl, rnk

Differential Revision: https://reviews.llvm.org/D103717

76d0747e

[PartiallyInlineLibCalls] Disable sqrt expansion for strictfp. · 99e95856

Craig Topper authored Jun 17, 2021

This pass emits a floating point compare and a conditional branch,
but if strictfp is enabled we don't emit a constrained compare
intrinsic.

The backend also won't expand the readonly sqrt call this pass inserts
to a sqrt instruction under strictfp. So we end up with 2 libcalls as
seen here. https://godbolt.org/z/oax5zMEWd

Fix these things by disabling the pass.

Differential Revision: https://reviews.llvm.org/D104479

99e95856

[DFSan] Cleanup code for platforms other than Linux x86_64. · 8441b993

Andrew Browne authored Jun 17, 2021

These other platforms are unsupported and untested.
They could be re-added later based on MSan code.

Reviewed By: gbalats, stephan.yichao.zhao

Differential Revision: https://reviews.llvm.org/D104481

8441b993

[LoopUnroll] Fold all exits based on known trip count/multiple · f7c54c46

Nikita Popov authored Jun 13, 2021

Fold all exits based on known trip count/multiple information from
SCEV. Previously only the latch exit or the single exit were folded.

This doesn't yet eliminate ULO.TripCount and ULO.TripMultiple
entirely: They're still used to a) decide whether runtime unrolling
should be performed and b) for ORE remarks. However, the core
unrolling logic is independent of them now.
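
A simplified model of how a known trip multiple lets exit checks be folded away, not the code in LoopUnroll. It assumes the exit is only ever taken at an iteration number that is a multiple of `trip_multiple`, and that unrolled copies are numbered from 1. The function names are made up.

```python
from math import gcd
from typing import List


def exits_never_taken(unroll_count: int, trip_multiple: int) -> List[int]:
    """Copies (1-based) of an exit whose branch can never leave the loop.

    Iteration k lands in copy j with k = u * unroll_count + j, so k and j agree
    modulo m = gcd(unroll_count, trip_multiple). If j is not a multiple of m,
    k cannot be a multiple of trip_multiple, so the exit is not taken there.
    """
    m = gcd(unroll_count, trip_multiple)
    return [j for j in range(1, unroll_count + 1) if j % m != 0]


def exiting_copy(trip_count: int, unroll_count: int) -> int:
    """Copy (1-based) in which a loop running `trip_count` iterations exits."""
    return (trip_count - 1) % unroll_count + 1
```

For example, unrolling by 4 with a trip multiple of 2 leaves only the exits in copies 2 and 4 conditional in this model.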

Differential Revision: https://reviews.llvm.org/D104203

f7c54c46

[NFC] LoopVectorizationCostModel::getMaximizedVFForTarget(): clarify debug msg · 37dfc467

Roman Lebedev authored Jun 17, 2021

This really isn't talking about vectors in general,
but only about either fixed or scalable vectors,
and it's pretty confusing to see it state
that there aren't any vectors :)

37dfc467

[InstCombine] Fix miscompile on GEP+load to icmp fold (PR45210) · 69b0ed9a

hyeongyukim authored Jun 17, 2021

As noted in PR45210: https://bugs.llvm.org/show_bug.cgi?id=45210
...the bug is triggered, as Eli says, when sext(idx) * ElementSize overflows.

```
   // assume that GV is an array of 4-byte elements
   GEP = gep GV, 0, Idx // this is accessing Idx * 4
   L = load GEP
   ICI = icmp eq L, value
 =>
   ICI = icmp eq Idx, NewIdx
```

The foldCmpLoadFromIndexedGlobal function simplifies a GEP+load operation to an icmp.
The problem is that Idx * ElementSize can overflow.

Let's assume that the wanted value is at offset 0.
Then, there are actually four possible values for Idx to match offset 0: 0x00..00, 0x40..00, 0x80..00, 0xC0..00.
We should return true for all these values, but currently, the new icmp only returns true for 0x00..00.

This problem can be solved by masking off the top bits of Idx, as many bits as ElementSize has trailing zeros.

```
   ...
 =>
   Idx' = and Idx, 0x3F..FF
   ICI = icmp eq Idx', NewIdx
```
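
The overflow can be reproduced by brute force on a narrow index. This sketch assumes an 8-bit index instead of the real pointer index width, so the four matching values become 0x00, 0x40, 0x80 and 0xC0 and the mask becomes 0x3F; the helper names are made up.

```python
from typing import List


def matching_indices(element_size: int, target_offset: int, index_bits: int) -> List[int]:
    """All index values whose wrapped byte offset equals target_offset."""
    modulus = 1 << index_bits
    return [
        idx for idx in range(modulus)
        if (idx * element_size) % modulus == target_offset
    ]


def index_mask(element_size: int, index_bits: int) -> int:
    """Mask that clears the top tz bits, tz = trailing zeros of element_size."""
    trailing_zeros = (element_size & -element_size).bit_length() - 1
    return (1 << (index_bits - trailing_zeros)) - 1


# 4-byte elements, wanted value at offset 0, 8-bit index (toy width).
matches = matching_indices(4, 0, 8)
mask = index_mask(4, 8)
print([hex(i) for i in matches], hex(mask))
```

After masking, every matching index compares equal to the same NewIdx, which is what the rewritten icmp relies on.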

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D99481

69b0ed9a

[FuncSpec] Don't specialise functions with attribute NoDuplicate. · dcd23d87
Sjoerd Meijer authored Jun 16, 2021
```
Differential Revision: https://reviews.llvm.org/D104378
```
dcd23d87

[VPlan] Support PHIs as LastInst when inserting scalars in ::get(). · 80a40334

Florian Hahn authored Jun 17, 2021

At the moment, we create insertelement instructions directly after
LastInst when inserting scalar values in a vector in
VPTransformState::get.

This results in invalid IR when LastInst is a phi, followed by another
phi. In that case, the new instructions should be inserted just after
the last PHI node in the block.

At the moment, I don't think the problematic case can be triggered, but
it can happen once predicate regions are merged and multiple
VPredInstPHI recipes are in the same block (D100260).

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D104188

80a40334

Update @llvm.powi to handle different int sizes for the exponent · 4c7f820b

Bjorn Pettersson authored Mar 26, 2021

This can be seen as a follow up to commit 0ee439b7,
that changed the second argument of __powidf2, __powisf2 and
__powitf2 in compiler-rt from si_int to int. That was to align with
how those runtimes are defined in libgcc.
One thing that seems to have been missing in that patch was to make
sure that the rest of LLVM also handles that the argument now depends
on the size of int (not using the si_int machine mode for 32-bit).
When using __builtin_powi for a target with 16-bit int, clang crashed.
And when emitting libcalls to those rtlib functions (typically when
lowering @llvm.powi), the backend would always prepare the exponent
argument as an i32, which caused miscompiles when the rtlib was
compiled with 16-bit int.

The solution used here is to use an overloaded type for the second
argument in @llvm.powi. This way clang can use the "correct" type
when lowering __builtin_powi, and then later when emitting the libcall
it is assumed that the type used in @llvm.powi matches the rtlib
function.

One thing that needed some extra attention was that when vectorizing
calls several passes did not support that several arguments could
be overloaded in the intrinsics. This patch allows overload of a
scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with
an entry for powi.

Differential Revision: https://reviews.llvm.org/D99439

4c7f820b

Jun 16, 2021
- [FuncSpec] Statistics · 49ab3b17
  Sjoerd Meijer authored Jun 11, 2021
```
Adds some bookkeeping for collecting the number of specialised functions and a
test for that.

Differential Revision: https://reviews.llvm.org/D104102
```
  49ab3b17
- [SLP] Incorrect handling of external scalar values · 96cded5b
  Evgeniy Brevnov authored Jun 09, 2021
```
Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D103954
```
  96cded5b
- [DFSan][NFC] Fix shadowing variable name. · e652d991
  Andrew Browne authored Jun 15, 2021
  
  e652d991