  1. Nov 10, 2020
    • [LoopFlatten] Run it earlier, just before IndVarSimplify · 2ef47910
      Sjoerd Meijer authored
      This is a prep step for widening induction variables in LoopFlatten if this is
      possible (D90640), to avoid having to perform certain overflow checks. Since
      IndVarSimplify may already widen induction variables, we want to run
      LoopFlatten just before IndVarSimplify. This is a minor reshuffle, as the two
      passes were already adjacent in the pipeline.
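
      For context, LoopFlatten turns a two-level loop nest into a single loop. A
      minimal, illustrative C++ sketch of the kind of nest it targets (hypothetical
      code, not taken from the patch); the flattened form is only valid if the
      combined index cannot overflow, which is why widening the induction variables
      first (D90640) lets the pass avoid certain overflow checks:

      ```
      // Before flattening: two induction variables, addressing A[i * M + j].
      void sum(const int *A, unsigned N, unsigned M, long long &Total) {
        for (unsigned i = 0; i < N; ++i)
          for (unsigned j = 0; j < M; ++j)
            Total += A[i * M + j];
      }

      // Conceptually after flattening: a single induction variable over N * M
      // iterations; requires that N * M (and i * M + j) cannot overflow.
      void sumFlattened(const int *A, unsigned N, unsigned M, long long &Total) {
        for (unsigned k = 0; k < N * M; ++k)
          Total += A[k];
      }
      ```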
      
      Differential Revision: https://reviews.llvm.org/D90402
    • Add loop distribution to the LTO pipeline · dd03881b
      Sanne Wouda authored
      The LoopDistribute pass is missing from the LTO pipeline, so
      -enable-loop-distribute has no effect during post-link. The pre-link
      loop distribution doesn't seem to survive the LTO pipeline either.
      
      With this patch (and -flto -mllvm -enable-loop-distribute) we see a 43%
      uplift on SPEC 2006 hmmer for AArch64. The rest of SPECINT 2006 is
      unaffected.
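
      For context, loop distribution splits one loop into several loops over the
      same iteration space so that, for example, a vectorizable part is separated
      from a part with a loop-carried dependence. A minimal, illustrative C++
      sketch (hypothetical code, not taken from the patch or from hmmer):

      ```
      // A loop LoopDistribute can split: the two statements are independent
      // (given runtime checks that the arrays do not alias).
      void f(int *A, const int *B, int *C, const int *D, int N) {
        for (int i = 1; i < N; ++i) {
          A[i] = A[i - 1] + B[i]; // loop-carried dependence; hard to vectorize
          C[i] = B[i] * D[i];     // independent; vectorizable on its own
        }
      }

      // Conceptually after distribution: two loops, the second vectorizable.
      // for (int i = 1; i < N; ++i) A[i] = A[i - 1] + B[i];
      // for (int i = 1; i < N; ++i) C[i] = B[i] * D[i];
      ```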
      
      Differential Revision: https://reviews.llvm.org/D89896
    • [OMPIRBuilder] Start 'Create' methods with lower case. NFC. · e5dba2d7
      Michael Kruse authored
      For consistency with the IRBuilder, OpenMPIRBuilder has method names starting with 'Create'. However, the LLVM coding style has method names starting with lower-case letters, as all other OpenMPIRBuilder methods already do. The clang-tidy configuration used by Phabricator also warns about the naming violation, adding noise to the reviews.
      
      This patch renames all `OpenMPIRBuilder::CreateXYZ` methods to `OpenMPIRBuilder::createXYZ`, and updates all in-tree callers.
      
      I tested check-llvm, check-clang, check-mlir and check-flang to ensure that I did not miss a caller.
      
      Reviewed By: mehdi_amini, fghanim
      
      Differential Revision: https://reviews.llvm.org/D91109
  2. Nov 03, 2020
  3. Nov 02, 2020
  4. Oct 31, 2020
  5. Oct 30, 2020
  6. Oct 29, 2020
  7. Oct 28, 2020
  8. Oct 27, 2020
  9. Oct 24, 2020
    • [AutoFDO] Remove a broken assert in merging inlinee samples · a16cbdd6
      Hongtao Yu authored
      Duplicated callsites share the same callee profile if the original callsite was inlined. The sharing also causes the profile of the callee's callee to be shared. This breaks the assert introduced earlier by D84997 in a tricky way.
      
      To illustrate, I'm using an abstract example. Say we have three functions `A`, `B` and `C`. `A` calls `B` twice and `B` calls `C` once. Some optimization performed prior to the sample profile loader duplicates the first callsite to `B`, so the program may look like
      
      ```
      A()
      {
        B();  // with nested profile B1 and C1
        B();  // duplicated, with nested profile B1 and C1
        B();  // with nested profile B2 and C2
      }
      ```
      
      For some reason, the sample profile loader inliner then decides to inline only the first callsite in `A`, transforming `A` into
      
      ```
      A()
      {
        C();  // with nested profile C1
        B();  // duplicated, with nested profile B1 and C1
        B();  // with nested profile B2 and C2.
      }
      ```
      
      Here is what happens next:
      
      1. Failing to inline the callsite `C()` results in `C1`'s samples being returned to `C`'s base (outlined) profile. In the meantime, `C1`'s head samples are updated to `C1`'s entry sample. This also affects the profile of the middle callsite, which shares `C1` with the first callsite.
      2. Failing to inline the middle callsite results in `B1` being returned to `B`'s base profile, which in turn causes `C1` to be merged into `B`'s base profile. Note that the nested `C` profile in `B`'s base now has a non-zero head sample count, which actually equals `C1`'s entry count.
      3. Failing to inline the last callsite results in `B2` being returned to `B`'s base profile. The nested `C` profile in `B`'s base now has an entry count equal to the sum of those of `C1` and `C2`, with a head count equal to that of `C1`. This is what trips the assert later on.
      4. When compiling `B` using `B`'s base profile, failing to inline `C` there triggers the returning of the nested `C` profile. Since the nested `C` profile has a non-zero head count, the return doesn't go through; instead, the assert fires.
      
      It's good that `C1` is returned only once, thanks to the use of a non-zero head count to ensure an inlinee profile is returned at most once. However, `C2` is never returned. Since it seems hard to solve this perfectly within the current framework, I'm just removing the broken assert. This should be properly fixed by the upcoming CSSPGO work, where count returning is based on context sensitivity and a distribution factor for callsite probes.
      
      The simple example is extracted from one of our internal services. In reality, why the original callsite `B()` and its duplicate end up with different inlining behavior is a mystery; it has to do with imperfect counts in the profile and extra, complicated inlining that makes their hotness differ.
      
      Reviewed By: wenlei
      
      Differential Revision: https://reviews.llvm.org/D90056
    • [Inliner][NPM] Properly pass callee AAResults · ba22c403
      Arthur Eubanks authored
      Fixes noalias-calls.ll under NPM.
      
      Differential Revision: https://reviews.llvm.org/D89592
  10. Oct 23, 2020
    • [IR] add fn attr for no_stack_protector; prevent inlining on mismatch · b7926ce6
      Nick Desaulniers authored
      It's currently ambiguous in IR whether the source language explicitly did not
      want a stack protector for a given function (in C, via the function attribute
      no_stack_protector) or simply doesn't care.
      
      It's common for code that manipulates the stack via inline assembly, or that
      has to set up its own stack canary (such as the Linux kernel), to want to
      avoid stack protectors in certain functions. There, we've been bitten by
      numerous bugs where a callee with a stack protector is inlined into an
      __attribute__((__no_stack_protector__)) caller, which generally breaks the
      caller's assumption that it has no stack protector. LTO exacerbates the
      issue.
      
      While developers can avoid this by grouping all no_stack_protector functions
      into one translation unit and compiling it with -fno-stack-protector, that is
      much less ergonomic than a function attribute and still doesn't work for LTO.
      See also:
      https://lore.kernel.org/linux-pm/20200915172658.1432732-1-rkir@google.com/
      https://lore.kernel.org/lkml/20200918201436.2932360-30-samitolvanen@google.com/T/#u
      
      Typically, when inlining a callee into a caller, the caller is upgraded in
      its level of stack protection (see adjustCallerSSPLevel()). By adding an
      explicit attribute in the IR when the function attribute is used in the
      source language, we can now identify such cases and prevent inlining:
      inlining is blocked when the callee and caller differ, i.e. when one carries
      `nossp` while the other has `ssp`, `sspstrong`, or `sspreq`.
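
      A hedged C++ sketch of the scenario (hypothetical functions, using the GNU
      attribute spelling; not code from the patch):

      ```
      // Built with -fstack-protector-strong, so it carries the `sspstrong` IR
      // attribute (and its local buffer makes it likely to get a real canary).
      static void helper() {
        char buf[64];
        __builtin_memset(buf, 0, sizeof(buf));
      }

      // Explicitly opts out, e.g. because it runs before the canary is set up;
      // with this patch it carries the `nossp` IR attribute.
      __attribute__((no_stack_protector))
      void setup_canary_and_stack() {
        helper(); // previously `helper` could be inlined here and the caller
                  // upgraded to sspstrong; the nossp/ssp* mismatch now blocks it
      }
      ```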
      
      Fixes pr/47479.
      
      Reviewed By: void
      
      Differential Revision: https://reviews.llvm.org/D87956
    • [SVE]Clarify TypeSize comparisons in llvm/lib/Transforms · 24156364
      Caroline Concatto authored
      Use the isKnownXY comparators when one of the operands can be a scalable
      vector, and getFixedSize() in all other cases.
      
      This patch also fixes bugs around getPrimitiveSizeInBits by using
      getFixedSize() near the TypeSize comparisons.
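
      A minimal sketch of the distinction (assuming the TypeSize API of this
      period, i.e. the static isKnownXY comparators and getFixedSize(); an
      illustration, not code from the patch):

      ```
      #include "llvm/Support/TypeSize.h"
      using namespace llvm;

      // Returns true when a value of size Src is known to fit in size Dst.
      bool knownToFit(TypeSize Src, TypeSize Dst) {
        if (Src.isScalable() || Dst.isScalable())
          // With scalable vectors the exact size is a runtime multiple, so use
          // the isKnownXY comparators, which may conservatively answer "false".
          return TypeSize::isKnownLE(Src, Dst);
        // Both sizes are fixed: compare the exact bit counts.
        return Src.getFixedSize() <= Dst.getFixedSize();
      }
      ```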
      
      Differential Revision: https://reviews.llvm.org/D89703
    • [Inliner] Run always-inliner in inliner-wrapper · 0291e2c9
      Arthur Eubanks authored
      An alwaysinline function may not get inlined by the inliner wrapper due to
      the inlining order.
      
      Previously, for the following IR, the inliner would first inline @a() into @b(),
      
      ```
      define void @a() {
      entry:
        call void @b()
        ret void
      }
      
      define void @b() alwaysinline {
      entry:
        br label %for.cond
      
      for.cond:
        call void @a()
        br label %for.cond
      }
      ```
      
      making @b() recursive and no longer able to be inlined into @a(), ending up with
      
      ```
      define void @a() {
      entry:
        call void @b()
        ret void
      }
      
      define void @b() alwaysinline {
      entry:
        br label %for.cond
      
      for.cond:
        call void @b()
        br label %for.cond
      }
      ```
      
      Running always-inliner first makes sure that we respect alwaysinline in more cases.
      
      Fixes https://bugs.llvm.org/show_bug.cgi?id=46945.
      
      Reviewed By: davidxl, rnk
      
      Differential Revision: https://reviews.llvm.org/D86988
  11. Oct 22, 2020
  12. Oct 21, 2020
  13. Oct 19, 2020
    • Revert "[PM/CC1] Add -f[no-]split-cold-code CC1 option to toggle splitting" · 0628bea5
      Hans Wennborg authored
      This broke Chromium's PGO build; it seems hot/cold splitting got turned on
      unintentionally. See the comment on the code review for a repro, etc.
      
      > This patch adds -f[no-]split-cold-code CC1 options to clang. This allows
      > the splitting pass to be toggled on/off. The current method of passing
      > `-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose
      > correctly (say, with `-O0` or `-Oz`).
      >
      > To implement the -fsplit-cold-code option, an attribute is applied to
      > functions to indicate that they may be considered for splitting. This
      > removes some complexity from the old/new PM pipeline builders, and
      > behaves as expected when LTO is enabled.
      >
      > Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org>
      > Differential Revision: https://reviews.llvm.org/D57265
      > Reviewed By: Aditya Kumar, Vedant Kumar
      > Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar
      
      This reverts commit 273c299d.
  14. Oct 16, 2020
    • [PM/CC1] Add -f[no-]split-cold-code CC1 option to toggle splitting · 273c299d
      Vedant Kumar authored
      This patch adds -f[no-]split-cold-code CC1 options to clang. This allows
      the splitting pass to be toggled on/off. The current method of passing
      `-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose
      correctly (say, with `-O0` or `-Oz`).
      
      To implement the -fsplit-cold-code option, an attribute is applied to
      functions to indicate that they may be considered for splitting. This
      removes some complexity from the old/new PM pipeline builders, and
      behaves as expected when LTO is enabled.
      
      Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org>
      Differential Revision: https://reviews.llvm.org/D57265
      Reviewed By: Aditya Kumar, Vedant Kumar
      Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar
  15. Oct 14, 2020
  16. Oct 09, 2020
    • [OpenMPOpt] Merge parallel regions · 3a6bfcf2
      Giorgis Georgakoudis authored
      There are cases where generated OpenMP code consists of multiple consecutive
      OpenMP parallel regions, either because high-level programming models such as
      RAJA or Kokkos lower to OpenMP code that way, or simply because the programmer
      parallelized the code that way. This optimization merges consecutive parallel
      OpenMP regions to: (1) reduce the runtime overhead of re-activating a team of
      threads; (2) enlarge the scope for other OpenMP optimizations, e.g., runtime
      call deduplication and synchronization elimination.
      
      This implementation merges parallel regions defensively, only when they are
      within the same basic block and any in-between instructions are safe to
      execute in parallel.
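
      For illustration, the source-level pattern this targets might look like the
      following C++ (a hypothetical example, compiled with -fopenmp; not taken
      from the patch):

      ```
      void scale_and_accumulate(double *A, double *B, int N) {
        // Two back-to-back parallel regions: without merging, the runtime
        // re-activates the thread team for each one.
        #pragma omp parallel for
        for (int i = 0; i < N; ++i)
          A[i] *= 2.0;

        #pragma omp parallel for
        for (int i = 0; i < N; ++i)
          B[i] += A[i];

        // Conceptually after merging: one parallel region containing both
        // worksharing loops, paying the team-activation cost once and widening
        // the scope for other OpenMP optimizations.
      }
      ```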
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D83635
  17. Oct 07, 2020