Commits · 1d6ebdfb66b9d63d34f34ec6ac7ec57eff7cd24b · Lorenzo Albano / LLVM bpEVL

Dec 02, 2020

[CSSPGO] Pseudo probes for function calls. · 24d4291c

Hongtao Yu authored Dec 01, 2020

An indirect call site needs to be probed for its potential call targets. With CSSPGO a direct call also needs a probe so that a calling context can be represented by a stack of callsite probes. Unlike pseudo probes for basic blocks that are in form of standalone intrinsic call instructions, pseudo probes for callsites have to be attached to the call instruction, thus a separate instruction would not work.

One possible way of attaching a probe to a call instruction is to use a special metadata that carries information about the probe. The special metadata will have to make its way through the optimization pipeline down to object emission. This requires additional efforts to maintain the metadata in various places. Given that the `!dbg` metadata is a first-class metadata and has all essential support in place , leveraging the `!dbg` metadata as a channel to encode pseudo probe information is probably the easiest solution.

With the requirement of not inflating `!dbg` metadata that is allocated for almost every instruction, we found that the 32-bit DWARF discriminator field which mainly serves AutoFDO can be reused for pseudo probes. DWARF discriminators distinguish identical source locations between instructions and with pseudo probes such support is not required. In this change we are using the discriminator field to encode the ID and type of a callsite probe and the encoded value will be unpacked and consumed right before object emission. When a callsite is inlined, the callsite discriminator field will go with the inlined instructions. The `!dbg` metadata of an inlined instruction is in form of a scope stack. The top of the stack is the instruction's original `!dbg` metadata and the bottom of the stack is for the original callsite of the top-level inliner. Except for the top of the stack, all other elements of the stack actually refer to the nested inlined callsites whose discriminator field (which actually represents a calliste probe) can be used together to represent the inline context of an inlined PseudoProbeInst or CallInst.

To avoid collision with the baseline AutoFDO in various places that handles dwarf discriminators where a check against the `-pseudo-probe-for-profiling` switch is not available, a special encoding scheme is used to tell apart a pseudo probe discriminator from a regular discriminator. For the regular discriminator, if all lowest 3 bits are non-zero, it means the discriminator is basically empty and all higher 29 bits can be reversed for pseudo probe use.

Callsite pseudo probes are inserted in `SampleProfileProbePass` and a target-independent MIR pass `PseudoProbeInserter` is added to unpack the probe ID/type from `!dbg`.

Note that with this work the switch -debug-info-for-profiling will not work with -pseudo-probe-for-profiling anymore. They cannot be used at the same time.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D91756

24d4291c

[OpenMPIRBuilder] forward arguments as pointers to outlined function · 240dd924

Alex Zinenko authored Nov 26, 2020

OpenMPIRBuilder::createParallel outlines the body region of the parallel
construct into a new function that accepts any value previously defined outside
the region as a function argument. This function is called back by OpenMP
runtime function __kmpc_fork_call, which expects trailing arguments to be
pointers. If the region uses a value that is not of a pointer type, e.g. a
struct, the produced code would be invalid. In such cases, make createParallel
emit IR that stores the value on stack and pass the pointer to the outlined
function instead. The outlined function then loads the value back and uses as
normal.

Reviewed By: jdoerfert, llitchev

Differential Revision: https://reviews.llvm.org/D92189

240dd924

Nov 30, 2020

[llvm][inliner] Reuse the inliner pass to implement 'always inliner' · 5fe10263

Mircea Trofin authored Nov 16, 2020

Enable performing mandatory inlinings upfront, by reusing the same logic
as the full inliner, instead of the AlwaysInliner. This has the
following benefits:
- reduce code duplication - one inliner codebase
- open the opportunity to help the full inliner by performing additional
function passes after the mandatory inlinings, but before th full
inliner. Performing the mandatory inlinings first simplifies the problem
the full inliner needs to solve: less call sites, more contextualization, and,
depending on the additional function optimization passes run between the
2 inliners, higher accuracy of cost models / decision policies.

Note that this patch does not yet enable much in terms of post-always
inline function optimization.

Differential Revision: https://reviews.llvm.org/D91567

5fe10263

[CSSPGO] Pseudo probe instrumentation pass · 64fa8cce

Hongtao Yu authored Aug 24, 2020

This change introduces a pseudo probe instrumentation pass for block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.

Given the following LLVM IR:

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
  %cmp = icmp eq i32 %x, 0
   br i1 %cmp, label %bb1, label %bb2
bb1:
   br label %bb3
bb2:
   br label %bb3
bb3:
   ret void
}
```

The instrumented IR will look like below. Note that each llvm.pseudoprobe intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID.

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
   %cmp = icmp eq i32 %x, 0
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
   br i1 %cmp, label %bb1, label %bb2
bb1:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
   br label %bb3
bb2:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
   br label %bb3
bb3:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
   ret void
}
```

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D86499

64fa8cce

Nov 28, 2020

Revert "[IRSim][IROutliner] Adding the extraction basics for the IROutliner." · a8a43b63

Andrew Litteken authored Nov 27, 2020

Reverting commit due to address sanitizer errors.

> Extracting the similar regions is the first step in the IROutliner.
> 
> Using the IRSimilarityIdentifier, we collect the SimilarityGroups and
> sort them by how many instructions will be removed.  Each
> IRSimilarityCandidate is used to define an OutlinableRegion.  Each
> region is ordered by their occurrence in the Module and the regions that
> are not compatible with previously outlined regions are discarded.
> 
> Each region is then extracted with the CodeExtractor into its own
> function.
> 
> We test that correctly extract in:
> test/Transforms/IROutliner/extraction.ll
> test/Transforms/IROutliner/address-taken.ll
> test/Transforms/IROutliner/outlining-same-globals.ll
> test/Transforms/IROutliner/outlining-same-constants.ll
> test/Transforms/IROutliner/outlining-different-structure.ll
> 
> Reviewers: paquette, jroelofs, yroux
> 
> Differential Revision: https://reviews.llvm.org/D86975

This reverts commit bf899e89.

a8a43b63

[IRSim][IROutliner] Adding the extraction basics for the IROutliner. · bf899e89

Andrew Litteken authored Sep 15, 2020

Extracting the similar regions is the first step in the IROutliner.

Using the IRSimilarityIdentifier, we collect the SimilarityGroups and
sort them by how many instructions will be removed.  Each
IRSimilarityCandidate is used to define an OutlinableRegion.  Each
region is ordered by their occurrence in the Module and the regions that
are not compatible with previously outlined regions are discarded.

Each region is then extracted with the CodeExtractor into its own
function.

We test that correctly extract in:
test/Transforms/IROutliner/extraction.ll
test/Transforms/IROutliner/address-taken.ll
test/Transforms/IROutliner/outlining-same-globals.ll
test/Transforms/IROutliner/outlining-same-constants.ll
test/Transforms/IROutliner/outlining-different-structure.ll

Reviewers: paquette, jroelofs, yroux

Differential Revision: https://reviews.llvm.org/D86975

bf899e89

Nov 26, 2020

[AA] Split up LocationSize::unknown() · 4df8efce

Nikita Popov authored Nov 17, 2020

Currently, we have some confusion in the codebase regarding the
meaning of LocationSize::unknown(): Some parts (including most of
BasicAA) assume that LocationSize::unknown() only allows accesses
after the base pointer. Some parts (various callers of AA) assume
that LocationSize::unknown() allows accesses both before and after
the base pointer (but within the underlying object).

This patch splits up LocationSize::unknown() into
LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer()
to make this completely unambiguous. I tried my best to determine
which one is appropriate for all the existing uses.

The test changes in cs-cs.ll in particular illustrate a previously
clearly incorrect AA result: We were effectively assuming that
argmemonly functions were only allowed to access their arguments
after the passed pointer, but not before it. I'm pretty sure that
this was not intentional, and it's certainly not specified by
LangRef that way.

Differential Revision: https://reviews.llvm.org/D91649

4df8efce

Nov 25, 2020

[PassManager] Run Induction Variable Simplification pass *after* Recognize... · a8d74517

Roman Lebedev authored Nov 25, 2020

[PassManager] Run Induction Variable Simplification pass *after* Recognize loop idioms pass, not before

Currently, `-indvars` runs first, and then immediately after `-loop-idiom` does.
I'm not really sure if `-loop-idiom` requires `-indvars` to run beforehand,
but i'm *very* sure that `-indvars` requires `-loop-idiom` to run afterwards,
as it can be seen in the phase-ordering test.

LoopIdiom runs on two types of loops: countable ones, and uncountable ones.
For uncountable ones, IndVars obviously didn't make any change to them,
since they are uncountable, so for them the order should be irrelevant.
For countable ones, well, they should have been countable before IndVars
for IndVars to make any change to them, and since SCEV is used on them,
it shouldn't matter if IndVars have already canonicalized them.
So i don't really see why we'd want the current ordering.

Should this cause issues, it will give us a reproducer test case
that shows flaws in this logic, and we then could adjust accordingly.

While this is quite likely beneficial in-the-wild already,
it's a required part for the full motivational pattern
behind `left-shift-until-bittest` loop idiom (D91038).

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D91800

a8d74517

Nov 24, 2020

[ThinLTO/WPD] Enable -wholeprogramdevirt-skip in ThinLTO backends · 6e4c1cf2

Teresa Johnson authored Nov 19, 2020

Previously this option could be used to skip devirtualizations of the
given functions in regular LTO and in the ThinLTO indexing step. This
change allows them to be skipped in the backend as well, which is useful
when debugging WPD in a distributed ThinLTO backend.

Differential Revision: https://reviews.llvm.org/D91812

6e4c1cf2

[FunctionAttrs][NPM] Fix handling of convergent · 932e4f88

Arthur Eubanks authored Oct 20, 2020

The legacy pass didn't properly detect indirect calls.

We can still remove the convergent attribute when there are indirect
calls. The LangRef says:

> When it appears on a call/invoke, the convergent attribute indicates
that we should treat the call as though we’re calling a convergent
function. This is particularly useful on indirect calls; without this we
may treat such calls as though the target is non-convergent.

So don't skip handling of convergent when there are unknown calls.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89826

932e4f88

Nov 23, 2020

[NPM] Share pass building options with legacy PM · 3c811ce4

Arthur Eubanks authored Nov 18, 2020

We should share options when possible.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D91741

3c811ce4

Nov 20, 2020

[PGO] Make -disable-preinline work with NPM · b7743604

Arthur Eubanks authored Nov 19, 2020

Fixes cspgo_profile_summary.ll under NPM.

Reviewed By: xur

Differential Revision: https://reviews.llvm.org/D91826

b7743604

Nov 19, 2020

[OpenMP] Add Location Fields to Libomptarget Runtime for Debugging · da8bec47

Joseph Huber authored Nov 19, 2020

Summary:
Add support for passing source locations to libomptarget runtime functions using the ident_t struct present in the rest of the libomp API. This will allow the runtime system to give much more insightful error messages and debugging values.

Reviewers: jdoerfert grokos

Differential Revision: https://reviews.llvm.org/D87946

da8bec47

Nov 18, 2020

[OpenMP] Add Passing in Original Declaration Names To Mapper API · 97e55cfe

Joseph Huber authored Nov 13, 2020

Summary:
This patch adds support for passing in the original delcaration name in the source file to the libomptarget runtime. This will allow the runtime to provide more intelligent debugging messages. This patch takes the original expression parsed from the OpenMP map / update clause and provides a textual representation if it was explicitly mapped, otherwise it takes the name of the variable declaration as a fallback. The information in passed to the runtime in a global array of strings that matches the existing ident_t source location strings using ";name;filename;column;row;;"

Reviewers: jdoerfert

Differential Revision: https://reviews.llvm.org/D89802

97e55cfe

Revert "[IR] add fn attr for no_stack_protector; prevent inlining on mismatch" · f4c6080a
Nick Desaulniers authored Nov 17, 2020
```
This reverts commit b7926ce6.

Going with a simpler approach.
```
f4c6080a

Nov 16, 2020

Add pass to add !annotate metadata from @llvm.global.annotations. · 8dbe44cb

Florian Hahn authored Nov 16, 2020

This patch adds a new pass to add !annotation metadata for entries in
@llvm.global.anotations, which is generated  using
__attribute__((annotate("_name"))) on functions in Clang.

This has been discussed on llvm-dev as part of
    RFC: Combining Annotation Metadata and Remarks
    http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D91195

8dbe44cb

Nov 14, 2020

Revert "clang-misexpect: Profile Guided Validation of Performance Annotations in LLVM" · 6861d938

Roman Lebedev authored Nov 14, 2020

See discussion in https://bugs.llvm.org/show_bug.cgi?id=45073 / https://reviews.llvm.org/D66324#2334485
the implementation is known-broken for certain inputs,
the bugreport was up for a significant amount of timer,
and there has been no activity to address it.
Therefore, just completely rip out all of misexpect handling.

I suspect, fixing it requires redesigning the internals of MD_misexpect.
Should anyone commit to fixing the implementation problem,
starting from clean slate may be better anyways.

This reverts commit 7bdad084,
and some of it's follow-ups, that don't stand on their own.

6861d938

Nov 13, 2020

[AlwaysInliner] Call mergeAttributesForInlining after inlining · a20220d2

Guozhi Wei authored Nov 13, 2020

Like inlineCallIfPossible and InlinerPass, after inlining mergeAttributesForInlining
should be called to merge callee's attributes to caller. But it is not called in
AlwaysInliner, causes caller's attributes inconsistent with inlined code.

Attached test case demonstrates that attribute "min-legal-vector-width"="512" is
not merged into caller without this patch, and it causes failure in SelectionDAG
when lowering the inlined AVX512 intrinsic.

Differential Revision: https://reviews.llvm.org/D91446

a20220d2

llvmbuildectomy - compatibility with ocaml bindings · 95537f45

serge-sans-paille authored Nov 13, 2020

Use exact component name in add_ocaml_library.
Make expand_topologically compatible with new architecture.
Fix quoting in is_llvm_target_library.
Fix LLVMipo component name.
Write release note.

95537f45

Add !annotation metadata and remarks pass. · 8bb63479

Florian Hahn authored Nov 13, 2020

This patch adds a new !annotation metadata kind which can be used to
attach annotation strings to instructions.

It also adds a new pass that emits summary remarks per function with the
counts for each annotation kind.

The intended uses cases for this new metadata is annotating
'interesting' instructions and the remarks should provide additional
insight into transformations applied to a program.

To motivate this, consider these specific questions we would like to get answered:

* How many stores added for automatic variable initialization remain after optimizations? Where are they?
* How many runtime checks inserted by a frontend could be eliminated? Where are the ones that did not get eliminated?

Discussed on llvm-dev as part of 'RFC: Combining Annotation Metadata and Remarks'
(http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html)

Reviewed By: thegameg, jdoerfert

Differential Revision: https://reviews.llvm.org/D91188

8bb63479

llvmbuildectomy - replace llvm-build by plain cmake · 9218ff50

serge-sans-paille authored Oct 09, 2020

No longer rely on an external tool to build the llvm component layout.

Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.

These function store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.

Differential Revision: https://reviews.llvm.org/D90848

9218ff50

[NFC] Removed unused variable · b9406121
Arthur Eubanks authored Nov 12, 2020
```
Obsolete as of https://reviews.llvm.org/D91046.
```
b9406121

Nov 11, 2020

[CGSCC][Inliner] Handle new non-trivial edges in updateCGAndAnalysisManagerForPass · d9cbceb0

Arthur Eubanks authored Nov 06, 2020

Previously the inliner did a bit of a hack by adding ref edges for all
new edges introduced by performing an inline before calling
updateCGAndAnalysisManagerForPass(). This was because
updateCGAndAnalysisManagerForPass() didn't handle new non-trivial call
edges.

This adds handling of non-trivial call edges to
updateCGAndAnalysisManagerForPass().  The inliner called
updateCGAndAnalysisManagerForFunctionPass() since it was handling adding
newly introduced edges (so updateCGAndAnalysisManagerForPass() would
only have to handle promotion), but now it needs to call
updateCGAndAnalysisManagerForCGSCCPass() since
updateCGAndAnalysisManagerForPass() is now handling the new call edges
and function passes cannot add new edges.

We follow the previous path of adding trivial ref edges then letting promotion
handle changing the ref edges to call edges and the CGSCC updates. So
this still does not allow adding call edges that result in an addition
of a non-trivial ref edge.

This is in preparation for better detecting devirtualization. Previously
since the inliner itself would add ref edges,
updateCGAndAnalysisManagerForPass() would think that promotion and thus
devirtualization had happened after any sort of inlining.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D91046

d9cbceb0

Nov 10, 2020

[LoopFlatten] Run it earlier, just before IndVarSimplify · 2ef47910

Sjoerd Meijer authored Nov 09, 2020

This is a prep step for widening induction variables in LoopFlatten if this is
posssible (D90640), to avoid having to perform certain overflow checks. Since
IndVarSimplify may already widen induction variables, we want to run
LoopFlatten just before IndVarSimplify. This is a minor reshuffle as both
passes were already close after each other.

Differential Revision: https://reviews.llvm.org/D90402

2ef47910

Add loop distribution to the LTO pipeline · dd03881b

Sanne Wouda authored Oct 21, 2020

The LoopDistribute pass is missing from the LTO pipeline, so
-enable-loop-distribute has no effect during post-link. The pre-link
loop distribution doesn't seem to survive the LTO pipeline either.

With this patch (and -flto -mllvm -enable-loop-distribute) we see a 43%
uplift on SPEC 2006 hmmer for AArch64. The rest of SPECINT 2006 is
unaffected.

Differential Revision: https://reviews.llvm.org/D89896

dd03881b

[OMPIRBuilder] Start 'Create' methods with lower case. NFC. · e5dba2d7

Michael Kruse authored Nov 09, 2020

For consistency with the IRBuilder, OpenMPIRBuilder has method names starting with 'Create'. However, the LLVM coding style has methods names starting with lower case letters, as all other OpenMPIRBuilder already methods do. The clang-tidy configuration used by Phabricator also warns about the naming violation, adding noise to the reviews.

This patch renames all `OpenMPIRBuilder::CreateXYZ` methods to `OpenMPIRBuilder::createXYZ`, and updates all in-tree callers.

I tested check-llvm, check-clang, check-mlir and check-flang to ensure that I did not miss a caller.

Reviewed By: mehdi_amini, fghanim

Differential Revision: https://reviews.llvm.org/D91109

e5dba2d7

Nov 03, 2020

Revert "Add loop distribution to the LTO pipeline" · 2ec26d3a
Sanne Wouda authored Nov 03, 2020
```
This reverts commit 6e80318e.
```
2ec26d3a

Add loop distribution to the LTO pipeline · 6e80318e

Sanne Wouda authored Oct 21, 2020

The LoopDistribute pass is missing from the LTO pipeline, so
-enable-loop-distribute has no effect during post-link. The pre-link
loop distribution doesn't seem to survive the LTO pipeline either.

With this patch (and -flto -mllvm -enable-loop-distribute) we see a 43%
uplift on SPEC 2006 hmmer for AArch64. The rest of SPECINT 2006 is
unaffected.

Differential Revision: https://reviews.llvm.org/D89896

6e80318e

Nov 02, 2020

[PartialInliner]: Handle code regions in a switch stmt cases · 4274cbba

Ettore Tiotto authored Nov 02, 2020

This patch enhances computeOutliningColdRegionsInfo() to allow it to
consider regions containing a single basic block and a single
predecessor as candidate for partial inlining.

Reviewed By: fhann

Differential Revision: https://reviews.llvm.org/D89911

4274cbba

Oct 31, 2020
- Revert "Use uint64_t for branch weights instead of uint32_t" · 5c31b8b9
  Arthur Eubanks authored Oct 31, 2020
```
This reverts commit 10f2a0d6.

More uint64_t overflows.
```
  5c31b8b9
Oct 30, 2020

Use uint64_t for branch weights instead of uint32_t · 10f2a0d6

Arthur Eubanks authored Sep 30, 2020

CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88609

10f2a0d6

Oct 29, 2020

[Attributor][FIX] Properly promote arguments pointers to arrays · d39f574d

Johannes Doerfert authored Oct 29, 2020

When we promote pointer arguments we did compute a wrong offset and use
a wrong type for the array case.

Bug reported and reduced by Whitney Tsang <whitneyt@ca.ibm.com>.

d39f574d

Oct 28, 2020
- Revert "[OpenMP] Add Passing in Original Declaration Names To Mapper API" · 207cf71f
  Benjamin Kramer authored Oct 28, 2020
```
This reverts commit d981c7b7 and
a87d7b3d. Test fails under msan.
```
  207cf71f
- [Attributor] Finalize the CGUpdater after each SCC · d13daa40
  Johannes Doerfert authored Oct 10, 2020
```
This matches the new PM model.
```
  d13daa40
- [Attributor][NFC] Introduce a debug counter for `AA::manifest` · 50d34958
  Johannes Doerfert authored Oct 10, 2020
```
This will simplify debugging and tracking down problems.
```
  50d34958
- [Attributor][NFC] Print the right value in debug output · 1d57b7f5
  Johannes Doerfert authored Oct 10, 2020
  
  1d57b7f5
- [Attributor][FIX] Delete all unreachable static functions · 1c2531c9
  Johannes Doerfert authored Sep 10, 2020
```
Before we used to only mark unreachable static functions as dead if all
uses were known dead. Now we optimistically assume uses to be dead until
proven otherwise.
```
  1c2531c9
- [Attributor][FIX] Do not attach range metadata to the wrong Instruction · bfe05b1a
  Johannes Doerfert authored Oct 10, 2020
```
If we are looking at a call site argument it might be a load or call
which is in a different context than the call site argument. We cannot
simply use the call site argument range for the call or load.

Bug reported and reduced by Whitney Tsang <whitneyt@ca.ibm.com>.
```
  bfe05b1a
- [Attributor][NFC] Clang-format · 724fcce1
  Johannes Doerfert authored Oct 10, 2020
  
  724fcce1
- [Attributor][NFC] Hoist call out of a lambda · d504f7b9
  Johannes Doerfert authored Oct 10, 2020
```
The call is not free, unsure if  this is needed but it does not make it
worse either.
```
  d504f7b9