Commits · 9ed8e0caab9b6f638e82979f6fdf60d67ce65b92 · Lorenzo Albano / LLVM bpEVL

Dec 17, 2020

[NFC] Reduce include files dependency and AA header cleanup (part 2). · 9ed8e0ca

dfukalov authored Dec 09, 2020

Continuing work started in https://reviews.llvm.org/D92489:

Removed a bunch of includes from "AliasAnalysis.h" and "LoopPassManager.h".

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D92852

9ed8e0ca

Make LLVM build in C++20 mode · 92310454

Barry Revzin authored Dec 17, 2020

Part of the <=> changes in C++20 make certain patterns of writing equality
operators ambiguous with themselves (sorry!).
This patch goes through and adjusts all the comparison operators such that
they should work in both C++17 and C++20 modes. It also makes two other small
C++20-specific changes (adding a constructor to a type that ceases to be an
aggregate, and adding casts from u8 literals which no longer have type
const char*).

There were four categories of errors that this review fixes.
Here are canonical examples of them, ordered from most to least common:

// 1) Missing const
namespace missing_const {
    struct A {
    #ifndef FIXED
        bool operator==(A const&);
    #else
        bool operator==(A const&) const;
    #endif
    };

    bool a = A{} == A{}; // error
}

// 2) Type mismatch on CRTP
namespace crtp_mismatch {
    template <typename Derived>
    struct Base {
    #ifndef FIXED
        bool operator==(Derived const&) const;
    #else
        // in one case changed to taking Base const&
        friend bool operator==(Derived const&, Derived const&);
    #endif
    };

    struct D : Base<D> { };

    bool b = D{} == D{}; // error
}

// 3) iterator/const_iterator with only mixed comparison
namespace iter_const_iter {
    template <bool Const>
    struct iterator {
        using const_iterator = iterator<true>;

        iterator();

        template <bool B, std::enable_if_t<(Const && !B), int> = 0>
        iterator(iterator<B> const&);

    #ifndef FIXED
        bool operator==(const_iterator const&) const;
    #else
        friend bool operator==(iterator const&, iterator const&);
    #endif
    };

    bool c = iterator<false>{} == iterator<false>{} // error
          || iterator<false>{} == iterator<true>{}
          || iterator<true>{} == iterator<false>{}
          || iterator<true>{} == iterator<true>{};
}

// 4) Same-type comparison but only have mixed-type operator
namespace ambiguous_choice {
    enum Color { Red };

    struct C {
        C();
        C(Color);
        operator Color() const;
        bool operator==(Color) const;
        friend bool operator==(C, C);
    };

    bool c = C{} == C{}; // error
    bool d = C{} == Red;
}
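
Separately from the four comparison categories, the two smaller C++20-specific changes mentioned in the description can be sketched roughly as follows. This is only an illustration and not code from the patch: `asCharPtr` and `Point` are hypothetical names. In C++20 a `u8` literal has element type `char8_t` rather than `char`, and a class with any user-declared constructor stops being an aggregate, so the sketch uses an explicit cast and an explicit constructor that should compile in both C++17 and C++20 modes.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: view a string literal as const char*.
// In C++17, u8"..." is an array of const char, so the cast changes nothing.
// In C++20, it is an array of const char8_t, so the cast is required.
template <typename CharT>
const char *asCharPtr(const CharT *Str) {
  assert(Str && "expected a non-null string");
  return reinterpret_cast<const char *>(Str);
}

// Hypothetical type with a user-declared (defaulted) default constructor.
// C++17 would still treat it as an aggregate without the second constructor;
// C++20 would not, so Point{1, 2} needs a matching constructor to stay valid.
struct Point {
  Point() = default;
  Point(int X, int Y) : X(X), Y(Y) {} // the added constructor
  int X = 0;
  int Y = 0;
};
```

With these in place, `asCharPtr(u8"abc")` and `Point{1, 2}` are accepted under either language mode.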

Differential revision: https://reviews.llvm.org/D78938

92310454

[InstCombine] Preserve !annotation for newly created instructions. · eba09a2d

Florian Hahn authored Dec 17, 2020

When replacing an instruction with !annotation with a newly created
replacement, add the !annotation metadata to the replacement.

This mostly covers cases where the new instructions are created using
the ::Create helpers. Instructions created by IRBuilder will be handled
by D91444.

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D93399

eba09a2d

[GCN] Remove unused function handleNewInstruction (NFC) · 4ad5b634

Kazu Hirata authored Dec 16, 2020

The function was added without a user on Dec 22, 2016 in commit
7e274e02.  It seems to be unused since
then.

4ad5b634

[CSSPGO] Consume pseudo-probe-based AutoFDO profile · ac068e01

Hongtao Yu authored Dec 16, 2020

This change enables pseudo-probe-based sample counts to be consumed by the sample profile loader under the regular `-fprofile-sample-use` switch with minimal adjustments to the existing sample file formats. After the counts are imported, a probe helper, aka, a `PseudoProbeManager` object, is automatically launched to verify the CFG checksum of every function in the current compilation against the corresponding checksum from the profile. Mismatched checksums will cause a function profile to be skipped. A `SampleProfileProber` pass is scheduled before any of the `SampleProfileLoader` instances so that the CFG checksums as well as probe mappings are available during the profile loading time. The `PseudoProbeManager` object is set up right after the profile reading is done. In the future a CFG-based fuzzy matching could be done in `PseudoProbeManager`.

Samples will be applied only to pseudo probe instructions as well as probed callsites once the checksum verification goes through. Those instructions are processed in the same way that regular instructions would be processed in the line-number-based scenario. In other words, a function is processed in a regular way as if it was reduced to just containing pseudo probes (block probes and callsites).

**Adjustment to profile format**

A CFG checksum field is being added to the existing AutoFDO profile formats. So far only the text format and the extended binary format are supported. For the text format, a new line like
```
!CFGChecksum: 12345
```
is added to the end of the body sample lines. For the extended binary profile format, we introduce a metadata section to store the checksum map from function names to their CFG checksums.
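
As a rough illustration of the text-format line above, a reader could recognize and validate it along these lines. This is a minimal sketch, not the LLVM sample profile reader: `parseCFGChecksumLine` and `checksumMatches` are hypothetical helpers, and the sketch assumes the checksum is a decimal unsigned 64-bit value.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: returns the checksum if Line has the form
// "!CFGChecksum: <decimal number>", and std::nullopt otherwise.
std::optional<std::uint64_t> parseCFGChecksumLine(std::string_view Line) {
  constexpr std::string_view Prefix = "!CFGChecksum:";
  // Skip leading indentation.
  while (!Line.empty() && (Line.front() == ' ' || Line.front() == '\t'))
    Line.remove_prefix(1);
  if (Line.substr(0, Prefix.size()) != Prefix)
    return std::nullopt;
  Line.remove_prefix(Prefix.size());
  // Skip the spaces between the colon and the number.
  while (!Line.empty() && Line.front() == ' ')
    Line.remove_prefix(1);
  std::uint64_t Value = 0;
  const char *End = Line.data() + Line.size();
  auto [Ptr, Ec] = std::from_chars(Line.data(), End, Value);
  // Reject empty numbers, overflow, and trailing garbage.
  if (Ec != std::errc() || Ptr != End)
    return std::nullopt;
  return Value;
}

// Hypothetical check mirroring the verification step described above:
// a profile is only usable if its checksum equals the one computed for the CFG.
bool checksumMatches(std::string_view Line, std::uint64_t ExpectedChecksum) {
  std::optional<std::uint64_t> Parsed = parseCFGChecksumLine(Line);
  return Parsed && *Parsed == ExpectedChecksum;
}
```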

Differential Revision: https://reviews.llvm.org/D92347

ac068e01

Disable Jump Threading for the targets with divergent control flow · 35ec3ff7

alex-t authored Dec 15, 2020

Details: Jump Threading does not make sense for the targets with divergent CF
since they do not use branch prediction for speculative execution.
Also, in the high-level IR there is not enough information to conclude whether a branch is divergent or uniform.
This may cause errors in further CF lowering.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D93302

35ec3ff7

Dec 16, 2020

[SimplifyCFG] Teach mergeEmptyReturnBlocks() to preserve DomTree · d22a47e9

Roman Lebedev authored Dec 16, 2020

The first real transformation that didn't already know how to do that,
but it's pretty tame - either change the successor of all the predecessors
of a block and carefully delay deletion of the block until after
the DomTree updates are applied, or add a successor to the block.

There wasn't great test coverage for this, so I added extra, to be sure.

d22a47e9

[SimplifyCFG] TryToSimplifyUncondBranchFromEmptyBlock() already knows how to preserve DomTree · 5cce4aff

Roman Lebedev authored Dec 17, 2020

... so just ensure that we pass DomTreeUpdater into it.

Fixes DomTree preservation for a large number of tests,
all of which are marked as such so that they do not regress.

5cce4aff

[SimplifyCFG] MergeBlockIntoPredecessor() already knows how to preserve DomTree · 49dac4ac

Roman Lebedev authored Dec 16, 2020

... so just ensure that we pass DomTreeUpdater into it.

Fixes DomTree preservation for a large number of tests,
all of which are marked as such so that they do not regress.

49dac4ac

[SimplifyCFG] removeUnreachableBlocks() already knows how to preserve DomTree · 4fc169f6

Roman Lebedev authored Dec 16, 2020

... so just ensure that we pass DomTreeUpdater into it.

Apparently, there were no dedicated tests just for that functionality,
so i'm adding one here.

4fc169f6

[PGO] Use the sum of profile counts to fix the function entry count · 0abd7445

Rong Xu authored Dec 16, 2020

Raw profile count values for each BB are not kept after profile
annotation. We record function entry count and branch weights
and use them to compute the count when needed.  This mechanism
works well in a perfect world, but often breaks in real programs
because of number precision, inconsistent profiles, or bugs in
BFI. This patch uses the sum of profile count values to fix the
function entry count to make the BFI count close to real profile
counts.
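
One way to picture the idea is to rescale the entry count by the ratio between the summed real profile counts and the summed BFI-derived counts. The sketch below assumes that reading of the description; `fixEntryCount` is a hypothetical helper working on plain vectors, not the function added by the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper: ProfileCounts[i] is the raw profile count of block i,
// BFICounts[i] is the count BFI reconstructs for the same block from the
// entry count and branch weights.
std::uint64_t fixEntryCount(std::uint64_t EntryCount,
                            const std::vector<std::uint64_t> &ProfileCounts,
                            const std::vector<std::uint64_t> &BFICounts) {
  assert(ProfileCounts.size() == BFICounts.size() && "one pair per block");
  const double SumProfile =
      std::accumulate(ProfileCounts.begin(), ProfileCounts.end(), 0.0);
  const double SumBFI =
      std::accumulate(BFICounts.begin(), BFICounts.end(), 0.0);
  // Nothing to compare against: keep the recorded entry count.
  if (SumBFI == 0.0)
    return EntryCount;
  // Scale the entry count so the BFI sums line up with the real sums,
  // rounding to the nearest integer.
  const double Scale = SumProfile / SumBFI;
  return static_cast<std::uint64_t>(EntryCount * Scale + 0.5);
}
```

For instance, if BFI reconstructs block counts that are half the real ones, the entry count is doubled.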

Differential Revision: https://reviews.llvm.org/D61540

0abd7445

[DSE] Pass MemoryLocation by const ref (NFC) · e7280248
Nikita Popov authored Dec 16, 2020

e7280248

[VectorCombine] optimize alignment for load transform · 38ebc1a1

Sanjay Patel authored Dec 16, 2020

Here's another minimal step suggested by D93229 / D93397.
(I'm trying to be extra careful in these changes because
load transforms are easy to get wrong.)

We can optimistically choose the greater alignment of a
load and its pointer operand. As the test diffs show, this
can improve what would have been unaligned vector loads
into aligned loads.

When we enhance with gep offsets, we will need to adjust
the alignment calculation to include that offset.
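
The alignment choice described above, and the adjustment a byte offset would require, can be sketched with plain integers. This is an illustration under stated assumptions rather than the VectorCombine code: `chooseLoadAlign` and `alignAtOffset` are hypothetical names, and alignments are assumed to be powers of two expressed in bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: optimistically take the greater of the alignment on
// the load and the alignment known for its pointer operand.
constexpr std::uint64_t chooseLoadAlign(std::uint64_t LoadAlign,
                                        std::uint64_t PtrAlign) {
  return std::max(LoadAlign, PtrAlign);
}

// Hypothetical helper: the alignment that can still be guaranteed at
// Base + Offset when Base is aligned to BaseAlign bytes.
constexpr std::uint64_t alignAtOffset(std::uint64_t BaseAlign,
                                      std::uint64_t Offset) {
  if (Offset == 0)
    return BaseAlign;
  // The lowest set bit of Offset is the largest power of two dividing it.
  const std::uint64_t LowBit = Offset & (~Offset + 1);
  return std::min(BaseAlign, LowBit);
}

static_assert(chooseLoadAlign(4, 16) == 16, "greater alignment wins");
static_assert(alignAtOffset(16, 4) == 4, "offset 4 limits alignment to 4");
```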

Differential Revision: https://reviews.llvm.org/D93406

38ebc1a1

[VectorCombine] loosen alignment constraint for load transform · aaaf0ec7

Sanjay Patel authored Dec 16, 2020

As discussed in D93229, we only need a minimal alignment constraint
when querying whether a hypothetical vector load is safe. We still
pass/use the potentially stronger alignment attribute when checking
costs and creating the new load.

There's already a test that changes with the minimum code change,
so splitting this off as a preliminary commit independent of any
gep/offset enhancements.

Differential Revision: https://reviews.llvm.org/D93397

aaaf0ec7

[LoopNest] Handle loop-nest passes in LoopPassManager · fa3693ad

Whitney Tsang authored Dec 16, 2020

Per http://llvm.org/OpenProjects.html#llvm_loopnest, the goal of this
patch (and other following patches) is to create facilities that allow
implementing loop nest passes that run on top-level loop nests for the
New Pass Manager.

This patch extends the functionality of LoopPassManager to handle
loop-nest passes by specializing the definition of LoopPassManager that
accepts both kinds of passes in addPass.

Only loop passes are executed if L is not a top-level one, and both
kinds of passes are executed if L is top-level. Currently, loop nest
passes should have the following run method:

PreservedAnalyses run(LoopNest &, LoopAnalysisManager &,
LoopStandardAnalysisResults &, LPMUpdater &);

Reviewed By: Whitney, ychen
Differential Revision: https://reviews.llvm.org/D87045

fa3693ad

[SLPVectorizer]Migrate getEntryCost to return InstructionCost · be9184bc

Caroline Concatto authored Dec 10, 2020

This patch also changes:
  the return type of getGatherCost and
  the signature of the debug function dumpTreeCosts
to use InstructionCost.

This patch is part of a series of patches to use InstructionCost instead of
unsigned/int for the cost model functions.

See this thread for context:
http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

See this patch for the introduction of the type:
https://reviews.llvm.org/D91174

Depends on D93049

Differential Revision: https://reviews.llvm.org/D93127

be9184bc

[CostModel]Migrate getTreeCost() to use InstructionCost · 07217e0a

Caroline Concatto authored Dec 10, 2020

This patch changes the type of cost variables (for instance: Cost, ExtractCost,
SpillCost) to use InstructionCost.
This patch also changes the type of cost variables to InstructionCost in other
functions that use the result of getTreeCost().
This patch is part of a series of patches to use InstructionCost instead of
unsigned/int for the cost model functions.

See this thread for context:
http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Depends on D91174

Differential Revision: https://reviews.llvm.org/D93049

07217e0a

Revert "Ensure SplitEdge to return the new block between the two given blocks" · c1075720
Bangtian Liu authored Dec 16, 2020
```
This reverts commit cf638d79.
```
c1075720

[LV] Weaken a unnecessarily strong assert [NFC] · 1f6e1556

Philip Reames authored Dec 15, 2020

Account for the fact that (in the future) the latch might be a switch not a branch.  The existing code is correct, minus the assert.

1f6e1556

[LV] Extend dead instruction detection to multiple exiting blocks · af7ef895

Philip Reames authored Dec 15, 2020

Given we haven't yet enabled multiple exiting blocks, this is currently non functional, but it's an obvious extension which cleans up a later patch.

I don't think this is worth review (as it's pretty obvious); if anyone disagrees, feel free to revert or comment and I will.

af7ef895

Ensure SplitEdge to return the new block between the two given blocks · cf638d79

Bangtian Liu authored Dec 15, 2020

This PR implements the function splitBasicBlockBefore to address an
issue that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore.
The issue occurs in SplitEdge when the Succ has a single predecessor
and the edge between the BB and Succ is not critical. This produces
the result ‘BB->Succ->New’. The new function splitBasicBlockBefore
was added to splitBlockBefore to handle the issue and now produces
the correct result ‘BB->New->Succ’.

Below is an example of splitting the block bb1 at its first instruction.

/// Original IR
bb0:
	br bb1
bb1:
        %0 = mul i32 1, 2
	br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlock
bb0:
	br bb1
bb1:
	br bb1.split
bb1.split:
        %0 = mul i32 1, 2
	br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore
bb0:
	br bb1.split
bb1.split:
	br bb1
bb1:
        %0 = mul i32 1, 2
	br bb2
bb2:

Differential Revision: https://reviews.llvm.org/D92200

cf638d79

Dec 15, 2020

[OpenMP] Use assumptions during ICV tracking · dcaec812

Johannes Doerfert authored Nov 24, 2020

The OpenMP 5.1 assumptions `no_openmp` and `no_openmp_routines` allow us
to ignore calls that would otherwise prevent ICV tracking.

Once we track more ICVs we might need to distinguish the ones that could
be impacted even with `no_openmp_routines`.

Reviewed By: sstefan1

Differential Revision: https://reviews.llvm.org/D92050

dcaec812

[OpenMPOpt][NFC] Clang format · d08d490a
Johannes Doerfert authored Dec 10, 2020

d08d490a

[NFCI][SimplifyCFG] Add basic scaffolding for gradually making the pass DomTree-aware · e1133179

Roman Lebedev authored Dec 15, 2020

Two observations:
1. Unavailability of DomTree makes it impossible to make
   `FoldBranchToCommonDest()` transform in certain cases,
   where the successor is dominated by predecessor,
   because we then don't have PHI's, and can't recreate them,
   well, without handrolling 'is dominated by' check,
   which doesn't really look like a great solution to me.
2. Avoiding invalidating DomTree in SimplifyCFG will
   decrease the number of `Dominator Tree Construction` by 5
   (from 28 now, i.e. -18%) in `-O3` old-pm pipeline
   (as per `llvm/test/Other/opt-O3-pipeline.ll`)
   This might or might not be beneficial for compile time.

So the plan is to make SimplifyCFG preserve DomTree, and then
eventually make DomTree fully required and preserved by the pass.

Now, SimplifyCFG is ~7KLOC. I don't think it will be nice
to do all this uplifting in a single mega-commit,
nor would it be possible to review it in any meaningful way.

But, i believe, it should be possible to do this in smaller steps,
introducing the new behavior, in an optional way, off-by-default,
opt-in option, and gradually fixing transforms one-by-one
and adding the flag to appropriate test coverage.

Then, eventually, the default should be flipped,
and eventually^2 the flag removed.

And that is what is happening here - when the new off-by-default option
is specified, DomTree is required and is claimed to be preserved,
and SimplifyCFG-internal assertions verify that the DomTree is still OK.

e1133179

[LV] Restructure handling of -prefer-predicate-over-epilogue option [NFC] · a81db8b3

Philip Reames authored Dec 15, 2020

This should be purely non-functional.  When touching this code for another reason, I found the handling of the PredicateOrDontVectorize piece here very confusing.  Let's make it an explicit state (instead of an implicit combination of two variables), and use early return for options/hint processing.

a81db8b3

SeparateConstOffsetFromGEP::lowerToSingleIndexGEPs - don't use dyn_cast_or_null. NFCI. · a3bd67f2

Simon Pilgrim authored Dec 15, 2020

ResultPtr is guaranteed to be non-null - and using dyn_cast_or_null causes unnecessary static analyzer warnings.

We can't say the same for FirstResult AFAICT, so keep dyn_cast_or_null for that.

a3bd67f2

[AnnotationRemarks] Also generate annotation remarks when using -O0. · 7ea3932a

Florian Hahn authored Dec 15, 2020

The AnnotationRemarks pass is already run at the end of the module
pipeline. This patch also adds it before bailing out for -O0, so remarks
are also generated with -O0.

7ea3932a

[VPlan] Use VPDef for VPWidenSelectRecipe. · 7186a396

Florian Hahn authored Dec 15, 2020

This patch updates VPWidenSelectRecipe to manage the value
it defines using VPDef.

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D90560

7186a396

[InstCombine] Remove scalable vector restriction in foldVectorBinop · 52a3267f
Jun Ma authored Dec 15, 2020
```
Differential Revision: https://reviews.llvm.org/D93289
```
52a3267f
[InstCombine][NFC] Change cast of FixedVectorType to dyn_cast. · ffe84d90
Jun Ma authored Dec 14, 2020

ffe84d90
[InstCombine] Remove scalable vector restriction in InstCombineCompares · e12f5845
Jun Ma authored Dec 15, 2020
```
Differential Revision: https://reviews.llvm.org/D93269
```
e12f5845
[InstCombine] Remove scalable vector restriction when fold SelectInst · 2ac58e21
Jun Ma authored Dec 11, 2020
```
Differential Revision: https://reviews.llvm.org/D93083
```
2ac58e21

[VPlan] Use VPDef for VPWidenGEPRecipe. · 318f5798

Florian Hahn authored Dec 15, 2020

This patch updates VPWidenGEPRecipe to manage the value it defines
using VPDef. The VPValue is used during VPlan construction and
code generation instead of the plain IR reference where possible.

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D90561

318f5798

[VPlan] Use VPdef for VPWidenCall. · ad1161f9

Florian Hahn authored Dec 15, 2020

This patch updates VPWidenCallRecipe to manage the value it defines
using VPDef.

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D90559

ad1161f9

Reland "[MachineDebugify] Insert synthetic DBG_VALUE instructions" · a852ee19
Nico Weber authored Dec 14, 2020
```
This reverts commit 841f9c93.
The change landed many months ago; something else broke those tests.
```
a852ee19
Revert "[MachineDebugify] Insert synthetic DBG_VALUE instructions" · 841f9c93
Nico Weber authored Dec 14, 2020
```
This reverts commit 2a5675f1.
The tests it adds fail: https://reviews.llvm.org/D78135#2453736
```
841f9c93

Revert "ADT: Migrate users of AlignedCharArrayUnion to std::aligned_union_t, NFC" · d2ed9d6b

Reid Kleckner authored Dec 14, 2020

We determined that the MSVC implementation of std::aligned* isn't suited
to our needs. It doesn't support 16 byte alignment or higher, and it
doesn't really guarantee 8 byte alignment. See
https://github.com/microsoft/STL/issues/1533
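
For context, storage with the size and alignment of the largest of several types can be expressed without `std::aligned_union_t` by putting `alignas` directly on a char array. The sketch below only illustrates that general technique; `AlignedBufferFor` and `Vec4` are hypothetical names, and this is not the restored AlignedCharArrayUnion implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a type that needs 16-byte alignment.
struct alignas(16) Vec4 {
  float V[4];
};

// Hypothetical buffer type: large enough and aligned strictly enough to hold
// any one of T, Ts... (for example via placement new).
template <typename T, typename... Ts>
struct AlignedBufferFor {
  // Multiple alignas specifiers combine to the strictest requirement.
  alignas(T) alignas(Ts...) char Buffer[std::max({sizeof(T), sizeof(Ts)...})];
};

static_assert(alignof(AlignedBufferFor<char, Vec4>) == 16,
              "buffer should honor the 16-byte requirement of Vec4");
static_assert(sizeof(AlignedBufferFor<char, Vec4>) >= sizeof(Vec4),
              "buffer should be large enough for Vec4");
```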

Also reverts "ADT: Change AlignedCharArrayUnion to an alias of std::aligned_union_t, NFC"

Also reverts "ADT: Remove AlignedCharArrayUnion, NFC" to bring back
AlignedCharArrayUnion.

This reverts commit 4d8bf870.

This reverts commit d10f9863.

This reverts commit 4b5dc150.

d2ed9d6b

[PGO] Verify BFI counts after loading profile data · 54e03d03

Rong Xu authored Dec 14, 2020

This patch adds the functionality to compare BFI counts with real
profile counts right after reading the profile. It will print remarks under
-Rpass-analysis=pgo, or the internal option -pass-remarks-analysis=pgo.

Differential Revision: https://reviews.llvm.org/D91813

54e03d03

Dec 14, 2020

[clang][IR] Add support for leaf attribute · 7c0e3a77

Gulfem Savrun Yeniceri authored Dec 14, 2020

This patch adds support for leaf attribute as an optimization hint
in Clang/LLVM.

Differential Revision: https://reviews.llvm.org/D90275

7c0e3a77

[VectorCombine] make load transform poison-safe · d399f870

Sanjay Patel authored Dec 14, 2020

As noted in D93229, the transform from scalar load to vector load
potentially leaks poison from the extra vector elements that are
being loaded.

We could use freeze here (and x86 codegen at least appears to be
the same either way), but we already have a shuffle in this logic
to optionally change the vector size, so let's allow that
instruction to serve both purposes.

Differential Revision: https://reviews.llvm.org/D93238

d399f870