Commits · 1112b7bad851c78af2159d39c7bff670a9c77da1 · Lorenzo Albano / LLVM bpEVL

Oct 12, 2021

Revert "[AMDGPU] Enable load clustering in the post-RA scheduler" · 66ce1015
Jay Foad authored Oct 12, 2021
```
This reverts commit 66e13c7f.

It was committed by accident.
```
66ce1015

[TwoAddressInstruction] Remove ad hoc machine verification · f7ee21aa

Jay Foad authored Oct 12, 2021

With the -early-live-intervals command line flag,
TwoAddressInstructionPass::runOnMachineFunction would call
MachineFunction::verify before returning to check the live intervals.
But there was not much benefit to doing this since -verify-machineinstrs
and LLVM_ENABLE_EXPENSIVE_CHECKS provide a more general way of
scheduling machine verification after every pass.

Also it caused problems on targets like Lanai which are marked as "not
machine verifier clean", since verification would fail for known
target-specific problems which are nothing to do with LiveIntervals.

Differential Revision: https://reviews.llvm.org/D111618

f7ee21aa

[AMDGPU] Enable load clustering in the post-RA scheduler · 66e13c7f

Jay Foad authored Oct 12, 2021

This has a couple of benefits:
1. It can sometimes fix clusters that got broken apart when the register
   allocator inserted a copy.
2. Post-RA scheduling does not have to worry about increasing register
   pressure, which in some cases gives it more freedom to reorder
   instructions.

Testing on a collection of 10,000 graphics shaders compiled for gfx1010
showed:
- The average length of each run of one or more load instructions
  increased by about 1%.
- The number of runs of two or more load instructions increased by
  about 4%.

66e13c7f

[DebugInfo][NFC] Move LiveDebugValues class to header · 838b4a53

Jeremy Morse authored Oct 12, 2021

This patch shifts the InstrRefBasedLDV class declaration to a header.
Partially because it's already massive, but mostly so that I can start
writing some unit tests for it. This patch also adds the boilerplate for
said unit tests.

Differential Revision: https://reviews.llvm.org/D110165

838b4a53

[AArch64][SVE] Add fixed type lowering for EXTRACT_SUBVECTOR · 2eb42e3d
Bradley Smith authored Oct 05, 2021
```
Depends on D111135

Differential Revision: https://reviews.llvm.org/D111165
```
2eb42e3d

[X86] Fix implicit MathExtras.h header dependency · 61d124f7
Simon Pilgrim authored Oct 12, 2021

61d124f7

[LoopVectorize] Classify pointer induction updates as scalar only if they have one use · 1439ef1a

Kerry McLaughlin authored Oct 11, 2021

collectLoopScalars collects pointer induction updates in ScalarPtrs, assuming
that the instruction will be scalar after vectorization. This may crash later
in VPReplicateRecipe::execute() if there is another user of the instruction
other than the Phi node which needs to be widened.

This changes collectLoopScalars so that if there are any other users of
Update other than a Phi node, it is not added to ScalarPtrs.

Reviewed By: david-arm, fhahn

Differential Revision: https://reviews.llvm.org/D111294

1439ef1a

[LoopPeel] Use any_of & contains instead of for & find. · 40d85f16
Florian Hahn authored Oct 12, 2021
```
Using contains was suggested in D108114, but I forgot to include it when
landing the patch.
```
40d85f16
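
As a generic illustration of the refactoring named in the title (not the actual LoopPeel code), the sketch below contrasts a hand-written for/find loop with an any_of plus membership-test form. It uses the C++ standard library rather than LLVM's own helpers, and all function names and data here are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Before: explicit loop plus find().
bool anyInSetLoop(const std::vector<int> &Items, const std::set<int> &S) {
  for (int I : Items)
    if (S.find(I) != S.end())
      return true;
  return false;
}

// After: any_of plus a membership test. count() is used here instead of
// C++20's contains() so the sketch also builds with older standards.
bool anyInSetAlgo(const std::vector<int> &Items, const std::set<int> &S) {
  return std::any_of(Items.begin(), Items.end(),
                     [&S](int I) { return S.count(I) != 0; });
}

// Usage: both forms agree on the same inputs.
void checkEquivalent() {
  const std::set<int> S{2, 4};
  assert(anyInSetLoop({1, 2, 3}, S) == anyInSetAlgo({1, 2, 3}, S));
  assert(anyInSetLoop({1, 3, 5}, S) == anyInSetAlgo({1, 3, 5}, S));
}
```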

[FuncSpec] Allow ConstExprs that are function pointers · fc0fa851

Sjoerd Meijer authored Oct 12, 2021

This is a follow up of D110529 that disallowed constexprs. That change
introduced a regression as this also disallowed constexprs that are function
pointers, which is actually one of the motivating use cases that we do want to
support.

Differential Revision: https://reviews.llvm.org/D111567

fc0fa851

[LoopPeel] Peel if it turns invariant loads dereferenceable. · cd0ba9dc

Florian Hahn authored Oct 12, 2021

This patch adds a new cost heuristic that allows peeling a single
iteration off read-only loops, if the loop contains a load that

    1. is feeding an exit condition,
    2. dominates the latch,
    3. is not already known to be dereferenceable,
    4. and has a loop invariant address.

If all non-latch exits are terminated with unreachable, such loads
in the loop are guaranteed to be dereferenceable after peeling,
enabling hoisting/CSE'ing them.

This enables vectorization of loops with certain runtime-checks, like
multiple calls to `std::vector::at` if the vector is passed as pointer.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D108114

cd0ba9dc
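
For orientation, a hypothetical source-level loop of the kind the message describes might look like the sketch below: the vector is passed by pointer, each `at` call performs a bounds check whose failing path leaves the loop by throwing, and the loop itself writes no memory. The function name is made up, and whether this exact function is peeled or vectorized is not claimed here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example. Because V is a pointer, every at() call reads the
// vector's bounds through V, which is a load from a loop-invariant address
// feeding the exit condition of the bounds check.
int sumAdjacentPairs(const std::vector<int> *V, std::size_t N) {
  assert(V && "expects a valid vector");
  int Sum = 0;
  for (std::size_t I = 0; I + 1 < N; ++I)
    Sum += V->at(I) + V->at(I + 1); // two bounds-checked accesses per iteration
  return Sum;
}
```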

[RISCV] Rename assembler mnemonic of unordered floating-point reductions for v1.0-rc change · 0608bbd4

jacquesguan authored Oct 12, 2021

Rename vfredsum and vfwredsum to vfredusum and vfwredusum. Add aliases for vfredsum and vfwredsum.

Reviewed By: luismarques, HsiangKai, khchen, frasercrmck, kito-cheng, craig.topper

Differential Revision: https://reviews.llvm.org/D105690

0608bbd4

[ORC] More attempts to work around compiler failures. · 5829ba7a

Lang Hames authored Oct 11, 2021

Commit 731f991c seems to have helped, but did not catch all instances (see
https://lab.llvm.org/buildbot/#/builders/193/builds/104). Switch more inner
structs to C++98 initializers to work around the issue. Add FIXMEs to revisit
in the future.

5829ba7a

[ORC] Add more explicit narrowing casts. · 3a52a639

Lang Hames authored Oct 11, 2021

This should fix the buildbot failure at
https://lab.llvm.org/buildbot/#/builders/187/builds/2140

3a52a639

[ORC] Fix a typo in a variable name. · 9ca50641
Lang Hames authored Oct 11, 2021

9ca50641

Re-apply "Major JITLinkMemoryManager refactor" with fixes. · 962a2479

Lang Hames authored Oct 11, 2021

Adds explicit narrowing casts to JITLinkMemoryManager.cpp.

Honors -slab-address option in llvm-jitlink.cpp, which was accidentally
dropped in the refactor.

This effectively reverts commit 6641d29b.

962a2479

BPF: rename BTF_KIND_TAG to BTF_KIND_DECL_TAG · 1321e472

Yonghong Song authored Oct 11, 2021

Per discussion in https://reviews.llvm.org/D111199,
the existing btf_tag attribute will be renamed to
btf_decl_tag. This patch updates the BTF backend to
use the btf_decl_tag attribute name and also
renames BTF_KIND_TAG to BTF_KIND_DECL_TAG.

Differential Revision: https://reviews.llvm.org/D111592

1321e472

[AMDGPU] Remove dead frame indices after sgpr spill. · 52cb3af0

hsmahesha authored Oct 12, 2021

All frame indices that are dead after SGPR spilling should be removed from
the function frame. Otherwise, later passes such as "stack slot coloring" may
re-map the freed frame index IDs, which in turn could corrupt the
"frame index to VGPR lane" bookkeeping.

Reviewed By: cdevadas

Differential Revision: https://reviews.llvm.org/D111150

52cb3af0

[NFC][Attr] rename attribute btf_tag to btf_decl_tag · 325d0007

Yonghong Song authored Oct 11, 2021

Per discussion in https://reviews.llvm.org/D111199,
the existing btf_tag attribute will be renamed to
btf_decl_tag. This patch mostly updates the Bitcode and
DebugInfo test cases with the new attribute name.

Differential Revision: https://reviews.llvm.org/D111591

325d0007

[X86][ISel] Lowering llvm.thread.pointer · d57a87ea
Freddy Ye authored Oct 12, 2021
```
Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D110681
```
d57a87ea
Revert "[JITLink][ORC] Major JITLinkMemoryManager refactor." · 6641d29b
Lang Hames authored Oct 11, 2021
```
This reverts commit e50aea58 while I
investigate bot failures.
```
6641d29b

[JITLink][ORC] Major JITLinkMemoryManager refactor. · e50aea58

Lang Hames authored Oct 10, 2021

This commit substantially refactors the JITLinkMemoryManager API to: (1) add
asynchronous versions of key operations, (2) give memory manager implementations
full control over link graph address layout, (3) enable more efficient tracking
of allocated memory, and (4) support "allocation actions" and finalize-lifetime
memory.

Together these changes provide a more usable API, and enable more powerful and
efficient memory manager implementations.

To support these changes the JITLinkMemoryManager::Allocation inner class has
been split into two new classes: InFlightAllocation, and FinalizedAllocation.
The allocate method returns an InFlightAllocation that tracks memory (both
working and executor memory) prior to finalization. The finalize method returns
a FinalizedAllocation object, and the InFlightAllocation is discarded. Breaking
Allocation into InFlightAllocation and FinalizedAllocation allows
InFlightAllocation subclasses to be written more naturally, and FinalizedAlloc
to be implemented and used efficiently (see (3) below).

In addition to the memory manager changes this commit also introduces a new
MemProt type to represent memory protections (MemProt replaces use of
sys::Memory::ProtectionFlags in JITLink), and a new MemDeallocPolicy type that
can be used to indicate when a section should be deallocated (see (4) below).

Plugin/pass writers who were using sys::Memory::ProtectionFlags will have to
switch to MemProt -- this should be straightforward. Clients with out-of-tree
memory managers will need to update their implementations. Clients using
in-tree memory managers should mostly be able to ignore it.

Major features:

(1) More asynchrony:

The allocate and deallocate methods are now asynchronous by default, with
synchronous convenience wrappers supplied. The asynchronous versions allow
clients (including JITLink) to request and deallocate memory without blocking.

(2) Improved control over graph address layout:

Instead of a SegmentRequestMap, JITLinkMemoryManager::allocate now takes a
reference to the LinkGraph to be allocated. The memory manager is responsible
for calculating the memory requirements for the graph, and laying out the graph
(setting working and executor memory addresses) within the allocated memory.
This gives memory managers full control over JIT'd memory layout. For clients
that don't need or want this degree of control the new "BasicLayout" utility can
be used to get a segment-based view of the graph, similar to the one provided by
SegmentRequestMap. Once segment addresses are assigned the BasicLayout::apply
method can be used to automatically lay out the graph.

(3) Efficient tracking of allocated memory.

The FinalizedAlloc type is a wrapper for an ExecutorAddr and requires only
64 bits to store in the controller. The meaning of the address held by the
FinalizedAlloc is left up to the memory manager implementation, but the
FinalizedAlloc type enforces a requirement that deallocate be called on any
non-default values prior to destruction. The deallocate method takes a
vector<FinalizedAlloc>, allowing for bulk deallocation of many allocations in a
single call.

Memory manager implementations will typically store the address of some
allocation metadata in the executor in the FinalizedAlloc, as holding this
metadata in the executor is often cheaper and may allow for clean deallocation
even in failure cases where the connection with the controller is lost.
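
The "deallocate before destruction" requirement can be pictured with a toy wrapper like the one below. This is only a sketch under assumptions, not the real JITLink class: the class name, the sentinel value, and the `deallocateAll` helper are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Toy stand-in for the FinalizedAlloc idea: a move-only 64-bit handle whose
// destructor asserts that the handle was released first.
class ToyFinalizedAlloc {
public:
  // Sentinel for "no allocation"; the value is an assumption of this sketch.
  static constexpr std::uint64_t InvalidAddr = ~std::uint64_t(0);

  ToyFinalizedAlloc() = default;
  explicit ToyFinalizedAlloc(std::uint64_t A) : Addr(A) {}
  ToyFinalizedAlloc(const ToyFinalizedAlloc &) = delete;
  ToyFinalizedAlloc &operator=(const ToyFinalizedAlloc &) = delete;
  ToyFinalizedAlloc(ToyFinalizedAlloc &&O) noexcept : Addr(O.release()) {}
  ToyFinalizedAlloc &operator=(ToyFinalizedAlloc &&O) noexcept {
    assert(Addr == InvalidAddr && "overwriting a live allocation");
    Addr = O.release();
    return *this;
  }
  ~ToyFinalizedAlloc() {
    assert(Addr == InvalidAddr && "allocation was not deallocated");
  }

  explicit operator bool() const { return Addr != InvalidAddr; }

  // Hands the address back to the caller and resets to the default state.
  std::uint64_t release() {
    std::uint64_t Old = Addr;
    Addr = InvalidAddr;
    return Old;
  }

private:
  std::uint64_t Addr = InvalidAddr;
};

static_assert(sizeof(ToyFinalizedAlloc) == sizeof(std::uint64_t),
              "the handle is just one 64-bit address");

// Bulk deallocation: releases every non-default handle in a single call and
// returns the addresses that a real memory manager would then free.
std::vector<std::uint64_t> deallocateAll(std::vector<ToyFinalizedAlloc> Allocs) {
  std::vector<std::uint64_t> Released;
  for (auto &A : Allocs)
    if (A)
      Released.push_back(A.release());
  return Released;
}
```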

(4) Support for "allocation actions" and finalize-lifetime memory.

Allocation actions are pairs (finalize_act, deallocate_act) of JITTargetAddress
triples (fn, arg_buffer_addr, arg_buffer_size), that can be attached to a
finalize request. At finalization time, after memory protections have been
applied, each of the "finalize_act" elements will be called in order (skipping
any elements whose fn value is zero) as

((char*(*)(const char *, size_t))fn)((const char *)arg_buffer_addr,
                                     (size_t)arg_buffer_size);

At deallocation time the deallocate elements will be run in reverse order (again
skipping any elements where fn is zero).

The returned char * should be null to indicate success, or a non-null
heap-allocated string error message to indicate failure.
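
The calling convention and ordering described above can be sketched as follows. This is a simplified model, not the JITLink implementation: it stores real function pointers instead of JITTargetAddress values, assumes error strings were allocated with `malloc`, and stops at the first error, which is a policy chosen for this sketch. All type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Function signature quoted in the commit message.
using ActionFn = char *(*)(const char *, std::size_t);

// One (fn, arg_buffer_addr, arg_buffer_size) triple.
struct ActionCall {
  ActionFn Fn;
  const char *ArgBuf;
  std::size_t ArgSize;
};

// One (finalize_act, deallocate_act) pair.
struct ActionPair {
  ActionCall Finalize;
  ActionCall Dealloc;
};

// Runs a single call. Returns an empty string on success or when Fn is null
// (such elements are skipped), otherwise the error message.
static std::string runCall(const ActionCall &C) {
  if (!C.Fn)
    return std::string();
  char *Err = C.Fn(C.ArgBuf, C.ArgSize);
  if (!Err)
    return std::string();
  std::string Msg(Err);
  std::free(Err); // assumption: the message was allocated with malloc
  return Msg;
}

// Finalize actions run in order.
std::string runFinalizeActions(const std::vector<ActionPair> &Actions) {
  for (const ActionPair &A : Actions) {
    std::string Err = runCall(A.Finalize);
    if (!Err.empty())
      return Err;
  }
  return std::string();
}

// Deallocate actions run in reverse order.
std::string runDeallocActions(const std::vector<ActionPair> &Actions) {
  for (auto It = Actions.rbegin(); It != Actions.rend(); ++It) {
    std::string Err = runCall(It->Dealloc);
    if (!Err.empty())
      return Err;
  }
  return std::string();
}
```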

These actions allow finalization and deallocation to be extended to include
operations like registering and deregistering eh-frames, TLS sections,
initializers and deinitializers, and language metadata sections. Previously these
operations required separate callWrapper invocations. Compared to callWrapper
invocations, actions require no extra IPC/RPC, reducing costs and eliminating
a potential source of errors.

Finalize-lifetime memory can be used to support finalize actions: sections with
finalize lifetime should be destroyed by memory managers immediately after
finalization actions have been run. This lets finalize actions use extra
memory (e.g. for extra metadata, or for synthesized finalize actions)
without incurring permanent memory overhead.

e50aea58

[AArch64][GlobalISel] Fix combiner assertion in matchConstantOp(). · 53ebfa7c
Amara Emerson authored Oct 11, 2021
```
We shouldn't call APInt::getSExtValue() on a >64b value.
```
53ebfa7c

[TwoAddressInstructionPass] Improve the SrcRegMap and DstRegMap computation · 6599961c

Guozhi Wei authored Oct 11, 2021

This patch contains the following enhancements to SrcRegMap and DstRegMap:

  1 In findOnlyInterestingUse, check not only whether the Reg is a two-address
    use, but also whether it can become one after commutation.

  2 If a physical register is clobbered, remove SrcRegMap entries that are
    mapped to it.

  3 In processTiedPairs, when creating a new COPY instruction, add a SrcRegMap
    entry only when the COPY instruction is coalescable (i.e. the COPY src is
    killed).

With these enhancements isProfitableToCommute can make better commute decisions,
and ultimately more register copies are removed.

Differential Revision: https://reviews.llvm.org/D108731

6599961c

Oct 11, 2021

[LoopSimplifyCFG] Do not require MSSA. Continue to preserve if available. · f7ca5428

Alina Sbirlea authored Oct 11, 2021

LoopSimplifyCFG does not need MSSA, but should preserve it if it's available.

This is a legacy PM change, aimed to denoise the test changes in D109958.

Differential Revision: https://reviews.llvm.org/D111578

f7ca5428

[ORC] Propagate errors to handlers when sendMessage fails. · 17a0858f

Lang Hames authored Oct 11, 2021

In SimpleRemoteEPC, calls from callWrapperAsync to sendMessage may fail.
The handlers may or may not be sent failure messages by handleDisconnect,
depending on when that method is run. This patch adds a check for an un-failed
handler, and if it finds one sends it a failure message.

17a0858f

[ORC] Destroy FinalizeErr if there is a serialization error. · 4fc2a4cc

Lang Hames authored Oct 11, 2021

If there is a serialization error then FinalizeErr should never be set, so we
can use cantFail rather than consumeError here.

4fc2a4cc

[IVUsers] Check for preheader instead of loop simplify form · 2a2a37d9

Nikita Popov authored Oct 09, 2021

IVUsers currently makes sure that all loops dominating a user are
in loop simplify form, because SCEVExpander needs a preheader to
insert into. However, loop simplify form requires much more than
that. In particular, it requires dedicated exits, which means that
exits need to be found and walked. For large functions with many
nested loops, this can result in pathological compile-time explosion.

Fix this by only checking the property we're actually interested in,
which is incidentally cheap to check.

Differential Revision: https://reviews.llvm.org/D111493

2a2a37d9

[ARM] Be more explicit about disabling CombineBaseUpdate for MVE. · 860b4479

David Green authored Oct 11, 2021

This shouldn't be called for non-NEON targets at the moment in either
case, but it is good to be explicit about CombineBaseUpdate being a
NEON function that is not expected to run under MVE.

860b4479

[KnownBits] Introduce `countMaxActiveBits()` and use it in a few places · 684cbae8
Roman Lebedev authored Oct 11, 2021

684cbae8

[LCG] Don't skip invalidation of LazyCallGraph if CFG analyses are preserved · 259390de

Arthur Eubanks authored Oct 06, 2021

CFG changes and the overall call graph are not related: we can introduce or remove calls without changing the CFG.

Resolves one of the issues in PR51946.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D111275

259390de

[InstCombine] fold signbit check of X | (X -1) · 59441c73

Sanjay Patel authored Oct 11, 2021

There may be some other patterns like this or a generalization,
but this is an example that I noticed would definitely regress
with a planned follow-up to D111410.

https://alive2.llvm.org/ce/z/GVpQDb

59441c73
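
The commit message does not spell out the folded form. Assuming the fold in question is that the sign bit of `X | (X - 1)` is set exactly when `X < 1` (signed), the sketch below checks that equivalence exhaustively for 8-bit values. The function name is hypothetical, and this is a brute-force check rather than the InstCombine code.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if, for every 8-bit value X, the sign bit of X | (X - 1)
// is set exactly when X < 1 as a signed comparison. The subtraction is done
// on the unsigned bit pattern so that it wraps like an i8 `sub`.
bool signbitFoldHoldsForI8() {
  for (int V = -128; V <= 127; ++V) {
    std::uint8_t X = static_cast<std::uint8_t>(V);
    std::uint8_t XMinus1 = static_cast<std::uint8_t>(X - 1);
    std::uint8_t Or = static_cast<std::uint8_t>(X | XMinus1);
    bool SignBitSet = (Or & 0x80) != 0;
    if (SignBitSet != (V < 1))
      return false;
  }
  return true;
}
```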

[SCCP] Properly report changes when changing a pointer argument · fbddf22e

Arthur Eubanks authored Oct 06, 2021

Fixes one of the issues in PR51946.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D111277

fbddf22e

[PHIElimination] Fix accounting for undef uses when updating LiveVariables · edfdce26

Jay Foad authored Oct 11, 2021

PHI elimination updates LiveVariables info as described here:

    // We only need to update the LiveVariables kill of SrcReg if this was the
    // last PHI use of SrcReg to be lowered on this CFG edge and it is not live
    // out of the predecessor. We can also ignore undef sources.

Unfortunately if the last use also happened to be an undef use then it
would fail to update the LiveVariables at all. Fix this by not counting
undef uses in the VRegPHIUse map.

Thanks to Mikael Holmén for the test case!

Differential Revision: https://reviews.llvm.org/D111552

edfdce26

[AMDGPU] Fix copying a machine operand · 2e1ad932

Jay Foad authored Oct 11, 2021

Without this I get:

*** Bad machine code: Instruction has operand with wrong parent set ***
- function:    available_externally_test
- basic block: %bb.0  (0x7dad598)
- instruction: %0:r600_treg32_x = MOV 1, 0, 0, 0, $alu_literal_x, 0, 0, 0, -1, 1, $pred_sel_off, @available_externally, 0

Differential Revision: https://reviews.llvm.org/D111549

2e1ad932

[VPlan] Print live-in backedge taken count as part of plan. · ab33427c

Florian Hahn authored Oct 11, 2021

At the moment, a VPValue is created for the backedge-taken count, which
is used by some recipes. To make it easier to identify the operands of
recipes using the backedge-taken count, print it at the beginning of the
VPlan if it is used.

Reviewed By: a.elovikov

Differential Revision: https://reviews.llvm.org/D111298

ab33427c

[Orc] Handle hangup messages in SimpleRemoteEPC · a6c95063

Stefan Gränitz authored Oct 11, 2021

On the controller-side, handle `Hangup` messages from the executor. The executor passed `Error::success()` or a failure message as payload.

Hangups cause an immediate disconnect of the transport layer. The disconnect function may be called again later, so implementations should be prepared for that. `FDSimpleRemoteEPCTransport::disconnect()` already has a flag to check that:
https://github.com/llvm/llvm-project/blob/cd1bd95d8707371da0e4f75cd01669c427466931/llvm/lib/ExecutionEngine/Orc/Shared/SimpleRemoteEPCUtils.cpp#L112

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D111527

a6c95063

Revert "Allow signposts to take advantage of deferred string substitution" · 070315d0

Jonas Devlieghere authored Oct 11, 2021

This reverts commits f9aba9a5 and
035217ff.

As explained in the original commit message, this didn't have the
intended effect of improving the common LLDB use case, but still
provided a marginal improvement for the places where LLDB creates a
scoped time with a string literal.

The reason for the revert is that this change pulls in the os/signpost.h
header in Signposts.h. The former transitively includes loader.h, which
contains a series of macro defines that conflict with MachO.h. There are
ways to work around that, but Adrian and I concluded that none of them
are worth the trade-off in complicating Signposts.h even further.

070315d0

[AMDGPU] Support shared literals in FMAMK/FMAAK · b4b7e605

Joe Nash authored Oct 04, 2021

These instructions should allow src0 to be a literal with the same
value as the mandatory other literal. Enable it by introducing an
operand that defers adding its value to the MI when decoding till
the mandatory literal is parsed.

Reviewed By: dp, foad

Differential Revision: https://reviews.llvm.org/D111067

Change-Id: I22b0ae0d35bad17b6f976808e48bffe9a6af70b7

b4b7e605

[SCEV] Extend trip count to avoid overflow by default · 7f55209c

Philip Reames authored Oct 11, 2021

As a brief reminder, an "exit count" is the number of times the backedge executes before some event. It can be zero if we exit before the backedge is reached. A "trip count" is the number of times the loop header is entered if we branch into the loop. In general, TC = BTC + 1 and thus a zero trip count is ill-defined.

There is a corner case which we don't handle well. Let's assume i8 for our examples to keep things simple. If BTC = 255, then the correct trip count is 256. However, 256 is not representable in i8.

In theory, code which needs to reason about trip counts is responsible for checking for this corner case, and either bailing out, or handling it correctly. Historically, we don't have a great track record about actually doing so.

When reviewing D109676, I found myself asking a basic question. Was there any good reason to preserve the current wrap-to-zero behavior when converting from backedge taken counts to trip counts? After reviewing existing code, I could not find a single case which appears to correctly and precisely handle the overflow case.

This patch changes the default behavior to extend instead of wrap. That is, if the result might be 256, we return a value of i9 type to ensure we interpret the count correctly. I did leave the legacy behavior as an option since a) loop-flatten stops triggering if I extend due to weirdly specific pattern matching I didn't understand and b) we could reasonably use the mode if we'd externally established a lack of overflow.

I want to emphasize that this change is *not* NFC. There are two call sites (one in ScalarEvolution.cpp, one in LoopCacheAnalysis.cpp) which are switched to the extend semantics. The former appears imprecise (but correct) for a constant 255 BTC. The latter appears incorrect, though I don't have a test case.

Differential Revision: https://reviews.llvm.org/D110587

7f55209c
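
The i8 example from the message can be reproduced with plain integer arithmetic. The sketch below is not SCEV code: `uint8_t` stands in for i8, `uint16_t` stands in for the wider i9 result, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Legacy "wrap" behavior: TC = BTC + 1 computed in the same 8-bit width,
// so BTC = 255 wraps around to a trip count of 0.
std::uint8_t tripCountWrapped(std::uint8_t BTC) {
  return static_cast<std::uint8_t>(BTC + 1);
}

// "Extend" behavior: widen first, then add 1, so BTC = 255 yields 256.
std::uint16_t tripCountExtended(std::uint8_t BTC) {
  return static_cast<std::uint16_t>(static_cast<std::uint16_t>(BTC) + 1);
}
```

The two functions agree for every BTC below 255 and differ only at the overflowing value, which is the case the message says callers historically failed to handle.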

[Clang][ARM][AArch64] Add support for Armv9-A, Armv9.1-A and Armv9.2-A · 3550e242

Victor Campos authored Sep 08, 2021

armv9-a, armv9.1-a and armv9.2-a can be targeted using the -march option
both in ARM and AArch64.

 - Armv9-A maps to Armv8.5-A.
 - Armv9.1-A maps to Armv8.6-A.
 - Armv9.2-A maps to Armv8.7-A.
 - The SVE2 extension is enabled by default on these architectures.
 - The cryptographic extensions are disabled by default on these
 architectures.

The Armv9-A architecture is described in the Arm® Architecture Reference
Manual Supplement Armv9, for Armv9-A architecture profile
(https://developer.arm.com/documentation/ddi0608/latest).

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D109517

3550e242