Commits · 1112b7bad851c78af2159d39c7bff670a9c77da1 · Lorenzo Albano / LLVM bpEVL

Oct 12, 2021

Revert "[AMDGPU] Enable load clustering in the post-RA scheduler" · 66ce1015
Jay Foad authored Oct 12, 2021
```
This reverts commit 66e13c7f.

It was committed by accident.
```
66ce1015

[TwoAddressInstruction] Remove ad hoc machine verification · f7ee21aa

Jay Foad authored Oct 12, 2021

With the -early-live-intervals command line flag,
TwoAddressInstructionPass::runOnMachineFunction would call
MachineFunction::verify before returning to check the live intervals.
But there was not much benefit to doing this since -verify-machineinstrs
and LLVM_ENABLE_EXPENSIVE_CHECKS provide a more general way of
scheduling machine verification after every pass.

Also it caused problems on targets like Lanai which are marked as "not
machine verifier clean", since verification would fail for known
target-specific problems which are nothing to do with LiveIntervals.

Differential Revision: https://reviews.llvm.org/D111618

f7ee21aa

[AMDGPU] Enable load clustering in the post-RA scheduler · 66e13c7f

Jay Foad authored Oct 12, 2021

This has a couple of benefits:
1. It can sometimes fix clusters that got broken apart when the register
   allocator inserted a copy.
2. Post-RA scheduling does not have to worry about increasing register
   pressure, which in some cases gives it more freedom to reorder
   instructions.

Testing on a collection of 10,000 graphics shaders compiled for gfx1010
showed:
- The average length of each run of one or more load instructions
  increased by about 1%.
- The number of runs of two or more load instructions increased by
  about 4%.

66e13c7f

[DebugInfo][NFC] Move LiveDebugValues class to header · 838b4a53

Jeremy Morse authored Oct 12, 2021

This patch shifts the InstrRefBasedLDV class declaration to a header.
Partially because it's already massive, but mostly so that I can start
writing some unit tests for it. This patch also adds the boilerplate for
said unit tests.

Differential Revision: https://reviews.llvm.org/D110165

838b4a53

[AArch64][SVE] Add fixed type lowering for EXTRACT_SUBVECTOR · 2eb42e3d
Bradley Smith authored Oct 05, 2021
```
Depends on D111135

Differential Revision: https://reviews.llvm.org/D111165
```
2eb42e3d
[X86] Fix implicit MathExtras.h header dependency · 61d124f7
Simon Pilgrim authored Oct 12, 2021

61d124f7

[LoopVectorize] Classify pointer induction updates as scalar only if they have one use · 1439ef1a

Kerry McLaughlin authored Oct 11, 2021

collectLoopScalars collects pointer induction updates in ScalarPtrs, assuming
that the instruction will be scalar after vectorization. This may crash later
in VPReplicateRecipe::execute() if there is another user of the instruction
other than the Phi node which needs to be widened.

This changes collectLoopScalars so that if there are any other users of
Update other than a Phi node, it is not added to ScalarPtrs.

Reviewed By: david-arm, fhahn

Differential Revision: https://reviews.llvm.org/D111294

1439ef1a

[LoopPeel] Use any_of & contains instead of for & find. · 40d85f16
Florian Hahn authored Oct 12, 2021
```
Using contains was suggested in D108114, but I forgot to include it when
landing the patch.
```
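As a rough illustration of the kind of cleanup the title describes, the sketch below contrasts a hand-written `for` loop using `find` with an `any_of` plus membership test. It uses the standard library as a stand-in (`std::any_of` and `std::set::count`); the function names and container types are hypothetical and are not taken from the LoopPeel code.

```cpp
#include <algorithm>
#include <set>
#include <vector>

// Before: explicit loop, with find() used only as a membership test.
// (Hypothetical helper, not the LoopPeel code.)
bool anyAllowedLoop(const std::vector<int> &Candidates,
                    const std::set<int> &Allowed) {
  for (int X : Candidates)
    if (Allowed.find(X) != Allowed.end())
      return true;
  return false;
}

// After: any_of states the intent directly. count() stands in for a
// contains()-style query so the sketch also builds before C++20.
bool anyAllowed(const std::vector<int> &Candidates,
                const std::set<int> &Allowed) {
  return std::any_of(Candidates.begin(), Candidates.end(),
                     [&](int X) { return Allowed.count(X) != 0; });
}
```

Both functions return the same result for any input; the second form simply removes the manual control flow.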
40d85f16
[gn build] Port f4c1258d · 269d0e22
LLVM GN Syncbot authored Oct 12, 2021

269d0e22

[FuncSpec] Allow ConstExprs that are function pointers · fc0fa851

Sjoerd Meijer authored Oct 12, 2021

This is a follow up of D110529 that disallowed constexprs. That change
introduced a regression as this also disallowed constexprs that are function
pointers, which is actually one of the motivating use cases that we do want to
support.

Differential Revision: https://reviews.llvm.org/D111567

fc0fa851

[LoopPeel] Peel if it turns invariant loads dereferenceable. · cd0ba9dc

Florian Hahn authored Oct 12, 2021

This patch adds a new cost heuristic that allows peeling a single
iteration off read-only loops, if the loop contains a load that

    1. is feeding an exit condition,
    2. dominates the latch,
    3. is not already known to be dereferenceable,
    4. and has a loop invariant address.

If all non-latch exits are terminated with unreachable, such loads
in the loop are guaranteed to be dereferenceable after peeling,
enabling hoisting/CSE'ing them.

This enables vectorization of loops with certain runtime-checks, like
multiple calls to `std::vector::at` if the vector is passed as pointer.
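The loop shape mentioned above might look like the following sketch. The function name and signature are hypothetical, and whether the heuristic actually fires on this exact code depends on inlining and the rest of the pipeline, so treat it only as an example of the pattern: the vectors are passed by pointer, each `at` call bounds-checks against loop-invariant loads, and the failing path throws instead of continuing the loop.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: a read-only loop whose exit conditions include
// the bounds checks inside std::vector::at(). The size and data fields
// of *A and *B are loaded from loop-invariant addresses on each iteration.
int sumProducts(const std::vector<int> *A, const std::vector<int> *B,
                std::size_t N) {
  int Sum = 0;
  for (std::size_t I = 0; I < N; ++I)
    Sum += A->at(I) * B->at(I); // throws std::out_of_range on a bad index
  return Sum;
}
```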

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D108114

cd0ba9dc

[gn build] (manually) port f4c1258d · e19bbd0f
Nico Weber authored Oct 12, 2021

e19bbd0f

[SelectionDAG] Fix typo in VPLoadStoreSDNode · e2d5a380

Roger Ferrer Ibanez authored Oct 12, 2021

There is no code that uses this base class yet, hence the typo went
unnoticed when this class was added in D105871.

Differential Revision: https://reviews.llvm.org/D110930

e2d5a380

Pre-commit pre-inc-disable.ll to avoid dead code · 1f253e4f

Qiu Chaofan authored Oct 12, 2021

The case was added in 728e1397; it tests that lxsibzx is output instead of
lbzux. Here we need some minimal update to avoid DCE in future patches.

1f253e4f

[docs] List support for Armv9-A, Armv9.1-A and Armv9.2-A in LLVM and Clang · 3e7cf33a
Victor Campos authored Oct 12, 2021
```
Reviewed By: pratlucas

Differential Revision: https://reviews.llvm.org/D110241
```
3e7cf33a

[RISCV] Rename assembler mnemonic of unordered floating-point reductions for v1.0-rc change · 0608bbd4

jacquesguan authored Oct 12, 2021

Rename vfredsum and vfwredsum to vfredusum and vfwredusum. Add aliases for vfredsum and vfwredsum.

Reviewed By: luismarques, HsiangKai, khchen, frasercrmck, kito-cheng, craig.topper

Differential Revision: https://reviews.llvm.org/D105690

0608bbd4

[ORC] More attempts to work around compiler failures. · 5829ba7a

Lang Hames authored Oct 11, 2021

Commit 731f991c seems to have helped, but did not catch all instances (see
https://lab.llvm.org/buildbot/#/builders/193/builds/104). Switch more inner
structs to C++98 initializers to work around the issue. Add FIXMEs to revisit
in the future.

5829ba7a

[ORC] Attempt to work around compile failure on some bots. · 731f991c

Lang Hames authored Oct 11, 2021

See e.g. https://lab.llvm.org/buildbot/#/builders/193/builds/98.

I think this failure is related to a C++ standard defect, 1397 -- "Class
completeness in non-static data member initializers" [1]. If so, moving
to C++98 initialization should work around the issue.

[1] http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1397
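A minimal sketch of the workaround style described above, assuming the problematic pattern is a nested struct whose members use default member initializers and which is used by the enclosing class. The class and member names here are hypothetical and are not the ORC types touched by the commit.

```cpp
// Hypothetical enclosing class with an inner configuration struct.
struct Manager {
  struct Config {
    // C++98-style initialization: the defaults live in a constructor's
    // member-initializer list instead of default member initializers such
    // as `unsigned PageSize = 4096;`, which some compilers reject when the
    // struct is used by the enclosing class before that class is complete.
    Config() : PageSize(4096), Eager(false) {}
    unsigned PageSize;
    bool Eager;
  };

  explicit Manager(Config C = Config()) : Cfg(C) {}
  Config Cfg;
};
```

The observable defaults are unchanged; only the place where they are written moves.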

731f991c

[NFC][LangRef] Update description for FuncFlags · ef643617

modimo authored Oct 11, 2021

Add the additional flags from D36850 as well as noInline/alwaysInline from previous changes.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D111600

ef643617

[ORC] Add more explicit narrowing casts. · 3a52a639

Lang Hames authored Oct 11, 2021

This should fix the buildbot failure at
https://lab.llvm.org/buildbot/#/builders/187/builds/2140
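The commit message does not say which conversions were involved. As a generic illustration of an explicit narrowing cast, the sketch below converts a 64-bit value to `size_t` with a range check followed by `static_cast`; the helper name is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper: make the 64-bit to size_t conversion explicit so
// that builds where size_t is 32 bits neither warn nor silently truncate.
std::size_t toSizeT(std::uint64_t V) {
  if (V > std::numeric_limits<std::size_t>::max())
    throw std::overflow_error("value does not fit in size_t");
  return static_cast<std::size_t>(V);
}
```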

3a52a639

[gn build] Port 962a2479 · db832d46
LLVM GN Syncbot authored Oct 12, 2021

db832d46
[ORC] Fix a typo in a variable name. · 9ca50641
Lang Hames authored Oct 11, 2021

9ca50641

Re-apply "Major JITLinkMemoryManager refactor", with fixes. · 962a2479

Lang Hames authored Oct 11, 2021

Adds explicit narrowing casts to JITLinkMemoryManager.cpp.

Honors -slab-address option in llvm-jitlink.cpp, which was accidentally
dropped in the refactor.

This effectively reverts commit 6641d29b.

962a2479

BPF: rename BTF_KIND_TAG to BTF_KIND_DECL_TAG · 1321e472

Yonghong Song authored Oct 11, 2021

Per discussion in https://reviews.llvm.org/D111199,
the existing btf_tag attribute will be renamed to
btf_decl_tag. This patch updated BTF backend to
use btf_decl_tag attribute name and also
renamed BTF_KIND_TAG to BTF_KIND_DECL_TAG.

Differential Revision: https://reviews.llvm.org/D111592

1321e472

[AMDGPU] Remove dead frame indices after sgpr spill. · 52cb3af0

hsmahesha authored Oct 12, 2021

All frame indices which are dead after sgpr spill should be removed from
the function frame. Otherwise, there is a side effect such as re-mapping of free
frame index ids by later pass(es) like "stack slot coloring", which in turn
could mess up the bookkeeping of "frame index to VGPR lane".

Reviewed By: cdevadas

Differential Revision: https://reviews.llvm.org/D111150

52cb3af0

[NFC][Attr] rename attribute btf_tag to btf_decl_tag · 325d0007

Yonghong Song authored Oct 11, 2021

Per discussion in https://reviews.llvm.org/D111199,
the existing btf_tag attribute will be renamed to
btf_decl_tag. This patch mostly updated the Bitcode and
DebugInfo test cases with new attribute name.

Differential Revision: https://reviews.llvm.org/D111591

325d0007

[llvm-jitlink] Fix a broken warning. · b7c1ccd4
Lang Hames authored Oct 11, 2021
```
This warning should only be issued if -slab-page-size has not been used.
```
b7c1ccd4
[X86][ISel] Lowering llvm.thread.pointer · d57a87ea
Freddy Ye authored Oct 12, 2021
```
Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D110681
```
d57a87ea
Revert "[JITLink][ORC] Major JITLinkMemoryManager refactor." · 6641d29b
Lang Hames authored Oct 11, 2021
```
This reverts commit e50aea58 while I
investigate bot failures.
```
6641d29b

[JITLink][ORC] Major JITLinkMemoryManager refactor. · e50aea58

Lang Hames authored Oct 10, 2021

This commit substantially refactors the JITLinkMemoryManager API to: (1) add
asynchronous versions of key operations, (2) give memory manager implementations
full control over link graph address layout, (3) enable more efficient tracking
of allocated memory, and (4) support "allocation actions" and finalize-lifetime
memory.

Together these changes provide a more usable API, and enable more powerful and
efficient memory manager implementations.

To support these changes the JITLinkMemoryManager::Allocation inner class has
been split into two new classes: InFlightAllocation, and FinalizedAllocation.
The allocate method returns an InFlightAllocation that tracks memory (both
working and executor memory) prior to finalization. The finalize method returns
a FinalizedAllocation object, and the InFlightAllocation is discarded. Breaking
Allocation into InFlightAllocation and FinalizedAllocation allows
InFlightAllocation subclasses to be written more naturally, and FinalizedAlloc
to be implemented and used efficiently (see (3) below).

In addition to the memory manager changes this commit also introduces a new
MemProt type to represent memory protections (MemProt replaces use of
sys::Memory::ProtectionFlags in JITLink), and a new MemDeallocPolicy type that
can be used to indicate when a section should be deallocated (see (4) below).

Plugin/pass writers who were using sys::Memory::ProtectionFlags will have to
switch to MemProt -- this should be straightforward. Clients with out-of-tree
memory managers will need to update their implementations. Clients using
in-tree memory managers should mostly be able to ignore it.

Major features:

(1) More asynchrony:

The allocate and deallocate methods are now asynchronous by default, with
synchronous convenience wrappers supplied. The asynchronous versions allow
clients (including JITLink) to request and deallocate memory without blocking.
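The relationship between an asynchronous operation and its synchronous convenience wrapper can be sketched as follows. This is a toy model, not the JITLinkMemoryManager API: `ToyMemoryManager`, the callback type, and the bump-pointer allocation are all assumptions made for illustration, and the callback here runs inline rather than on another thread.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical stand-in for a memory manager with an asynchronous API.
class ToyMemoryManager {
public:
  using OnAllocated = std::function<void(std::uint64_t Addr)>;

  // Asynchronous form: the result is delivered through a callback.
  // In this toy the callback is invoked before returning; a real
  // implementation could invoke it later, for example from an RPC reply.
  void allocate(std::uint64_t Size, OnAllocated OnDone) {
    std::uint64_t Addr = Next;
    Next += Size;
    OnDone(Addr);
  }

  // Synchronous convenience wrapper built on the asynchronous form.
  // Because the toy completes inline, checking a flag is enough; with a
  // truly deferred callback the wrapper would block (e.g. on a future).
  std::uint64_t allocate(std::uint64_t Size) {
    bool Done = false;
    std::uint64_t Result = 0;
    allocate(Size, [&](std::uint64_t Addr) {
      Result = Addr;
      Done = true;
    });
    assert(Done && "toy callback is expected to run inline");
    return Result;
  }

private:
  std::uint64_t Next = 0x1000; // arbitrary starting address for the sketch
};
```

Callers that can continue doing other work use the callback form; callers that need the address immediately use the wrapper.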

(2) Improved control over graph address layout:

Instead of a SegmentRequestMap, JITLinkMemoryManager::allocate now takes a
reference to the LinkGraph to be allocated. The memory manager is responsible
for calculating the memory requirements for the graph, and laying out the graph
(setting working and executor memory addresses) within the allocated memory.
This gives memory managers full control over JIT'd memory layout. For clients
that don't need or want this degree of control the new "BasicLayout" utility can
be used to get a segment-based view of the graph, similar to the one provided by
SegmentRequestMap. Once segment addresses are assigned the BasicLayout::apply
method can be used to automatically lay out the graph.

(3) Efficient tracking of allocated memory.

The FinalizedAlloc type is a wrapper for an ExecutorAddr and requires only
64 bits to store in the controller. The meaning of the address held by the
FinalizedAlloc is left up to the memory manager implementation, but the
FinalizedAlloc type enforces a requirement that deallocate be called on any
non-default values prior to destruction. The deallocate method takes a
vector<FinalizedAlloc>, allowing for bulk deallocation of many allocations in a
single call.

Memory manager implementations will typically store the address of some
allocation metadata in the executor in the FinalizedAlloc, as holding this
metadata in the executor is often cheaper and may allow for clean deallocation
even in failure cases where the connection with the controller is lost.
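The "must be deallocated before destruction" rule can be modelled with a small move-only handle. The sketch below is an assumption-laden toy, not the real FinalizedAlloc: the class name, the sentinel value, and `deallocateAll` are hypothetical, and the bulk function only releases the handles instead of freeing any memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical move-only wrapper around a single 64-bit address.
class ToyFinalizedAlloc {
public:
  static constexpr std::uint64_t InvalidAddr = ~std::uint64_t(0);

  ToyFinalizedAlloc() = default;
  explicit ToyFinalizedAlloc(std::uint64_t A) : Addr(A) {}
  ToyFinalizedAlloc(const ToyFinalizedAlloc &) = delete;
  ToyFinalizedAlloc &operator=(const ToyFinalizedAlloc &) = delete;
  ToyFinalizedAlloc(ToyFinalizedAlloc &&O) noexcept : Addr(O.release()) {}
  ToyFinalizedAlloc &operator=(ToyFinalizedAlloc &&O) noexcept {
    assert(Addr == InvalidAddr && "overwriting a live allocation");
    Addr = O.release();
    return *this;
  }

  // Destroying a handle that still holds a non-default value is a bug.
  ~ToyFinalizedAlloc() {
    assert(Addr == InvalidAddr && "allocation was not deallocated");
  }

  // Hands the address to the caller and resets this handle to default.
  std::uint64_t release() {
    std::uint64_t A = Addr;
    Addr = InvalidAddr;
    return A;
  }

private:
  std::uint64_t Addr = InvalidAddr;
};

static_assert(sizeof(ToyFinalizedAlloc) == sizeof(std::uint64_t),
              "the handle stores nothing but the address");

// Bulk deallocation: takes ownership of many handles in one call and
// returns how many live allocations were released.
std::size_t deallocateAll(std::vector<ToyFinalizedAlloc> Allocs) {
  std::size_t Released = 0;
  for (ToyFinalizedAlloc &A : Allocs)
    if (A.release() != ToyFinalizedAlloc::InvalidAddr)
      ++Released;
  return Released;
}
```

Taking the vector by value means the caller must move its handles in, which makes the ownership transfer visible at the call site.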

(4) Support for "allocation actions" and finalize-lifetime memory.

Allocation actions are pairs (finalize_act, deallocate_act) of JITTargetAddress
triples (fn, arg_buffer_addr, arg_buffer_size), that can be attached to a
finalize request. At finalization time, after memory protections have been
applied, each of the "finalize_act" elements will be called in order (skipping
any elements whose fn value is zero) as

((char*(*)(const char *, size_t))fn)((const char *)arg_buffer_addr,
                                     (size_t)arg_buffer_size);

At deallocation time the deallocate elements will be run in reverse order (again
skipping any elements where fn is zero).

The returned char * should be null to indicate success, or a non-null
heap-allocated string error message to indicate failure.

These actions allow finalization and deallocation to be extended to include
operations like registering and deregistering eh-frames, TLS sections,
initializers and deinitializers, and language metadata sections. Previously these
operations required separate callWrapper invocations. Compared to callWrapper
invocations, actions require no extra IPC/RPC, reducing costs and eliminating
a potential source of errors.

Finalize lifetime memory can be used to support finalize actions: Sections with
finalize lifetime should be destroyed by memory managers immediately after
finalization actions have been run. Finalize memory can be used to support
finalize actions (e.g. with extra-metadata, or synthesized finalize actions)
without incurring permanent memory overhead.

e50aea58

LLVM_ATTRIBUTE_NODEBUG: GCC 4.0 apparently had ((nodebug)) but removed it. · a185d513
Chris Lattner authored Oct 11, 2021
```
This should fix a bunch of warnings on the flang-aarch64-latest-gcc builder.
```
a185d513
Revert "Remove checks for old gcc versions for LLVM_ATTRIBUTE_*" · 627224c9
Arthur Eubanks authored Oct 11, 2021
```
This reverts commit f5b52453.

Breaks bots, e.g. https://lab.llvm.org/buildbot/#/builders/169/builds/3147
```
627224c9

Remove checks for old gcc versions for LLVM_ATTRIBUTE_* · f5b52453

Arthur Eubanks authored Oct 11, 2021

According to [1] we only support gcc 5.1+. So these checks for older gcc versions are not supported.

[1] https://llvm.org/docs/GettingStarted.html#host-c-toolchain-both-compiler-and-standard-library

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D111581

f5b52453

[RISCV][test] Add more tests of immediate materialisation · c9db5f0f
Ben Shi authored Oct 11, 2021
```
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D111483
```
c9db5f0f
[AArch64][GlobalISel] Fix combiner assertion in matchConstantOp(). · 53ebfa7c
Amara Emerson authored Oct 11, 2021
```
We shouldn't call APInt::getSExtValue() on a >64b value.
```
53ebfa7c

[TwoAddressInstructionPass] Improve the SrcRegMap and DstRegMap computation · 6599961c

Guozhi Wei authored Oct 11, 2021

This patch contains the following enhancements to SrcRegMap and DstRegMap:

  1 In findOnlyInterestingUse, check not only whether the Reg is a two-address
    usage, but also whether it can become a two-address usage after commutation.

  2 If a physical register is clobbered, remove SrcRegMap entries that are
    mapped to it.

  3 In processTiedPairs, when creating a new COPY instruction, add a SrcRegMap
    entry only when the COPY instruction is coalescable (the COPY src is
    killed).

With these enhancements isProfitableToCommute can make better commute
decisions, and ultimately more register copies are removed.

Differential Revision: https://reviews.llvm.org/D108731

6599961c

Oct 11, 2021

[GlobalISel] Regenerate some MIR tests with CHECK-NEXT for another patch. · da904719
Amara Emerson authored Oct 11, 2021

da904719

[LoopSimplifyCFG] Do not require MSSA. Continue to preserve if available. · f7ca5428

Alina Sbirlea authored Oct 11, 2021

LoopSimplifyCFG does not need MSSA, but should preserve it if it's available.

This is a legacy PM change, aimed to denoise the test changes in D109958.

Differential Revision: https://reviews.llvm.org/D111578

f7ca5428

[ORC] Propagate errors to handlers when sendMessage fails. · 17a0858f

Lang Hames authored Oct 11, 2021

In SimpleRemoteEPC, calls from callWrapperAsync to sendMessage may fail.
The handlers may or may not be sent failure messages by handleDisconnect,
depending on when that method is run. This patch adds a check for an un-failed
handler, and if it finds one sends it a failure message.
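The check described above can be sketched with a single-threaded toy. The names (`ToyEndpoint`, `callAsync`, the handler signature) are hypothetical and this is not the SimpleRemoteEPC code; a real implementation would also need locking around the pending-handler table.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical model of an endpoint tracking pending result handlers.
class ToyEndpoint {
public:
  using Handler = std::function<void(bool Ok, const std::string &Msg)>;
  using SendFn = std::function<bool(std::uint64_t SeqNo)>;

  void callAsync(Handler H, const SendFn &Send) {
    std::uint64_t SeqNo = NextSeqNo++;
    Pending[SeqNo] = std::move(H);
    if (Send(SeqNo))
      return; // sent; the handler will run when a reply arrives

    // Sending failed. handleDisconnect may or may not have already failed
    // this handler, so only notify it if it is still pending.
    auto I = Pending.find(SeqNo);
    if (I == Pending.end())
      return;
    Handler Unfailed = std::move(I->second);
    Pending.erase(I);
    Unfailed(false, "send failed");
  }

  // Fails and removes every handler that is still pending.
  void handleDisconnect() {
    std::map<std::uint64_t, Handler> ToFail;
    ToFail.swap(Pending);
    for (auto &KV : ToFail)
      KV.second(false, "disconnected");
  }

private:
  std::uint64_t NextSeqNo = 0;
  std::map<std::uint64_t, Handler> Pending;
};
```

Either path delivers exactly one failure message per handler, which is the property the check is meant to guarantee.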

17a0858f

[ORC] Destroy FinalizeErr if there is a serialization error. · 4fc2a4cc

Lang Hames authored Oct 11, 2021

If there is a serialization error then FinalizeErr should never be set, so we
can use cantFail rather than consumeError here.

4fc2a4cc