Commits · 5d602120c3a3c175f606d6ce599cfb60239d904c · Lorenzo Albano / LLVM bpEVL

Nov 30, 2021

[AMDGPU] Update docs for nontemporal store · 5d602120

Jay Foad authored Nov 29, 2021

Update the documented GFX10 code sequence for nontemporal stores after
D114351.

Differential Revision: https://reviews.llvm.org/D114707

5d602120

[clang][ARM] PACBTI-M assembly support · 5cff77c2

Ties Stuij authored Nov 30, 2021

Introduce assembly support for Armv8.1-M PACBTI extension. This is an optional
extension in v8.1-M.

There are 10 new system registers and 5 new instructions, all predicated on the
feature.

The attribute for llvm-mc is called "pacbti". For armclang, an architecture
extension also called "pacbti" was created.

This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension

The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:

https://developer.arm.com/documentation/ddi0553/latest

The following people contributed to this patch:

- Victor Campos
- Ties Stuij

Reviewed By: labrinea

Differential Revision: https://reviews.llvm.org/D112420

5cff77c2

[mlir] Decompose Bufferization Clone operation into Memref Alloc and Copy. · ae1ea0be

Julian Gross authored Nov 18, 2021

This patch introduces a new conversion that converts bufferization.clone operations
into a memref.alloc and a memref.copy operation. This transformation is needed to
transform all remaining clones that "survive" all previous transformations before
a given program is lowered further (e.g. to LLVM). Otherwise, these operations
can no longer be handled and lead to compile errors.
See: https://llvm.discourse.group/t/bufferization-error-related-to-memref-clone/4665

Differential Revision: https://reviews.llvm.org/D114233

ae1ea0be

[clangd] Make std symbol generation script python3 friendly · 3356d883

Kadir Cetinkaya authored Nov 29, 2021

Differential Revision: https://reviews.llvm.org/D114723

3356d883

[mlir] Move bufferization-related passes to `bufferization` dialect. · f89bb3c0

Alexander Belyaev authored Nov 29, 2021

[RFC](https://llvm.discourse.group/t/rfc-dialect-for-bufferization-related-ops/4712)

Differential Revision: https://reviews.llvm.org/D114698

f89bb3c0

[mlir][OpDSL] Fix OpDSL tests after https://reviews.llvm.org/D114680 . · 0d0371f5

gysit authored Nov 30, 2021

Update the shapes of the convolution / pooling tests that were detected after enabling verification during printing (https://reviews.llvm.org/D114680). Also split the emit_structured_generic.py file, which previously contained all tests, into multiple separate files to simplify debugging.

Reviewed By: stellaraccident

Differential Revision: https://reviews.llvm.org/D114731

0d0371f5

[ELF] Move ObjFile<ELFT>::{getLocalSymbols,getGlobalSymbols} to non-template ELFFileBase. NFC · 5188f55d

Fangrui Song authored Nov 30, 2021

5188f55d

[RISCV] Decode vtype with reserved fields to raw immediate · 29d4230d

Ben Shi authored Nov 25, 2021

This patch fixes a crash when running "llvm-objdump -D --mattr=+experimental-v"
against an object file that happens to contain a word that decodes to
VSETVLI or VSETIVLI with reserved vlmul[2:0]=4. All vtype values with
reserved fields (vlmul[2:0]=4, vsew[2:0]=0b1xx, non-zero bits 8/9/10) are
printed as raw immediates.
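
The reserved-field rule above can be sketched as a small check. This is only an illustration, not the disassembler's code: the function names are hypothetical, and the bit positions of vsew (bits 5:3), vta (bit 6) and vma (bit 7) are assumed from the RISC-V vector specification rather than taken from this commit.

```python
# Hypothetical helpers illustrating the rule from the commit message.
# Assumed vtype layout: vlmul = bits 2:0, vsew = bits 5:3, vta = bit 6, vma = bit 7.

LMUL_NAMES = {0: "m1", 1: "m2", 2: "m4", 3: "m8", 5: "mf8", 6: "mf4", 7: "mf2"}


def vtype_has_reserved_fields(vtype: int) -> bool:
    """True if vtype uses any encoding the commit message lists as reserved."""
    vlmul = vtype & 0b111
    vsew = (vtype >> 3) & 0b111
    upper = (vtype >> 8) & 0b111  # bits 8, 9 and 10
    return vlmul == 4 or vsew >= 0b100 or upper != 0


def format_vtype(vtype: int) -> str:
    """Print a symbolic vtype, falling back to the raw immediate when reserved."""
    if vtype_has_reserved_fields(vtype):
        return str(vtype)
    vlmul = vtype & 0b111
    sew = 8 << ((vtype >> 3) & 0b111)
    ta = "ta" if (vtype >> 6) & 1 else "tu"
    ma = "ma" if (vtype >> 7) & 1 else "mu"
    return f"e{sew}, {LMUL_NAMES[vlmul]}, {ta}, {ma}"


print(format_vtype(0x50))  # well-formed encoding, printed symbolically
print(format_vtype(0x04))  # vlmul == 4 is reserved, printed as a raw immediate
```

Falling back to the raw number keeps the output unambiguous instead of inventing a name for an encoding that has none.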

Reviewed By: jhenderson, jrtc27, craig.topper

Differential Revision: https://reviews.llvm.org/D114581

29d4230d

[PR52549][clang-cl] Predefine _MSVC_EXECUTION_CHARACTER_SET · 7ba70d32

Markus Böck authored Nov 30, 2021

Since VS 2022 17.1, MSVC predefines _MSVC_EXECUTION_CHARACTER_SET to inform users of the execution character set defined at compile time. The macro expands to a Windows Code Page Identifier; these identifiers are documented here: https://docs.microsoft.com/en-us/windows/win32/intl/code-page-identifiers

As clang currently only supports UTF-8 it is defined as 65001. If clang-cl were to support a different execution character set in the future we'd have to change the value.

Fixes https://bugs.llvm.org/show_bug.cgi?id=52549

Differential Revision: https://reviews.llvm.org/D114576

7ba70d32

[llvm-profgen] Compute and show profile density · c2e08aba

wlei authored Nov 28, 2021

AutoFDO performance is sensitive to profile density, i.e., the number of samples in the profile relative to the program size, because profiles with insufficient samples can be inaccurate due to statistical noise and thus hurt AutoFDO performance. A previous investigation showed that AutoFDO performed better on MySQL with an increased number of samples. Therefore, we implement a profile-density computation feature to give hints about profile density to users and the compiler.

We define the density of a profile Prof as follows:

- For each function A in the profile, density(A) = total_samples(A) / sizeof(A).
- density(Prof) = min(density(A)) for all functions A that are warm (defined below).

A function is considered warm if its total sample count is within the top N percent of the profile. For the implementation, we reuse `ProfileSummaryBuilder::getHotCountThreshold(..)` as the threshold, which can be set by percentage (`--profile-summary-cutoff-hot`) or by value (`--profile-summary-hot-count`).

We also introduce `--hot-function-density-threshold` to set the hot-function density threshold. If the profile density is below it, a suggestion to increase the number of samples is given.

This also applies to CS profiles, with all profiles merged into base.
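
The density definition above can be sketched in a few lines. This is a minimal illustration, not llvm-profgen's implementation: the function names and the input shape are hypothetical, and "warm" is assumed to mean that a function's total samples reach the hot count threshold.

```python
# Hypothetical sketch of the density formula from the commit message.
from typing import Iterable, Tuple


def profile_density(functions: Iterable[Tuple[int, int]], hot_count_threshold: int) -> float:
    """functions holds (total_samples, size) pairs; returns the minimum density over warm functions."""
    densities = [
        samples / size
        for samples, size in functions
        if size > 0 and samples >= hot_count_threshold  # assumed meaning of "warm"
    ]
    return min(densities) if densities else 0.0


def should_suggest_more_samples(density: float, density_threshold: float) -> bool:
    """Mirrors the idea behind --hot-function-density-threshold."""
    return density < density_threshold


funcs = [(1000, 100), (400, 200), (5, 50)]  # made-up sample counts and sizes
density = profile_density(funcs, hot_count_threshold=100)
print(density, should_suggest_more_samples(density, density_threshold=5.0))
```

Taking the minimum rather than the average means a single sparsely sampled warm function is enough to flag the profile.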

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D113781

c2e08aba

[X86][LoopVectorize] "Fix" `X86TTIImpl::getAddressComputationCost()` · 8cd78248

Roman Lebedev authored Nov 30, 2021

We ask `TTI.getAddressComputationCost()` for the cost of computing a vector address,
and then multiply it by the vector width. This doesn't make any sense:
it implies that we'd do a vector GEP and then scalarize the vector of pointers,
but there is no such thing in the vectorized IR; we perform scalar GEPs.

This is *especially* bad on X86, and was effectively prohibiting any scalarized
vectorization of gathers/scatters, because `X86TTIImpl::getAddressComputationCost()`
says that the cost of vector address computation is `10`, compared to `1` for scalar.
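
To make the arithmetic concrete, here is a small worked sketch. It assumes the corrected model charges one scalar address computation per lane; the function name and the vectorization factor are hypothetical, and only the costs `10` and `1` come from the message.

```python
# Worked arithmetic only; not the cost model's actual code.

def scalarized_address_cost(vf: int, per_address_cost: int) -> int:
    """Total address-computation cost for a memory op scalarized into vf accesses."""
    return vf * per_address_cost


X86_VECTOR_ADDRESS_COST = 10  # value quoted in the commit message
X86_SCALAR_ADDRESS_COST = 1   # value quoted in the commit message

vf = 8  # hypothetical vectorization factor
old_cost = scalarized_address_cost(vf, X86_VECTOR_ADDRESS_COST)
new_cost = scalarized_address_cost(vf, X86_SCALAR_ADDRESS_COST)
print(old_cost, new_cost)  # prints "80 8"
```

Under these assumptions the old model overestimates the address cost tenfold, which is enough to make scalarized gathers and scatters look unprofitable.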

The computed costs are similar to the ones with D111222+D111220,
but we end up without masked memory intrinsics that we'd then have to
expand later on, without much luck. (D111363)

Differential Revision: https://reviews.llvm.org/D111460

8cd78248

[ARM] create new pseudo t2LDRLIT_ga_pcrel for stack guards · 89453ed6

Nick Desaulniers authored Nov 30, 2021

We can't use the existing pseudo ARM::tLDRLIT_ga_pcrel for loading the
stack guard for PIC code that references the GOT, since arm-pseudo may
expand this to the narrow tLDRpci rather than the wider t2LDRpci.

Create a new pseudo, t2LDRLIT_ga_pcrel, and expand it to t2LDRpci.

Fixes: https://bugs.chromium.org/p/chromium/issues/detail?id=1270361

Reviewed By: ardb

Differential Revision: https://reviews.llvm.org/D114762

89453ed6

[clang-tidy] Warn on functional C-style casts · 5bbe5014

Carlos Galvez authored Nov 23, 2021

The google-readability-casting check is meant to be on par
with cpplint's readability/casting check, according to the
documentation. However it currently does not diagnose
functional casts, like:

    float x = 1.5F;
    int y = int(x);

This is detected by cpplint, however, and the guidelines
are clear that such a cast is only allowed when the type
is a class type (constructor call):

> You may use cast formats like `T(x)` only when `T` is a class type.

Therefore, update the clang-tidy check to check this
case.

Differential Revision: https://reviews.llvm.org/D114427

5bbe5014

[ELF] Move GOT/PLT relocation code closer. NFC · 5047e3a3

Fangrui Song authored Nov 29, 2021

5047e3a3

[X86][clang] Enable floating-point type for -mno-x87 option on 32-bits · 42c15c7e

Phoebe Wang authored Nov 30, 2021

We should match GCC's behavior, which allows floating-point types with the -mno-x87 option on 32-bit targets. https://godbolt.org/z/KrbhfWc9o

The previous block issues have partially been fixed by D112143.

Reviewed By: asavonic, nickdesaulniers

Differential Revision: https://reviews.llvm.org/D114162

42c15c7e

[mlir][python] Audit and fix a lot of the Python pyi stubs. · a88bb5b9

Stella Laurenzo authored Nov 29, 2021

* Classes that are still todo are marked with "# TODO: Auto-generated. Audit and fix."
* Those without this note have been cross-checked with C++ sources and most have been spot checked by hovering in VsCode.

Differential Revision: https://reviews.llvm.org/D114767

a88bb5b9

[mlir][python] Implement more SymbolTable methods. · bdc31837

Stella Laurenzo authored Nov 28, 2021

* set_symbol_name, get_symbol_name, set_visibility, get_visibility, replace_all_symbol_uses, walk_symbol_tables
* In integrations I've been doing, I've been reaching for all of these to do both general IR manipulation and module merging.
* I don't love the replace_all_symbol_uses underlying APIs since they necessitate SYMBOL_COUNT walks and have various sharp edges. I'm hoping that whatever emerges eventually for this can still retain this simple API as a one-shot.

Differential Revision: https://reviews.llvm.org/D114687

bdc31837

[mlir][python] Add pyi stub files to enable auto completion. · a6e7d024

Stella Laurenzo authored Nov 28, 2021

There is no completely automated facility for generating stubs that are both accurate and comprehensive for native modules. After some experimentation, I found that MyPy's stubgen does the best at generating correct stubs with a few caveats that are relatively easy to fix:
* Some types resolve to cross module symbols incorrectly.
* staticmethod and classmethod signatures seem to always be completely generic and need to be manually provided.
* It does not generate an __all__ which, from testing, causes namespace pollution to be visible to IDE code completion.

As a first step, I did the following:
* Ran `stubgen` for `_mlir.ir`, `_mlir.passmanager`, and `_mlirExecutionEngine`.
* Manually looked for all instances where unnamed arguments were being emitted (i.e. as 'arg0', etc) and updated the C++ side to include names (and re-ran stubgen to get a good initial state).
* Made/noted a few structural changes to each `pyi` file to make it minimally functional.
* Added the `pyi` files to the CMake rules so they are installed and visible.

To test, I added a `.env` file to the root of the project with `PYTHONPATH=...` set as per instructions. I then reloaded the developer window (in VsCode) and verified that completion works for various changes to test cases.

There are still a number of overly generic signatures, but I want to check in this low-touch baseline before iterating on more ambiguous changes. This is already a big improvement.

Differential Revision: https://reviews.llvm.org/D114679

a6e7d024

[DebugInfo] Do not replace existing nodes from DICompileUnit · 0150645b

Ellis Hoag authored Nov 29, 2021

When creating a new DIBuilder with an existing DICompileUnit, load the
DINodes from the current DICompileUnit so they don't get overwritten.
This is done in the MachineOutliner pass, but it didn't change the CU so
the bug never appeared. We need this if we ever want to add DINodes to
the CU after it has been created, e.g., DIGlobalVariables.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D114556

0150645b

[AMDGPU] Enable copy between VGPR and AGPR classes during regalloc · 5297cbf0

Christudasan Devadasan authored Sep 06, 2021

The greedy register allocator prefers moving a constrained
live range into a larger allocatable class over spilling
it. This patch defines the necessary superclasses for
vector registers. For subtargets that support copies between
VGPRs and AGPRs, the vector register spills during regalloc
now become just copies.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D109301

5297cbf0

[TwoAddressInstructionPass] Create register mapping for registers with... · f1d8345a

Guozhi Wei authored Nov 29, 2021

[TwoAddressInstructionPass] Create register mapping for registers with multiple uses in the current MBB

Currently we create register mappings for registers used only once in current
MBB. For registers with multiple uses, when all the uses are in the current MBB,
we can also create mappings for them similarly according to the last use.
For example

    %reg101 = ...
            = ... %reg101
    %reg103 = ADD %reg101, %reg102

We can create mapping between %reg101 and %reg103.

Differential Revision: https://reviews.llvm.org/D113193

f1d8345a

[RISCV] Promote f16 log/pow/exp/sin/cos/etc. to f32 libcalls. · b121d23a

Craig Topper authored Nov 29, 2021

Prevents crashes or "cannot select" errors.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D113822

b121d23a

[NFC][sanitizer] Track progress of populating the block · a06d3527

Vitaly Buka authored Nov 23, 2021

In a multi-threaded application, concurrent StackStore::Store calls may
finish in an order different from their assigned Ids. So we can't assume
that once we switch to writing the next block, the previous one is done.

The workaround is to count the exact number of uptr values stored into the block,
including skipped tail/head space that could not fit an entire trace.

Depends on D114490.

Reviewed By: morehouse

Differential Revision: https://reviews.llvm.org/D114493

a06d3527

[RISCV] Fix a bug in RISCVFrameLowering. · 9a885665

Hsiangkai Wang authored Nov 19, 2021

When outgoing arguments are passed on the stack and we do not
reserve the stack space in the prologue, use BP to access stack objects
after adjusting the stack pointer before function calls.

    callseq_start  ->  sp = sp - reserved_space
    //
    // Use FP to access fixed stack objects.
    // Use BP to access non-fixed stack objects.
    //
    call @foo
    callseq_end    ->  sp = sp + reserved_space

Differential Revision: https://reviews.llvm.org/D114246

9a885665

[RISCV] Add a test case to show the bug in RISCVFrameLowering. · 4ae2222e

Hsiangkai Wang authored Nov 19, 2021

If there are too many arguments to pass in registers, the remainder
must occupy stack space to be passed to the callee. There
are two scenarios: reserve the space in the prologue, or
reserve the space before the function calls. When we
reserve the stack space before function calls, the stack pointer is
adjusted. In that scenario, we should not use the stack pointer to access
stack objects. It looks like this:

    callseq_start  ->  sp = sp - reserved_space
    //
    // We should not use SP to access stack objects in this area.
    //
    call @foo
    callseq_end    ->  sp = sp + reserved_space

Differential Revision: https://reviews.llvm.org/D114245

4ae2222e

[NFC] Header comment in X86RegisterBanks.td referred to Aarch64 · fde93774

Mircea Trofin authored Nov 29, 2021

Differential Revision: https://reviews.llvm.org/D114763

fde93774

[sanitizer] Add Leb128 encoding/decoding · 25a7e4b9

Vitaly Buka authored Nov 21, 2021

Reviewed By: dvyukov, kstoimenov

Differential Revision: https://reviews.llvm.org/D114464

25a7e4b9

Revert "[lldb][NFC] Format lldb/include/lldb/Symbol/Type.h" · 2e5c47ed

Luís Ferreira authored Nov 30, 2021

This reverts commit 6f99e1aa.

2e5c47ed

Add missing header · bd4c6a47

David Blaikie authored Nov 29, 2021

bd4c6a47

[mlir][sparse] generalize sparse tensor output implementation · 7d4da4e1

Aart Bik authored Nov 22, 2021

Moves sparse tensor output support forward by generalizing from injective
insertions only to include reductions. This revision accepts the case with all
parallel outer and all reduction inner loops, since that can be handled with
an injective insertion still. Next revision will allow the inner parallel loop
to move inward (but that will require "access pattern expansion" aka "workspace").

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D114399

7d4da4e1

X86: Fold masked-merge when and-not is not available · 87ba99c2

Matthias Braun authored Oct 15, 2021

Differential Revision: https://reviews.llvm.org/D112754

87ba99c2

Tests for D112754 · 53dfa525

Matthias Braun authored Nov 03, 2021

Differential Revision: https://reviews.llvm.org/D113151

53dfa525

[Demangle] Add support for D anonymous symbols · b779f02a

Luís Ferreira authored Nov 29, 2021

Anonymous symbols are represented by 0 in the mangled symbol. We should skip
them in order to represent the demangled name correctly, otherwise demangled
names like `demangle..anon` can happen.
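
The skipping rule can be sketched on a simplified qualified name, where each identifier is a decimal length followed by that many characters and a lone `0` marks an anonymous symbol. This is an illustration under those assumptions, not the LLVM demangler: the function name and input are hypothetical, and the `_D` prefix, type suffixes and back references are ignored.

```python
# Hypothetical, simplified parser for length-prefixed identifiers.

def demangle_qualified_name(mangled: str) -> str:
    """Join length-prefixed identifiers with '.', skipping anonymous ('0') entries."""
    parts = []
    pos = 0
    while pos < len(mangled) and mangled[pos].isdigit():
        if mangled[pos] == "0":
            # Anonymous symbol: consume the marker and emit nothing,
            # so no empty component (and no "..") ends up in the result.
            pos += 1
            continue
        end = pos
        while end < len(mangled) and mangled[end].isdigit():
            end += 1
        length = int(mangled[pos:end])
        if end + length > len(mangled):
            raise ValueError("identifier length exceeds input")
        parts.append(mangled[end:end + length])
        pos = end + length
    return ".".join(parts)


print(demangle_qualified_name("8demangle04anon"))
```

Treating `0` as its own token matters: reading `04` as a single length would misparse any input where the anonymous marker precedes another identifier.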

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D114307

b779f02a

[Demangle] Add support for multiple identifiers in D qualified names · 6e08abdc

David Blaikie authored Nov 29, 2021

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D114305

6e08abdc

[Demangle] Add support for D simple single qualified names · e63c799a

David Blaikie authored Nov 29, 2021

This patch adds support for simple single qualified names that include
internal mangled names and normal symbol names.

Differential Revision: https://reviews.llvm.org/D111415

e63c799a

[NFC][Regalloc] Split canEvictInterference into hint and general · e8b8304d

Mircea Trofin authored Nov 16, 2021

There are 2 eviction queries. One is made by tryAssign, when it attempts to
free an interference occupying the hint of the candidate. The other is
during 'regular' interference resolution, where we scan over all
physical registers and try to see if we can evict live ranges in favor
of the candidate. We currently use the same logic in both cases, just
that the former never passes the cost to any subsequent query.
Technically, the 2 decisions could be implemented with different
policies.

This patch splits the 2.

RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153639.html

Differential Revision: https://reviews.llvm.org/D114019

e8b8304d

[lldb][NFC] Format lldb/include/lldb/Symbol/Type.h · 6f99e1aa

Luís Ferreira authored Nov 29, 2021

Reviewed By: teemperor, JDevlieghere

Differential Revision: https://reviews.llvm.org/D113604



Signed-off-by: Luís Ferreira <contact@lsferreira.net>

6f99e1aa

[openmp][devicertl] Add a missing loader_uninitialized attribute · 3ab150f6

Jon Chesterfield authored Nov 29, 2021

3ab150f6

[clang-tidy] Fix pr48613: "llvm-header-guard uses a reserved identifier" · c7aa3587

Salman Javed authored Nov 30, 2021

Fixes https://bugs.llvm.org/show_bug.cgi?id=48613.

llvm-header-guard is suggesting header guards with leading underscores
if the header file path begins with a '/' or similar special character.
Only reserved identifiers should begin with an underscore.
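
A minimal sketch of the idea follows. It is not the check's actual algorithm: the function name is hypothetical, and the mapping from path to guard (replace non-alphanumeric characters with underscores, then uppercase) is an assumed simplification.

```python
# Hypothetical helper: derive a guard from a path without a leading underscore.
import re


def header_guard_from_path(path: str) -> str:
    """Build a guard name and drop leading underscores so it is not a reserved identifier."""
    guard = re.sub(r"[^A-Za-z0-9]", "_", path).upper()
    return guard.lstrip("_")


print(header_guard_from_path("/include/foo/bar.h"))  # prints "INCLUDE_FOO_BAR_H"
```

Without the final strip, the leading `/` would turn into a leading underscore followed by an uppercase letter, which is a reserved identifier form in C and C++.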

Differential Revision: https://reviews.llvm.org/D114149

c7aa3587

[DebugInfo][InstrRef] Terminate overlapping variable fragments · 0eee8445

Jeremy Morse authored Nov 29, 2021

If we have a variable where its fragments are split into overlapping
segments:

    DBG_VALUE $ax, $noreg, !123, !DIExpression(DW_OP_LLVM_fragment, 0, 16)
    ...
    DBG_VALUE $eax, $noreg, !123, !DIExpression(DW_OP_LLVM_fragment, 0, 32)

we should only propagate the most recently assigned fragment out of a
block. LiveDebugValues only deals with live-in variable locations, as
overlaps within blocks are DbgEntityHistoryCalculator's domain.

InstrRefBasedLDV has kept the accumulateFragmentMap method from
VarLocBasedLDV, we just need it to recognise DBG_INSTR_REFs. Once it's
produced a mapping of variable / fragments to the overlapped variable /
fragments, VLocTracker uses it to identify when a debug instruction needs
to terminate the other parts it overlaps with. The test is updated for
some standard "InstrRef picks different registers" variation, and the
order of some unrelated DBG_VALUEs changes.

Differential Revision: https://reviews.llvm.org/D114603

0eee8445