Commits · aeeefe97c686dd51e2614d2a3e4214701c57371b · Roger Ferrer / llvm-epi

Oct 14, 2021

[bazel] Move MC header usage from Support to tblgen · aeeefe97

Reid Kleckner authored Oct 14, 2021

After the TargetRegistry.h move, nothing in Support includes headers
from MC. However, files in tablegen use MC headers, so we must add an
entry for them in tblgen srcs.

Differential Revision: https://reviews.llvm.org/D111835

aeeefe97

[test] Fix asan dynamic unit tests with per-target runtime dirs · 8c66d781

Collin Baker authored Oct 14, 2021

When LLVM_ENABLE_PER_TARGET_RUNTIME_DIR=on,
Asan-i386-calls-Dynamic-Test and Asan-i386-inline-Dynamic-Test fail to
run on an x86_64 host. This is because asan's unit test lit files are
configured once, rather than per target arch as with the non-unit
tests. LD_LIBRARY_PATH ends up incorrect, and the tests try linking
against the x86_64 runtime which fails.

This changes the unit test CMake machinery to configure the default
and dynamic unit tests once per target arch, similar to the other asan
tests. Then the fix from https://reviews.llvm.org/D108859 is adapted
to the unit test Lit files with some modifications.

Fixes PR52158.

Differential Revision: https://reviews.llvm.org/D111756

8c66d781

[clang] Support -clear-ast-before-backend without -disable-free · d0a5f61c

Arthur Eubanks authored Oct 13, 2021

Previously, without -disable-free, -clear-ast-before-backend would crash in ~ASTContext() for various reasons.
This works around that by doing a lot of the cleanup ahead of the destructor so that the destructor doesn't actually do any manual cleanup if we've already cleaned up beforehand.

This actually does save a measurable amount of memory with -clear-ast-before-backend, although at an almost unnoticeable runtime cost:
https://llvm-compile-time-tracker.com/compare.php?from=5d755b32f2775b9219f6d6e2feda5e1417dc993b&to=58ef1c7ad7e2ad45f9c97597905a8cf05a26258c&stat=max-rss

Previously we weren't doing any cleanup with -disable-free, so I tried measuring the impact of always doing the cleanup and didn't measure anything noticeable on llvm-compile-time-tracker.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D111767

d0a5f61c

[TableGen][PGO] Disable profile instrumentation for printInstruction function · 21abe212

Rong Xu authored Oct 14, 2021

We are seeing extremely long build times for AMDGPUInstPrinter.cpp
when profile instrumentation is enabled: it takes more than 5 minutes
(compared to ~8 seconds in a non-instrumented build).

This is caused by the huge statements in printInstruction functions. In
a profile instrumentation build, we need to have extra control flow to
differentiate each case statement. This in turn adds significant
compile time in block placement and branch folding.

Function printInstruction is not likely to benefit from PGO build
as it's rarely executed in a typical compilation. So here I disable
the profile instrumentation for this function.

Differential Revision: https://reviews.llvm.org/D111682

21abe212

[MLIR][arith] fix references to std.constant in comments · cb3aa49e

Mogball authored Oct 14, 2021

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D111820

cb3aa49e

[mlir][vector] Refactor linalg vectorization for reductions · afad0cdf

thomasraoux authored Oct 14, 2021

Emit the reduction during op vectorization instead of doing it when creating the
transfer write. This allows us to avoid broadcasting output arguments for the
reduction's initial value.

Differential Revision: https://reviews.llvm.org/D111825

afad0cdf

[tests] Add indvars tests showing missing transforms with small IVs · 8b31f07c

Philip Reames authored Oct 14, 2021

This shows the transform side of D109457, but also lets us try other approaches to the same problem. The common thread in all of them is that we need to explicitly reason about UB to rule out the possibility of infinite loops.

8b31f07c

[AArch64] Add extra tests for fptosisat vector variants · e9e6266c

David Green authored Oct 14, 2021

e9e6266c

[X86][Costmodel] Improve cost modelling for not-fully-interleaved load · 3d7bf662

Roman Lebedev authored Oct 14, 2021

While i've modelled most of the relevant tuples for AVX2,
that only covered fully-interleaved groups.

By definition, interleaving load of stride N means:
load N*VF elements, and shuffle them into N VF-sized vectors,
with 0'th vector containing elements `[0, VF)*stride + 0`,
and 1'th vector containing elements `[0, VF)*stride + 1`.
Example: https://godbolt.org/z/df561Me5E (i64 stride 4 vf 2 => cost 6)
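
The definition above can be sketched in scalar code. This is only an illustration of the element layout, not of how the cost model or the vectorizer is implemented; `deinterleave` is a hypothetical helper name, and `long long` stands in for i64.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an LLVM API): split a buffer of Stride*VF elements
// into Stride vectors of VF elements each. Vector K receives the elements at
// positions K, K + Stride, K + 2*Stride, ...
std::vector<std::vector<long long>>
deinterleave(const std::vector<long long> &Wide, std::size_t Stride,
             std::size_t VF) {
  assert(Wide.size() == Stride * VF && "expected exactly Stride*VF elements");
  std::vector<std::vector<long long>> Out(Stride,
                                          std::vector<long long>(VF));
  for (std::size_t K = 0; K < Stride; ++K)
    for (std::size_t I = 0; I < VF; ++I)
      Out[K][I] = Wide[I * Stride + K];
  return Out;
}
```

With stride 4 and VF 2, the input `{0, 1, 2, 3, 4, 5, 6, 7}` yields the four vectors `{0, 4}`, `{1, 5}`, `{2, 6}` and `{3, 7}`. A group that is not fully interleaved would use only some of these result vectors.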

Now, a not-fully-interleaved load is one where not all of these vectors are demanded.
So at worst, we could just pretend that everything is demanded,
and discard the non-demanded vectors. What this means is that the cost
for not-fully-interleaved group should be not greater than the cost
for the same fully-interleaved group, but perhaps somewhat less.
Examples:
https://godbolt.org/z/a78dK5Geq (i64 stride 4 (indices 012u) vf 2 => cost 4)
https://godbolt.org/z/G91ceo8dM (i64 stride 4 (indices 01uu) vf 2 => cost 2)
https://godbolt.org/z/5joYob9rx (i64 stride 4 (indices 0uuu) vf 2 => cost 1)

As we have established over the course of the last ~70 patches, (wow)
`BaseT::getInterleavedMemoryOpCost()` is absolutely bogus,
it is usually almost an order of magnitude overestimation,
so i would claim that we should at least use the hardcoded costs
of fully interleaved load groups.

We could go further and adjust them e.g. by the number of demanded indices,
but then i'm somewhat fearful of underestimating the cost.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111174

3d7bf662

autogen tests for ease of update · 7f3861cf

Philip Reames authored Oct 13, 2021

7f3861cf

[RISCV] Remove unused member variable. NFC · 79ae9562

Craig Topper authored Oct 14, 2021

79ae9562

[IVUsers] Move preheader check into SCEVExpander · 69853f99

Nikita Popov authored Oct 12, 2021

Rather than checking for loop nest preheaders upfront in IVUsers,
move this requirement into isSafeToExpand() from SCEVExpander.

Historically, LSR did not check whether SCEVs are safe to expand
and fully relied on IVUsers to validate this. Later, support for
non-expandable SCEVs was added via rigid formulas.

Checking this in isSafeToExpand() makes it more obvious what
exactly this check is guarding against, and avoids the awkward
loop nest scan.

This is a followup to https://reviews.llvm.org/D111493#3055286.

Differential Revision: https://reviews.llvm.org/D111681

69853f99

Fix a crash on valid consteval code. · 68157fe1

Aaron Ballman authored Oct 14, 2021

Not all constants are emitted within the context of a function, so use
the module's ASTContext instead because 1) that's the same as the
current function ASTContext, and 2) the module can never be null.

Fixes PR50787.

68157fe1

[lldb] Move ~Platform to source file · 482c53fa

Raphael Isemann authored Oct 14, 2021

The called destructors of the members require the includes that are only
in the source file.

482c53fa

[Driver][Darwin] Use T reference instead of getToolChain().getTriple(). · 8ecbcd05

Frederic Cambus authored Oct 14, 2021

Differential Revision: https://reviews.llvm.org/D111793

8ecbcd05

[X86] Use CMOVNS for abs instead of CMOVGE. · 3ff9cc01

Craig Topper authored Oct 14, 2021

CMOVGE reads SF and OF. CMOVNS only reads SF. This matches with
other recent changes to use a single flag where possible. It also
matches gcc codegen.

I believe this technically changes whether the conditional move happens
on INT_MIN, but for INT_MIN both registers are the same so it doesn't
matter.
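
The INT_MIN argument can be checked with a small model. This is a sketch under assumptions: it models abs as a negate followed by a conditional select on the flags the negate would set, and all names are hypothetical. It is not the code the backend emits.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Model of the flags a 32-bit negate would produce (illustrative only).
struct NegResult {
  std::int32_t Value; // wrapped result of 0 - X
  bool SF;            // sign flag: set when the result is negative
  bool OF;            // overflow flag: set only when X is INT32_MIN
};

NegResult negWithFlags(std::int32_t X) {
  const std::int32_t Min = std::numeric_limits<std::int32_t>::min();
  // Negating INT32_MIN wraps back to INT32_MIN; handle it explicitly so the
  // model itself never performs signed overflow.
  const std::int32_t Value = (X == Min) ? Min : -X;
  return {Value, Value < 0, X == Min};
}

// "GE" condition: SF == OF. Take the negated value when it holds.
std::int32_t absViaCMOVGE(std::int32_t X) {
  const NegResult N = negWithFlags(X);
  return (N.SF == N.OF) ? N.Value : X;
}

// "NS" condition: SF clear. Take the negated value when it holds.
std::int32_t absViaCMOVNS(std::int32_t X) {
  const NegResult N = negWithFlags(X);
  return !N.SF ? N.Value : X;
}

// Small self-check: the two variants agree, including on INT32_MIN, where
// they select different operands that happen to hold the same value.
void checkAbsExamples() {
  const std::int32_t Min = std::numeric_limits<std::int32_t>::min();
  const std::int32_t Samples[] = {0, 1, -1, 7, -5, Min,
                                  std::numeric_limits<std::int32_t>::max()};
  for (std::int32_t X : Samples)
    assert(absViaCMOVGE(X) == absViaCMOVNS(X));
}
```

For every input other than INT32_MIN the overflow flag is clear, so the two conditions are identical. For INT32_MIN the conditions differ, but the negated value and the original are both INT32_MIN.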

Differential Revision: https://reviews.llvm.org/D111826

3ff9cc01

[Polly] Remove support for code generated by gfortran+DragonEgg. · 19db33c0

Michael Kruse authored Oct 14, 2021

DragonEgg is not maintained anymore, hence there is no need for this
functionality.

Fixes llvm.org/PR52173

19db33c0

[Polly][docs] Fix itemize list for release notes. · a5e52ce3

Michael Kruse authored Oct 14, 2021

Make the changes top-level items, instead of subitems of the
"Changes..." placeholder.

a5e52ce3

Fix a rejects-valid with consteval on overloaded operators · b9941de0

Aaron Ballman authored Oct 14, 2021

It seems that Clang 11 regressed functionality that was working in
Clang 10 regarding calling a few overloaded operators in an immediate
context. Specifically, we were not checking for immediate invocations
of array subscripting and the arrow operators, but we properly handle
the other overloaded operators.

This fixes the two problematic operators and adds some test coverage to
show they're equivalent to calling the operator directly.

This addresses PR50779.

b9941de0

[lldb] Remove logging from Platform::~Platform · e632e900

Raphael Isemann authored Oct 14, 2021

Platform instances are stored in a function-local static list. However, the
logging code involves locking a function-local static mutex. This only works on
implementations where the Log mutex happens to be destroyed *after* the
Platform list is destroyed.

This fixes randomly failing tests due to `recursive_mutex lock failed: Invalid
argument`.

Reviewed By: kastiglione

Differential Revision: https://reviews.llvm.org/D111816

e632e900

[mlir][tosa] Fix tosa.cast UiToFp32 for tosa-to-linalg · 59dd418e

Rob Suderman authored Oct 13, 2021

Part of the arith update broke UiToFp32. Fixed the lowering and included a new
test to detect a regression.

Differential Revision: https://reviews.llvm.org/D111772

59dd418e

[lldb] Rewrite TestDiamond and document some bugs. · 78e17e23

Raphael Isemann authored Oct 13, 2021

78e17e23

[libc++][AIX] Add scripts and config for building with the libcxx CI infrastructure · 228b3b72

David Tenty authored Oct 13, 2021

This initial change adds the AIX configuration to run-buildbot, an AIX
CMake cache file, and appropriate compiler and linker flags for testing
AIX to the lit "from scratch" configuration files. Either of the 32-bit or 64-bit configurations
can be built by setting `OBJECT_MODE` in the build environment (as is
typical for AIX).

Reviewed By: ldionne, #libc, #libc_abi

Differential Revision: https://reviews.llvm.org/D111244

228b3b72

[BasicAA] Improve scalable vector handling · 5f05ff08

Nikita Popov authored Sep 26, 2021

Currently, DecomposeGEP() bails out on the whole decomposition if
it encounters a scalable GEP type anywhere. However, it is fine to
still analyze other GEPs that we look through before hitting the
scalable GEP. This does mean that the decomposed GEP base is no
longer required to be the same as the underlying object. However,
I don't believe this property is necessary for correctness anymore.

This allows us to compute slightly more precise aliasing results
for GEP chains containing scalable vectors, though my primary
interest here is simplifying the code.

Differential Revision: https://reviews.llvm.org/D110511

5f05ff08

[llvm-mca][timeline] Indicate output was stopped due to cycle limit. · 0a869ef3

Daniel Sanders authored Oct 12, 2021

It can be a bit confusing to stop with no explanation so we should indicate
when further output was prevented by the cycle limit.

Differential Revision: https://reviews.llvm.org/D111753

0a869ef3

[AIX] Ignore case when comparing output from od · b050564d

Kai Nacke authored Oct 14, 2021

POSIX does not define the exact output of the od tool.
While most implementations use lower case characters in hex output,
the z/OS USS implementation uses upper case characters.
To avoid LIT failures, the FileCheck option to ignore the case must
be used when checking hex bytes.

Reviewed By: abhina.sreeskantharajan

Differential Revision: https://reviews.llvm.org/D111427

b050564d

[TTI][X86] Merge getInterleavedMemoryOpCostAVX2 into getInterleavedMemoryOpCost. NFC · 871f7739

Simon Pilgrim authored Oct 14, 2021

This is an NFC refactor patch to merge the AVX2 interleaved cost handling back into the getInterleavedMemoryOpCost base method - while getInterleavedMemoryOpCostAVX512 uses instructions and patterns very specific to AVX512+, much of the cost analysis for AVX2 can be reused for all SSE targets.

This is the first step towards improving SSE and AVX1 costs that will reuse the relevant AVX2 costs by splitting some of the tables - for instance AVX1 has very similar costs for most vXi64/vXf64 interleave patterns and many sub-128bit vector costs are the same all the way down to SSE2 (or at least SSSE3).

Differential Revision: https://reviews.llvm.org/D111822

871f7739

[Driver][WebAssembly] Use ToolChain reference instead of getToolChain(). · f7a32143

Frederic Cambus authored Oct 14, 2021

Differential Revision: https://reviews.llvm.org/D111786

f7a32143

[Polly] Clean up Polly's getting started docs. · 5f668bba

Michael Kruse authored Oct 14, 2021

This patch removes the broken bash script (polly.sh) and fixes the broken setup
instructions in get_started.html. It also adds instructions for using Ninja and
links to the LLVM getting started page.

Reviewed By: Meinersbur, InnovativeInventor

Differential Revision: https://reviews.llvm.org/D111685

5f668bba

[TTI][X86] Swap... · fcbec7e6

Simon Pilgrim authored Oct 14, 2021

[TTI][X86] Swap getInterleavedMemoryOpCostAVX2/getInterleavedMemoryOpCostAVX512 implementations. NFC.

I have some upcoming refactoring for SSE/AVX1 interleaving cost support, and the diff is a lot nicer if the (unaltered) AVX512 implementation isn't stuck between getInterleavedMemoryOpCost and getInterleavedMemoryOpCostAVX2

fcbec7e6

[Transforms] eliminateDeadStores - remove unused variable. NFC. · 13185f01

Simon Pilgrim authored Oct 14, 2021

The initial MemoryAccess *Current assignment is never used, and all other uses are initialized/used within the worklist loop (and not across multiple iterations) - so move the variable internal to the loop.

Fixes scan-build unused assignment warning.

13185f01

[libTooling] Add "switch"-like Stencil combinator · b6c218d4

Yitzhak Mandelbaum authored Oct 13, 2021

Adds `selectBound`, a `Stencil` combinator that allows the user to supply multiple alternative cases, discriminated by bound node IDs.

Differential Revision: https://reviews.llvm.org/D111708

b6c218d4

[FPEnv][InstSimplify] Fold fadd X, 0 ==> X, when we know X is not -0 · 727a891e

Kevin P. Neal authored Oct 14, 2021

Currently the fadd optimizations in InstSimplify don't know how to do this
NoSignedZeros "X + 0.0 ==> X" fold when using the constrained intrinsics.
This adds the support.
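
The reason the fold needs X to be known not to be -0 can be seen with ordinary floating-point arithmetic. The sketch below assumes IEEE-754 doubles and the default round-to-nearest mode; `addZeroIsIdentity` is a hypothetical helper and says nothing about how InstSimplify performs the check.

```cpp
#include <cassert>
#include <cmath>

// Returns true when X + 0.0 produces a value identical to X, including the
// sign of zero. Assumes IEEE-754 arithmetic in round-to-nearest mode.
bool addZeroIsIdentity(double X) {
  volatile double Zero = 0.0; // volatile keeps the addition from being folded
  const double Sum = X + Zero;
  if (std::isnan(X))
    return std::isnan(Sum);
  return Sum == X && std::signbit(Sum) == std::signbit(X);
}

// Small self-check of the cases discussed in the note below.
void checkAddZeroExamples() {
  assert(addZeroIsIdentity(1.5));
  assert(addZeroIsIdentity(0.0));
  assert(!addZeroIsIdentity(-0.0)); // -0.0 + 0.0 is +0.0, not -0.0
}
```

Only the `-0.0` input changes under the addition, which is why replacing `X + 0.0` with `X` is valid once -0 is excluded or signed zeros are declared not to matter.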

This review is derived from D106362 with some improvements from D107285
and is a follow-on to D111085.

Differential Revision: https://reviews.llvm.org/D111450

727a891e

[RISCV] Update Zba, Zbb, Zbc, and Zbs version from 0.93 to 1.0. · f7ba5724

Craig Topper authored Oct 14, 2021

I've removed the Zbs W instructions that are not part of the frozen spec.

References to B as an extension name have been removed. Tests are updated or split accordingly.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D110669

f7ba5724

[sanitizer] Move out stack trace pointer from header StackDepot · 8282024a

Vitaly Buka authored Oct 11, 2021

Trace pointers are accessed very rarely and don't need to
be in hot data.

Depends on D111613.

Reviewed By: dvyukov

Differential Revision: https://reviews.llvm.org/D111614

8282024a

[ValueTracking] Simplify getKnowledgeValidInContext() call (NFC) · a8e7d11a

Nikita Popov authored Oct 14, 2021

This accepts an ArrayRef, there's no need to create a SmallVector.

a8e7d11a

[llvm-profgen] Allow generating AutoFDO profile from CSSPGO binary · a316343e

Wenlei He authored Oct 13, 2021

Add `-use-dwarf-correlation` switch to allow llvm-profgen to generate AutoFDO profile for binaries built with CSSPGO (pseudo-probe).

Differential Revision: https://reviews.llvm.org/D111776

a316343e

[libc++] LWG3480: make (recursive_)directory_iterator C++20 ranges · 1fa27f2a

Joe Loser authored Oct 14, 2021

Implement LWG3480 which enables `directory_iterator` and
`recursive_directory_iterator` to be both a `borrowed_range` and a
`view`.

Reviewed By: ldionne, #libc

Differential Revision: https://reviews.llvm.org/D111644

1fa27f2a

[AMDGPU] Add more tests for build_vector · e4e48e2f

Julien Pages authored Oct 14, 2021

Differential Revision: https://reviews.llvm.org/D111652

e4e48e2f

[analyzer][solver] Handle simplification to ConcreteInt · ac3edc5a

Gabor Marton authored Sep 30, 2021

The solver's symbol simplification mechanism was not able to handle cases
when a symbol is simplified to a concrete integer. This patch adds the
capability.

E.g., in the attached lit test case, the original symbol is `c + 1` and
it has a `[0, 0]` range associated with it. Then, a new condition `c == 0`
is assumed, so a new range constraint `[0, 0]` comes in for `c` and
simplification kicks in. `c + 1` becomes `0 + 1`, but the associated
range is `[0, 0]`, so now we are able to realize the contradiction.
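
An input of roughly the following shape would produce the constraints described above. This is a hypothetical reconstruction for illustration, not the lit test attached to the patch, and the function name is made up.

```cpp
#include <cassert>

// Hypothetical input, not the actual test case. Returns true only if the
// innermost point is reached.
bool reachesContradiction(int C) {
  if (C + 1 != 0)
    return false; // past this point, `C + 1` is constrained to [0, 0]
  if (C != 0)
    return false; // past this point, `C` is constrained to [0, 0]
  // With both constraints, `C + 1` simplifies to `0 + 1`, which cannot lie
  // in [0, 0], so this point is infeasible.
  return true;
}

// Small self-check: no sampled input reaches the innermost point.
void checkContradictionExamples() {
  const int Samples[] = {-2, -1, 0, 1, 5};
  for (int C : Samples)
    assert(!reachesContradiction(C));
}
```

At run time no value of `C` reaches the final `return true`. The point of the patch is that the solver can now conclude the same thing symbolically once `c + 1` simplifies to a concrete integer.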

Differential Revision: https://reviews.llvm.org/D110913

ac3edc5a