Commits · 9545976ff160e19805a84a06a7e59d446f9994d9 · Lorenzo Albano / LLVM bpEVL

Feb 08, 2022

Revert "[Clang] Propagate guaranteed alignment for malloc and others" · 9545976f

James Y Knight authored Feb 07, 2022

The above change assumed that malloc (and friends) would always
allocate memory to getNewAlign(), even for allocations which have a
smaller size. This is not actually required by spec (a 1-byte
allocation may validly have 1-byte alignment).

Some real-world malloc implementations do not provide this guarantee,
and thus this optimization is breaking programs.
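
A minimal sketch of how a program can observe the alignment its allocator actually returns, rather than assuming it. The helper names `observedAlignment` and `mallocAlignment` are hypothetical and not part of Clang or any libc:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Largest power of two that divides the address (its lowest set bit).
// Hypothetical helper for illustration only.
std::size_t observedAlignment(std::uintptr_t address) {
  if (address == 0)
    return 0;
  return static_cast<std::size_t>(address & (~address + 1));
}

// Allocates `size` bytes, reports the alignment the pointer happens to
// have, and frees the block again. Returns 0 if the allocation failed.
std::size_t mallocAlignment(std::size_t size) {
  void *p = std::malloc(size);
  if (!p)
    return 0;
  std::size_t alignment = observedAlignment(reinterpret_cast<std::uintptr_t>(p));
  std::free(p);
  return alignment;
}
```

On many allocators `mallocAlignment(1)` will still report 8 or 16, but nothing requires that, which is why the compiler cannot assume it.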

Fixes #53540

This reverts commit c2297544.

Differential Revision: https://reviews.llvm.org/D118804

9545976f

[gn build] Port 4a6553f4 · 34d557f3
LLVM GN Syncbot authored Feb 08, 2022

34d557f3

[Debuginfod] [Symbolizer] Break debuginfod out of libLLVM. · 4a6553f4

Daniel Thornburgh authored Jan 25, 2022

Debuginfod can pull in libcurl as a dependency, which isn't appropriate
for libLLVM. (See
https://gitlab.freedesktop.org/mesa/mesa/-/issues/5732).

This change breaks out debuginfod into a separate non-component library
that can be used directly in llvm-symbolizer. The tool can inject
debuginfod into the Symbolizer library via an abstract DebugInfoFetcher
interface, breaking the dependency of Symbolizer on debuginfod.
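
The injection pattern described above can be sketched roughly as follows. Only the name `DebugInfoFetcher` comes from this message; the method `fetch`, the `Symbolizer` stand-in, and `FakeDebuginfodFetcher` are assumptions made for illustration and do not reflect the real LLVM API:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Abstract interface; the method name and signature are hypothetical.
class DebugInfoFetcher {
public:
  virtual ~DebugInfoFetcher() = default;
  // Returns a path to the debug file, or an empty string if none is found.
  virtual std::string fetch(const std::string &buildID) const = 0;
};

// Stand-in for the Symbolizer library: it only knows the interface, so it
// has no link-time dependency on any concrete fetcher.
class Symbolizer {
public:
  void setFetcher(std::unique_ptr<DebugInfoFetcher> f) { fetcher = std::move(f); }

  std::string locateDebugFile(const std::string &buildID) const {
    if (fetcher) {
      std::string path = fetcher->fetch(buildID);
      if (!path.empty())
        return path;
    }
    return "<not found>";
  }

private:
  std::unique_ptr<DebugInfoFetcher> fetcher;
};

// Stand-in for the tool side: the only place that would pull in the
// network-capable implementation.
class FakeDebuginfodFetcher : public DebugInfoFetcher {
public:
  std::string fetch(const std::string &buildID) const override {
    return "/cache/" + buildID + ".debug";
  }
};
```

The tool would construct the concrete fetcher and hand it to the library, so only the tool links against the heavier dependency.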

See https://github.com/llvm/llvm-project/issues/52731

Reviewed By: phosek

Differential Revision: https://reviews.llvm.org/D118413

4a6553f4

[clang][ARM] Re-word PACBTI warning. · 424e850f

Amilendra Kodithuwakku authored Feb 08, 2022

The original warning added in D115501 when pacbti is used with an
incompatible architecture was not exactly correct because it was
not really ignored and can affect codegen.

Therefore reword to say that the pacbti option is incompatible with
the given architecture.

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D119166

424e850f

[clang][Fuchsia] Ensure static sanitizer libs are only linked in after the -nostdlib check · 4ac58b61
Leonard Chan authored Feb 08, 2022

Differential Revision: https://reviews.llvm.org/D119201

4ac58b61

Allow parameter pack expansions and initializer lists in annotate attribute · ead1690d

Steffen Larsen authored Feb 08, 2022

These changes make the Clang parser recognize expression parameter pack
expansion and initializer lists in attribute arguments. Because
expression parameter pack expansion requires additional handling while
creating and instantiating templates, an attribute must opt in to it
explicitly through the AcceptsExprPack flag.

Handling expression pack expansions may require a delay to when the
arguments of an attribute are correctly populated. To this end,
attributes that are set to accept these - through setting the
AcceptsExprPack flag - will automatically have an additional variadic
expression argument member named DelayedArgs. This member is not
exposed the same way other arguments are but is set through the new
CreateWithDelayedArgs creator function generated for applicable
attributes.

To illustrate how to implement support for expression pack expansion,
clang::annotate is made to support pack expansions. This is
done by making handleAnnotationAttr delay setting the actual attribute
arguments until after template instantiation if it was unable to
populate the arguments due to dependencies in the parsed expressions.
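
From the user's side, a pack expansion in `clang::annotate` might look like the sketch below. The macro `ANNOTATE_PACK`, the annotation string, and the function `packSize` are hypothetical, and the version guard assumes the feature is available from Clang 15 onward; other compilers simply see no attribute:

```cpp
#include <cassert>
#include <cstddef>

// Assumption: pack expansion in clang::annotate is available in Clang 15+.
#if defined(__clang__) && __clang_major__ >= 15
#define ANNOTATE_PACK(...) [[clang::annotate("sketch.pack", __VA_ARGS__)]]
#else
#define ANNOTATE_PACK(...)
#endif

// The attribute arguments depend on the template parameter pack, so they
// can only be populated once the template is instantiated.
template <int... Is>
ANNOTATE_PACK(Is...)
std::size_t packSize() {
  return sizeof...(Is);
}
```

Instantiating `packSize` with three template arguments would then carry an annotation with those three values as its trailing arguments.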

ead1690d

[libc][NFC] Remove all Linux specific code to respective linux/ directories · 70ae480c

Alex Brachet authored Feb 08, 2022

These were all the OS-specific implementations I could find in general directories.

Currently none of these functions are actually enabled, but once they are, it makes sense for them to live in linux/-specific directories.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D119164

70ae480c

[SimplifyCFG] 'merge compatible invokes': fully support indirect invokes · c8ba2b67

Roman Lebedev authored Feb 08, 2022

As long as *all* the invokes in the set are indirect,
we can merge them, but don't merge direct invokes into the set,
even though it would be legal to do so.

c8ba2b67

[SimplifyCFG] 'merge compatible invokes': don't create trivial PHI's with all-identical incoming values · 414b4764

Roman Lebedev authored Feb 08, 2022

414b4764
[NFC][SimplifyCFG] 'merge compatible invokes': tests for indirect invokes. · e2aed0b0
Roman Lebedev authored Feb 08, 2022

e2aed0b0

[AMDGPU] Select VGPR versions of MFMA if possible · aeaf85b9

Stanislav Mekhanoshin authored Jan 12, 2022

We can select _vgprcd versions of MAI instructions and have no
AGPRs with the whole budget left for VGPRs if:

1. This is a kernel;
2. It has no calls;
3. It runs on at least 2 waves, and thus has no more than 256 VGPRs;
4. There is no inline asm requesting AGPRs.
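
The four conditions above amount to a simple conjunction. A minimal sketch, where the struct `FunctionFacts`, its field names, and `mayUseVgprMfma` are all hypothetical and not the names used in the AMDGPU backend:

```cpp
#include <cassert>

// Hypothetical summary of the facts the selection depends on.
struct FunctionFacts {
  bool isKernel;            // condition 1
  bool hasCalls;            // condition 2
  unsigned minWavesPerEU;   // condition 3
  bool inlineAsmUsesAGPRs;  // condition 4
};

// True only when every condition from the list holds.
bool mayUseVgprMfma(const FunctionFacts &f) {
  return f.isKernel && !f.hasCalls && f.minWavesPerEU >= 2 &&
         !f.inlineAsmUsesAGPRs;
}
```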

Differential Revision: https://reviews.llvm.org/D117253

aeaf85b9

[mlir][Linalg] NFC: Combine elementwise fusion test passes. · 2abd7f13

Mahesh Ravishankar authored Feb 07, 2022

There are a few different test passes that check elementwise fusion in
Linalg. Consolidate them to a single pass controlled by different pass
options (in keeping with how `TestLinalgTransforms` exists).

2abd7f13

[flang] Upstream partial lowering of GET_ENVIRONMENT_VARIABLE intrinsic · 5ebbcfa0

Josh Mottley authored Feb 03, 2022

This patch adds partial lowering of the "GET_ENVIRONMENT_VARIABLE" intrinsic
to the backend runtime hook implemented in patches D111394 and D112698.
It also renames the `isPresent` lambda to `isAbsent` and moves it out to
its own function in `Command.cpp`, updating the corresponding comments
to match. Lastly, it adds the i1 type to
`RuntimeCallTestBase.h`.

Differential Revision: https://reviews.llvm.org/D118984

5ebbcfa0

[X86] Update register RCL/RCR by 1 and immediate scheduling for Intel CPUs · 56d6ccd4

Craig Topper authored Feb 08, 2022

Most Intel CPU scheduler files lumped the immediate and 1 instructions
together, but uops.info shows they are quite different.

For the most part the by 1 instructions were pretty accurate to the uops.info
data except the latency was 3 instead of 2 as uops.info indicates.

The by immediate instructions need 7 or 8 uops and have higher latency.

It looks like the 8-bit by immediate instructions may need even more
uops, but I just lumped them with the 16/32/64.

Noticed while checking out PR53648. So mostly I cared about the by 1
instructions.

Reviewed By: RKSimon, pengfei

Differential Revision: https://reviews.llvm.org/D119217

56d6ccd4

[C++2b] Implement multidimensional subscript operator · c1512250

Corentin Jabot authored Feb 08, 2022

Implement P2128R6 in C++23 mode.

Unlike GCC's implementation, this doesn't try to recover when a user
meant to use a comma expression.

Because the syntax changes meaning in C++23, the patch is *NOT*
implemented as an extension. Instead, declaring a subscript operator
with anything other than exactly 1 parameter is an error in older
language modes. There is an off-by-default extension warning in C++23 mode.

Unlike the standard, we support default arguments;

i.e., we assume, based on conversations in WG21, that the proposed
resolution to CWG2507 will be accepted.

We allow OpenMP array sections and C++23 multidimensional subscripts to
coexist:

[a , b] multi dimensional array
[a : b] open mp section
[a, b: c] // error

The rest of the patch is relatively straightforward: we take care to
support an arbitrary number of arguments everywhere.
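
A minimal sketch of what the feature enables. The type `Grid` and the function `writeThenRead` are hypothetical; the two-parameter `operator[]` is guarded by the feature-test macro so the snippet still compiles in older language modes, where it falls back to a named accessor:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 2 x 3 grid stored in row-major order.
struct Grid {
  std::array<int, 6> cells{};

#if defined(__cpp_multidimensional_subscript)
  // C++23 only: a subscript operator taking two parameters.
  int &operator[](std::size_t row, std::size_t col) {
    return cells[row * 3 + col];
  }
#endif

  // Named accessor that works in every language mode.
  int &at(std::size_t row, std::size_t col) { return cells[row * 3 + col]; }
};

int writeThenRead() {
  Grid g;
#if defined(__cpp_multidimensional_subscript)
  g[1, 2] = 42;  // multidimensional subscript, not a comma expression
#else
  g.at(1, 2) = 42;
#endif
  return g.at(1, 2);
}
```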

c1512250

[Attributor] Emit fixed-point remark on function list · caf7f05c

Joseph Huber authored Feb 08, 2022

This patch replaces the function we emit the remark on when we run into
the fix-point limit. Previously we got a function to emit a remark on
from the worklist's associated function. However, the worklist may not
always have an associated function in the case of global variables.
Replace this with the function set, and if there are no functions don't
emit the remark.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D119248

caf7f05c

[Libomptarget] Add header files as a dependency to CMake target · 99d72ebd

Joseph Huber authored Feb 08, 2022

This patch manually adds the runtime include files to the list of
dependencies when we build the bitcode runtime library. Previously if
only the header was changed we would not recompile the source files.
The solution used here isn't optimal because every source file now has a
dependency on each header file, regardless of whether it was actually used by
that file.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D119254

99d72ebd

[Hexagon] Alter meaning of versionless -mhvx · 2ecda9ec

Krzysztof Parzyszek authored Feb 08, 2022

The documentation for the official (downstream) Qualcomm Hexagon Clang
states that -mhvx sets the HVX version to be the same as the CPU version.
The current implementation upstream would use the most recent versioned
-mhvx= flag first (if present), then the CPU version. Change the upstream
behavior to match the documented behavior of the downstream compiler.

2ecda9ec

[NFC] Increase initial size of FoldingSets used in ASTContext and CodeGenTypes · 5d8d3a11

Dawid Jurczak authored Feb 08, 2022

Among the many FoldingSet users, the most notable seem to be ASTContext and CodeGenTypes.
We spend a not-so-tiny amount of time in FoldingSet calls from there for the following reasons:

  1. The default FoldingSet capacity of 2^6 items is very often not enough.
     For PointerTypes/ElaboratedTypes/ParenTypes it's not unusual to see it grow to 256 or 512 items.
     FunctionProtoTypes can easily exceed a capacity of 1k items, growing to 4k or even 8k.

  2. The cost of FoldingSetBase::GrowBucketCount itself is not very bad (pure reallocations are rather cheap thanks to BumpPtrAllocator).
     What matters is the high collision rate when a lot of items end up in the same bucket, slowing down FoldingSetBase::FindNodeOrInsertPos and thrashing the CPU cache
     (items with the same hash are organized in an intrusive linked list, which needs to be traversed).

This change addresses both issues by increasing the initial size of the FoldingSets used in ASTContext and CodeGenTypes.
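
The effect of a larger initial size can be illustrated with a standard container. This sketch uses `std::unordered_set`, not `llvm::FoldingSet`, purely as an analogy; the function `countRehashes` is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Counts how often the bucket array grows while inserting `count` keys
// into a set that reserved room for `initialReserve` elements up front.
std::size_t countRehashes(std::size_t count, std::size_t initialReserve) {
  std::unordered_set<std::size_t> set;
  set.reserve(initialReserve);
  std::size_t rehashes = 0;
  std::size_t buckets = set.bucket_count();
  for (std::size_t key = 0; key < count; ++key) {
    set.insert(key);
    if (set.bucket_count() != buckets) {
      buckets = set.bucket_count();
      ++rehashes;
    }
  }
  return rehashes;
}
```

Reserving enough buckets for the expected element count avoids every intermediate growth step and keeps chains short from the first insertion.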

Extracted from: https://reviews.llvm.org/D118385

Differential Revision: https://reviews.llvm.org/D118608

5d8d3a11

[MLIR][Presburger] Fix linkage of functions in header · 78eeda75
Benjamin Kramer authored Feb 08, 2022

Static functions in a header cause spurious unused function warnings.

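
A minimal sketch of the general fix for this kind of warning, assuming the function lives in a header that several source files include. The function `clampToByte` is hypothetical and unrelated to the Presburger code:

```cpp
#include <cassert>

// Declared `static`, this function would get a private copy in every
// source file that includes the header, and any file that never calls it
// could warn about an unused function. Declared `inline`, it has external
// linkage and a single shared definition, so no such warning is issued.
inline int clampToByte(int value) {
  return value < 0 ? 0 : (value > 255 ? 255 : value);
}
```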
78eeda75
[mlir][bazel] Update post 24a1 · d15baefa
Jacques Pienaar authored Feb 08, 2022

d15baefa

[mlir][taco] Use sparse_tensor.out to write sparse tensors to files. · 61a3dd70

Bixia Zheng authored Feb 04, 2022

Add a Python method, output_sparse_tensor, to use sparse_tensor.out to write
a sparse tensor value to a file.

Modify the method that evaluates a tensor expression to return a pointer of the
MLIR sparse tensor for the result to delay the extraction of the coordinates and
non-zero values.

Implement the Tensor to_file method to evaluate the tensor assignment and write
the result to a file.

Add unit tests. Modify test golden files to reflect the change that TNS outputs
now have a comment line and two meta data lines.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D118956

61a3dd70

Revert "[analyzer] Prevent misuses of -analyze-function" · 620d99b7

Balazs Benics authored Feb 08, 2022

This reverts commit 841817b1.

Ah, it still fails on build bots for some reason.
Pinning the target triple was not enough.

620d99b7

[libc++][nfc] Use TEST_SAFE_STATIC. · 5dc1da3e

Mark de Wever authored Feb 02, 2022

This avoids using a libc++-internal macro in our tests.

Reviewed By: #libc, philnik, ldionne

Differential Revision: https://reviews.llvm.org/D118874

5dc1da3e

[libc++] Removes cpp17_output_iterator's default constructor. · a0071b93

Mark de Wever authored Feb 04, 2022

This has been suggested in D117950.

Reviewed By: ldionne, #libc, philnik

Differential Revision: https://reviews.llvm.org/D118971

a0071b93

[Support] Don't print stacktrace if DbgHelp.dll hasn't been loaded yet · 3df88ec3

Andy Yankovsky authored Feb 07, 2022

On Windows, certain functions from `Signals.h` require that `DbgHelp.dll` is loaded. This typically happens when the main program calls `llvm::InitLLVM`; however, in some cases the main program doesn't do that (e.g. when the application is using LLDB via `liblldb.dll`). This patch adds a safeguard to prevent crashes. More discussion in
https://reviews.llvm.org/D119009.

Reviewed By: aganea

Differential Revision: https://reviews.llvm.org/D119181

3df88ec3

[nfc][mlgo][regalloc] Stop warnings about unused function · 5a50ab4d

Mircea Trofin authored Feb 08, 2022

Added a `NoopSavedModelImpl` type which can be used as a mock AOT-ed
saved model, and further minimize conditional compilation cases. This
also removes unused function warnings on gcc.

5a50ab4d

[MLIR][GPU] Update GPUToROCDL to account for ControlFlow dialect · 24a1869d

Krzysztof Drewniak authored Feb 07, 2022

The conversion to the new ControlFlow dialect didn't change the
GPUToROCDL pass - this commit fixes this issue.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D119188

24a1869d

[llvm-profgen] On-demand track optimized-away inlinees for preinliner. · 34e131b0

Hongtao Yu authored Jan 28, 2022

Tracking optimized-away inlinees based on all probes in a binary is expensive in terms of memory usage. I'm making the tracking on-demand, based on profiled functions only. This saves about 10% memory overall for a medium-sized benchmark.

Before:

   note: After parsePerfTraces
   note: Thu Jan 27 18:42:09 2022
   note: VM: 8.68 GB   RSS: 8.39 GB
   note: After computeSizeForProfiledFunctions
   note: Thu Jan 27 18:42:41 2022
   note: **VM: 10.63 GB   RSS: 10.20 GB**
   note: After generateProbeBasedProfile
   note: Thu Jan 27 18:45:49 2022
   note: VM: 25.00 GB   RSS: 24.95 GB
   note: After postProcessProfiles
   note: Thu Jan 27 18:49:29 2022
   note: VM: 26.34 GB   RSS: 26.27 GB

After:
   note: After parsePerfTraces
   note: Fri Jan 28 12:04:49 2022
   note: VM: 8.68 GB   RSS: 7.65 GB
   note: After computeSizeForProfiledFunctions
   note: Fri Jan 28 12:05:26 2022
   note: **VM: 8.68 GB   RSS: 8.42 GB**
   note: After generateProbeBasedProfile
   note: Fri Jan 28 12:08:03 2022
   note: VM: 22.93 GB   RSS: 22.89 GB
   note: After postProcessProfiles
   note: Fri Jan 28 12:11:30 2022
   note: VM: 24.27 GB   RSS: 24.22 GB

This should be a no-diff change in terms of profile quality.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D118515

34e131b0

[libc++][format][nfc] Use string_view in tests. · e885b113

Mark de Wever authored Jan 29, 2022

This change is a preparation for adapting the tests for
  P2216 std::format improvements

Reviewed By: #libc, Quuxplusone, ldionne

Differential Revision: https://reviews.llvm.org/D118717

e885b113

[analyzer] Prevent misuses of -analyze-function · 841817b1

Balazs Benics authored Feb 08, 2022

Sometimes when I pass the mentioned option I forget to pass the
parameter list for C++ sources.
It would also be useful for newcomers to learn about this.

This patch introduces some logic checking common misuses involving
`-analyze-function`.

Reviewed-By: martong

Differential Revision: https://reviews.llvm.org/D118690

841817b1

AMDGPU: Use reserved VGPR for AGPR spills to memory · f2c99ea4

Matt Arsenault authored Feb 04, 2022

Previously we would reuse the VGPR used for large frame offsets as the
one needed for copying from the AGPR. Fix this by reusing the register
we already reserved for handling AGPR to AGPR copies.

f2c99ea4

[SCEV] Generalize SCEVEqualsPredicate to any compare [NFC] · c302f1e6

Philip Reames authored Feb 08, 2022

PredicatedScalarEvolution has a predicate type for representing A == B. This change generalizes it into something which can represent A <pred> B.

This generality is currently unused, but is motivated by a couple of recent cases which have come up.  In particular, I'm currently playing around with using this to simplify the runtime checking code in LoopVectorizer. Regardless of the outcome of that prototyping, generalizing the compare node seemed useful.
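
The shape of the generalization can be sketched with concrete integers. Real SCEV predicates relate symbolic expressions, not values, and every name here (`Pred`, `ComparePredicate`, `makeEquals`) is hypothetical:

```cpp
#include <cassert>

// Hypothetical subset of comparison kinds.
enum class Pred { EQ, NE, ULT, ULE };

// One node type that carries the comparison kind alongside both operands.
struct ComparePredicate {
  Pred pred;
  unsigned lhs;
  unsigned rhs;

  bool holds() const {
    switch (pred) {
    case Pred::EQ:
      return lhs == rhs;
    case Pred::NE:
      return lhs != rhs;
    case Pred::ULT:
      return lhs < rhs;
    case Pred::ULE:
      return lhs <= rhs;
    }
    return false;
  }
};

// The old equality-only predicate becomes one special case of the new one.
ComparePredicate makeEquals(unsigned a, unsigned b) {
  return {Pred::EQ, a, b};
}
```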

c302f1e6

[Mem2Reg] Check that load type matches alloca type · 074561a4

Nikita Popov authored Feb 08, 2022

Alloca promotion can only deal with cases where the load/store
types match the alloca type (it explicitly does not support
bitcasted load/stores).

With opaque pointers this is no longer enforced through the pointer
type, so add an explicit check.
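
A toy model of the explicit check, using strings in place of LLVM types. It covers only the type-match condition described here, not the other requirements for promotion, and the names `Access`, `Alloca`, and `isPromotable` are hypothetical:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Type accessed by one load or store (hypothetical model).
struct Access {
  std::string type;
};

// An allocation together with every load/store that uses it.
struct Alloca {
  std::string allocatedType;
  std::vector<Access> accesses;
};

// With opaque pointers the pointer type no longer carries the pointee
// type, so each access has to be compared against the allocated type.
bool isPromotable(const Alloca &a) {
  for (const Access &access : a.accesses)
    if (access.type != a.allocatedType)
      return false;
  return true;
}
```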

074561a4

AMDGPU: Reserve v32 if we may need to copy between AGPRs on gfx908 · 8b2ca766

Matt Arsenault authored Dec 14, 2021

We need to guarantee cheap copies between AGPRs, and unfortunately
gfx908 cannot directly do this. Theoretically we could set the
scavenger up with an emergency spill slot, but it also feels
unreasonable to pay that cost for what was assumed to be a simple and
cheap copy. Pick a register that doesn't conflict with any ABI
registers.

This does not address the same issue when copying from SGPR to AGPR
for gfx90a (this coincidentally fixes it for gfx908), but that's less
interesting since the register allocator shouldn't be proactively
introducing such copies.

One edge case I'm worried about is respecting the VGPR budget implied
by amdgpu-waves-per-eu. If the theoretical upper bound of a function
is 32 VGPRs, this will force the actual count to be 33.

This is also broken if inline assembly uses/defs something in v32. The
coalescer will eliminate the intermediate vreg between the def and
use, and the introduced copy will clobber the user value.

(cherry picked from commit 3335784ac2d587ff4eac04586e189532ae8b2607)

8b2ca766

AMDGPU: Regenerate mir test checks to include -NEXT · a7f60bfd
Matt Arsenault authored Feb 04, 2022

a7f60bfd

[libc++] Add a Lit configuration for running back-deployment tests · 768b50df

Louis Dionne authored Feb 07, 2022

This testing configuration links tests against one libc++ shared library,
but runs them against another libc++ shared library. This makes sure that
we can build applications against the libc++ provided in a recent SDK and
back-deploy them to platforms containing older libc++ dylibs.

It also switches the Apple CI script to using that new configuration
instead of the legacy one.

Differential Revision: https://reviews.llvm.org/D119195

768b50df

[NFC] Refactor llvm-nm symbol comparing and split sorting · d11915b5

zhijian authored Feb 08, 2022

Summary:
1. Added a helper function isSymbolDefined().
2. Split out the sorting code.
3. Refactored the symbol comparing function.

Reviewers: James Henderson, Fangrui Song
Differential Revision: https://reviews.llvm.org/D119028

d11915b5

[SDAG] enable binop identity constant folds for fmul/fdiv · 905abc5b

Sanjay Patel authored Feb 08, 2022

The test diffs are identical to D119111.

This only affects x86 currently because no other target
has an override for the TLI hook that controls this transform.

905abc5b

[AutoUpgrade] Handle remangling upgrade for ptr.annotation · 48eeefe5

Nikita Popov authored Feb 08, 2022

The code assumed that the upgrade would happen due to the argument
count changing from 4 to 5. However, a remangling upgrade is also
possible here.

48eeefe5