- Feb 08, 2022
-
Dawid Jurczak authored
Among the many FoldingSet users, the most notable seem to be ASTContext and CodeGenTypes. We spend a not-so-tiny amount of time in FoldingSet calls from there for the following reasons:
1. The default FoldingSet capacity of 2^6 items is very often not enough. For PointerTypes/ElaboratedTypes/ParenTypes it's not unlikely to see the set grow to 256 or 512 items, and FunctionProtoTypes can easily exceed a 1k-item capacity, growing to 4k or even 8k entries.
2. The cost of FoldingSetBase::GrowBucketCount itself is not very high (pure reallocations are rather cheap thanks to BumpPtrAllocator). What matters is the high collision rate when lots of items end up in the same bucket, slowing down FoldingSetBase::FindNodeOrInsertPos and thrashing the CPU cache (items with the same hash are organized in an intrusive linked list which needs to be traversed).
This change addresses both issues by increasing the initial size of the FoldingSets used in ASTContext and CodeGenTypes.
Extracted from: https://reviews.llvm.org/D118385
Differential Revision: https://reviews.llvm.org/D118608
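As a rough illustration (a hand-written sketch, not the actual patch; MyNode and the chosen size are made up), llvm::FoldingSet takes a Log2InitSize constructor argument, so a hot set can be oversized up front instead of growing through the collision-heavy small sizes:

#include "llvm/ADT/FoldingSet.h"

struct MyNode : llvm::FoldingSetNode {
  int Key;
  void Profile(llvm::FoldingSetNodeID &ID) const { ID.AddInteger(Key); }
};

// The default is Log2InitSize = 6, i.e. 2^6 = 64 buckets. A set that
// routinely holds thousands of nodes re-grows several times and suffers
// long intrusive-list bucket chains in between; starting bigger avoids both.
llvm::FoldingSet<MyNode> BigSet(/*Log2InitSize=*/12); // 4096 buckets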
-
Benjamin Kramer authored
Static functions in a header cause spurious unused function warnings.
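For illustration (hand-written, not from the patch): a function defined static in a header has internal linkage, so every translation unit that includes the header owns a private copy, and each TU that never calls it emits -Wunused-function; making it inline gives one logical definition and no warning:

// helpers.h (illustrative)
// static void helper() {}  // internal linkage: every including TU gets its
//                          // own copy; unused copies trip -Wunused-function
inline void helper() {}     // one logical definition; no spurious warning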
-
Jacques Pienaar authored
-
Bixia Zheng authored
Add a Python method, output_sparse_tensor, that uses sparse_tensor.out to write a sparse tensor value to a file. Modify the method that evaluates a tensor expression to return a pointer to the MLIR sparse tensor for the result, delaying the extraction of the coordinates and non-zero values. Implement the Tensor to_file method to evaluate the tensor assignment and write the result to a file. Add unit tests. Modify test golden files to reflect the change that TNS outputs now have a comment line and two metadata lines.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D118956
-
Balazs Benics authored
This reverts commit 841817b1. Ah, it still fails on build bots for some reason. Pinning the target triple was not enough.
-
Mark de Wever authored
This avoids using a libc++-internal macro in our tests.
Reviewed By: #libc, philnik, ldionne
Differential Revision: https://reviews.llvm.org/D118874
-
Mark de Wever authored
This has been suggested in D117950.
Reviewed By: ldionne, #libc, philnik
Differential Revision: https://reviews.llvm.org/D118971
-
Andy Yankovsky authored
On Windows, certain functions from `Signals.h` require that `DbgHelp.dll` is loaded. This typically happens when the main program calls `llvm::InitLLVM`; however, in some cases the main program doesn't do that (e.g. when the application uses LLDB via `liblldb.dll`). This patch adds a safeguard to prevent crashes. More discussion in https://reviews.llvm.org/D119009.
Reviewed By: aganea
Differential Revision: https://reviews.llvm.org/D119181
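A minimal sketch of such a guard (hypothetical; the actual patch may gate this differently, see D119181): check that DbgHelp.dll is actually present in the process before calling into its symbol APIs:

#include <windows.h>

// Hypothetical guard: DbgHelp-backed symbolization is only safe when the
// DLL is loaded, which may not be the case when liblldb.dll is used by a
// host application that never ran llvm::InitLLVM.
static bool isDbgHelpLoaded() {
  return ::GetModuleHandleW(L"DbgHelp.dll") != nullptr;
}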
-
Mircea Trofin authored
Added a `NoopSavedModelImpl` type which can be used as a mock AOT-ed saved model, further minimizing conditional compilation cases. This also removes unused-function warnings on gcc.
-
Krzysztof Drewniak authored
The conversion to the new ControlFlow dialect didn't change the GPUToROCDL pass; this commit fixes that.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D119188
-
Hongtao Yu authored
Tracking optimized-away inlinees based on all probes in a binary is expensive in terms of memory usage, so I'm making the tracking on-demand, based on profiled functions only. This saves about 10% of memory overall for a medium-sized benchmark.

Before:
note: After parsePerfTraces
note: Thu Jan 27 18:42:09 2022
note: VM: 8.68 GB RSS: 8.39 GB
note: After computeSizeForProfiledFunctions
note: Thu Jan 27 18:42:41 2022
note: **VM: 10.63 GB RSS: 10.20 GB**
note: After generateProbeBasedProfile
note: Thu Jan 27 18:45:49 2022
note: VM: 25.00 GB RSS: 24.95 GB
note: After postProcessProfiles
note: Thu Jan 27 18:49:29 2022
note: VM: 26.34 GB RSS: 26.27 GB

After:
note: After parsePerfTraces
note: Fri Jan 28 12:04:49 2022
note: VM: 8.68 GB RSS: 7.65 GB
note: After computeSizeForProfiledFunctions
note: Fri Jan 28 12:05:26 2022
note: **VM: 8.68 GB RSS: 8.42 GB**
note: After generateProbeBasedProfile
note: Fri Jan 28 12:08:03 2022
note: VM: 22.93 GB RSS: 22.89 GB
note: After postProcessProfiles
note: Fri Jan 28 12:11:30 2022
note: VM: 24.27 GB RSS: 24.22 GB

This should be a no-diff change in terms of profile quality.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D118515
-
Mark de Wever authored
This change is a preparation for adapting the tests for P2216 (std::format improvements).
Reviewed By: #libc, Quuxplusone, ldionne
Differential Revision: https://reviews.llvm.org/D118717
-
Balazs Benics authored
Sometimes when I pass the mentioned option I forget to pass the parameter list for C++ sources. It would also be useful for newcomers to learn about this. This patch introduces some logic checking for common misuses involving `-analyze-function`.
Reviewed-By: martong
Differential Revision: https://reviews.llvm.org/D118690
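For context, an illustrative misuse (the exact option value expected for C++ functions is an assumption; D118690 adds the diagnostics for it):

// example.cpp
void foo(int x) { (void)x; }

// Illustrative invocations:
//   clang -cc1 -analyze -analyzer-checker=core -analyze-function=foo example.cpp
//     -> for C++ sources this matches nothing: the parameter list is missing
//   clang -cc1 -analyze -analyzer-checker=core -analyze-function="foo(int)" example.cpp
//     -> matches foo and analyzes only that function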
-
Matt Arsenault authored
Previously we would reuse the VGPR used for large frame offsets as the one needed for copying from the AGPR. Fix this by reusing the register we already reserved for handling AGPR-to-AGPR copies.
-
Philip Reames authored
PredicatedScalarEvolution has a predicate type for representing A == B. This change generalizes it into something which can represent A <pred> B. This generality is currently unused, but is motivated by a couple of recent cases which have come up. In particular, I'm currently playing around with using this to simplify the runtime checking code in LoopVectorizer. Regardless of the outcome of that prototyping, generalizing the compare node seemed useful.
-
Nikita Popov authored
Alloca promotion can only deal with cases where the load/store types match the alloca type (it explicitly does not support bitcasted loads/stores). With opaque pointers this is no longer enforced through the pointer type, so add an explicit check.
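A minimal sketch of the kind of check this implies (hand-written, not the actual patch): compare each load/store value type against the allocated type directly, since the pointer type no longer encodes it:

#include "llvm/IR/Instructions.h"
#include "llvm/Support/Casting.h"

// Hypothetical helper: with typed pointers a type mismatch required a
// visible bitcast of the pointer; with opaque pointers it does not, so the
// value types of the users must be checked explicitly.
static bool typesMatchAlloca(const llvm::AllocaInst *AI) {
  llvm::Type *AllocTy = AI->getAllocatedType();
  for (const llvm::User *U : AI->users()) {
    if (auto *LI = llvm::dyn_cast<llvm::LoadInst>(U)) {
      if (LI->getType() != AllocTy)
        return false;
    } else if (auto *SI = llvm::dyn_cast<llvm::StoreInst>(U)) {
      if (SI->getValueOperand()->getType() != AllocTy)
        return false;
    }
  }
  return true;
}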
-
Matt Arsenault authored
We need to guarantee cheap copies between AGPRs, and unfortunately gfx908 cannot directly do this. Theoretically we could set the scavenger up with an emergency spill slot, but it also feels unreasonable to pay that cost for what was assumed to be a simple and cheap copy. Pick a register that doesn't conflict with any ABI registers.

This does not address the same issue when copying from SGPR to AGPR for gfx90a (this coincidentally fixes it for gfx908), but that's less interesting since the register allocator shouldn't be proactively introducing such copies.

One edge case I'm worried about is respecting the VGPR budget implied by amdgpu-waves-per-eu. If the theoretical upper bound of a function is 32 VGPRs, this will force the actual count to be 33.

This is also broken if inline assembly uses/defs something in v32. The coalescer will eliminate the intermediate vreg between the def and use, and the introduced copy will clobber the user value.

(cherry picked from commit 3335784ac2d587ff4eac04586e189532ae8b2607)
-
Matt Arsenault authored
-
Louis Dionne authored
This testing configuration links tests against one libc++ shared library, but runs them against another libc++ shared library. This makes sure that we can build applications against the libc++ provided in a recent SDK and back-deploy them to platforms containing older libc++ dylibs. It also switches the Apple CI script to using that new configuration instead of the legacy one.
Differential Revision: https://reviews.llvm.org/D119195
-
zhijian authored
Summary:
1. Added a helper function isSymbolDefined().
2. Split out the sorting code.
3. Refactored the symbol-comparing function.
Reviewers: James Henderson, Fangrui Song
Differential Revision: https://reviews.llvm.org/D119028
-
Sanjay Patel authored
The test diffs are identical to D119111. This only affects x86 currently because no other target has an override for the TLI hook that controls this transform.
-
Nikita Popov authored
The code assumed that the upgrade would happen due to the argument count changing from 4 to 5. However, a remangling upgrade is also possible here.
-
David Sherwood authored
This patch adds custom lowering support for ISD::MUL with v1i64 and v2i64 types when SVE is enabled, regardless of the minimum SVE vector length. We do this because NEON simply does not have 64-bit vector multiplies, so we want to take advantage of these instructions in SVE. I've updated the 128-bit min SVE vector bits tests here:
CodeGen/AArch64/sve-fixed-length-int-arith.ll
CodeGen/AArch64/sve-fixed-length-int-mulh.ll
CodeGen/AArch64/sve-fixed-length-int-rem.ll
Differential Revision: https://reviews.llvm.org/D118802
-
Arjun P authored
Add support for computing an overapproximation of the number of integer points in a polyhedron. The returned result is actually the number of integer points one gets by computing the "rational shadow" obtained by projecting out the local IDs, finding the minimal axis-parallel hyperrectangular approximation of the shadow, and returning the number of integer points in that. This does not currently support symbols.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D119228
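To make the counting step concrete (my own notation, not from the patch): if the minimal axis-parallel hyperrectangle containing the rational shadow is the box \prod_i [l_i, u_i] in d dimensions, the returned overapproximation is the number of integer points in that box:

\[ \prod_{i=1}^{d} \max\bigl(0,\ \lfloor u_i \rfloor - \lceil l_i \rceil + 1\bigr) \]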
-
Roman Lebedev authored
https://godbolt.org/z/js9fTTG9h
^ we don't care what `isGuaranteedNotToBeUndefOrPoison()` says unless we already knew that the operands were equal.
-
Roman Lebedev authored
It checks IR after optimizations, which is inherently fragile, and the results are now different after the recent patch.
-
Arjun P authored
[MLIR][Presburger] Simplex::computeIntegerBounds: support unbounded directions by returning Optionals
-
Nathan Sidwell authored
a) Using a do...while loop in the number formatter means we do not have to special-case zero.
b) Let's use 'if (auto size = ...) {}' for appending to the output buffer.
c) We should also be using memcpy there, not memmove -- the string being appended is never part of the current buffer.
d) Let's put all the operator<< functions together.
e) I find 'if (cond) frob(..., true); else frob(..., false)' somewhat confusing. Let's just use std::abs in the signed integer printer and let CSE decide about the duplicate < 0 testing.
f) Let's have as many as possible return *this. That's both more consistent, and allows tail calls in some cases (the actual number formatter has a local array though).
These changes removed around 100 bytes from the demangler's instructions on x86_64.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D119176
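To illustrate point (a) with a hand-written sketch (not the demangler's actual code): a do...while digit loop always runs its body at least once, so N == 0 prints "0" with no special case:

// Writes the decimal digits of N backwards from End; returns a pointer to
// the first digit. The body executes at least once, so zero needs no branch.
static char *formatDecimal(unsigned long long N, char *End) {
  char *P = End;
  do {
    *--P = static_cast<char>('0' + N % 10);
    N /= 10;
  } while (N != 0);
  return P;
}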
-
Nikita Popov authored
Following the discussion on D118229, this marks all pointer-typed kernel arguments as having ABI alignment, per section 6.3.5 of the OpenCL spec:
> For arguments to a __kernel function declared to be a pointer to
> a data type, the OpenCL compiler can assume that the pointee is
> always appropriately aligned as required by the data type.
Differential Revision: https://reviews.llvm.org/D118894
-
Nikita Popov authored
Instead of using the pointer element type, look at how the pointer is actually being used in store instructions, while looking through bitcasts. This makes the transform compatible with opaque pointers and a bit more general.

It's worth noting that I have dropped the 3-vector to 4-vector shufflevector special case, because this is now handled in a different way: if the value is actually used as a 4-vector, then we're directly going to use that type, instead of shuffling to a 3-vector in between.

Differential Revision: https://reviews.llvm.org/D119237
-
Simon Pilgrim authored
[X86] selectLEAAddr - relax heuristic to only require one operand to be a MathWithFlags op (PR46809)
As suggested by @craig.topper, relaxing LEA matching to only require the ADD to be fed from a single op with EFLAGS helps avoid duplication when the EFLAGS are consumed in a later, dependent instruction. There was some concern about whether the heuristic is too simple, not taking into account lost loads that can't fold by using a LEA, but some basic tests (included in select-lea.ll) don't suggest that's really a problem.
Differential Revision: https://reviews.llvm.org/D118128
-
serge-sans-paille authored
Major user-facing changes:
Many headers in llvm/DebugInfo/CodeView no longer include llvm/Support/BinaryStreamReader.h or llvm/Support/BinaryStreamWriter.h; those headers may need to be included manually.
Several headers in llvm/DebugInfo/CodeView no longer include llvm/DebugInfo/CodeView/EnumTables.h or llvm/DebugInfo/CodeView/CodeView.h; those headers may need to be included manually.
Some statistics:
$ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/DebugInfo/CodeView/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
after: 2794466
before: 2832765
Discourse thread on the topic: https://discourse.llvm.org/t/include-what-you-use-include-cleanup/
Differential Revision: https://reviews.llvm.org/D119092
-
Simon Pilgrim authored
[X86] Remove __builtin_ia32_padd/psub saturated intrinsics and use generic __builtin_elementwise_add/sub_sat
D117898 added the generic __builtin_elementwise_add_sat and __builtin_elementwise_sub_sat with the same integer behaviour as the SSE/AVX instructions.
This patch removes the __builtin_ia32_padd/psub saturated intrinsics and just uses the generics - the existing tests see no changes:
__m256i test_mm256_adds_epi8(__m256i a, __m256i b) {
  // CHECK-LABEL: test_mm256_adds_epi8
  // CHECK: call <32 x i8> @llvm.sadd.sat.v32i8(<32 x i8> %{{.*}}, <32 x i8> %{{.*}})
  return _mm256_adds_epi8(a, b);
}
-
Nikita Popov authored
We currently don't have any specialized upgrades for intrinsics that can be used in invokes, but they can still be subject to a generic remangling upgrade. In particular, this happens when upgrading statepoint intrinsics under -opaque-pointers. This patch just changes the upgrade code to work on CallBase instead of CallInst in particular.
-
Joseph Huber authored
This patch enables running the new driver tests for AMDGPU. Previously this was disabled because some tests failed, which was only because the new driver tests hadn't been listed as unsupported or expected to fail.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D119240
-
Sanjay Patel authored
This is no-functional-change-intended because only the x86 target enables the TLI hook currently. We can add fmul/fdiv opcodes to the switch similar to the proposal D119111, but we don't need to make other changes like enabling target-specific combines. We can also add integer opcodes (add, or, shl, etc.) to the switch because this function is called from all of the generic binary opcodes. The goal is to incrementally enable the profitable diffs from D90113 while avoiding regressions.
Differential Revision: https://reviews.llvm.org/D119150
-
Roman Lebedev authored
If the original invokes had uses, those uses must have been in PHIs, which would immediately make the incoming values incompatible. But we'll replace the uses of the original invokes with the use of the merged invoke, so as long as the incoming values become compatible after that, we can merge.
-
Roman Lebedev authored
As long as the incoming values for all the invokes in the set are identical, we can merge the invokes.
-
Roman Lebedev authored
Even if the invokes have a normal destination, iff it's the same block, we can merge them. For now, require that there are no PHI nodes and that the returned values of the invokes aren't used.
-
Roman Lebedev authored
-