Commits · f9c8febc522c2d26a44d4881f015e0e11e4f9167 · Lorenzo Albano / LLVM bpEVL

Jul 20, 2020

[mlir] Added support for symbols inside linalg.generic and map concatenation · f9c8febc

Jakub Lichman authored Jul 20, 2020

This commit adds functionality needed for implementation of convolutions with
linalg.generic op. Since linalg.generic right now expects indexing maps to be
just permutations, offset indexing needed in convolutions is not possible.
Therefore in this commit we address the issue by adding support for symbols inside
indexing maps which enables more advanced indexing. The upcoming commit will
solve the problem of computing loop bounds from such maps.
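
To make the limitation concrete, here is a minimal Python sketch (not MLIR code). It models an indexing map as a plain function of loop indices and symbols. The helper names `permutation_map`, `offset_map` and `conv1d`, and the specific map `(i, k)[s0] -> (i * s0 + k)`, are assumptions chosen for illustration and are not taken from the patch.

```python
def permutation_map(i, k):
    """A permutation map can only reorder loop indices: (i, k) -> (k, i)."""
    return (k, i)


def offset_map(i, k, s0):
    """Hypothetical map with one symbol: (i, k)[s0] -> (i * s0 + k).

    The symbol s0 acts as a stride that is fixed for the whole op but is
    not one of the loop dimensions.
    """
    return (i * s0 + k,)


def conv1d(inp, kernel, stride):
    """1-D convolution whose input is indexed through offset_map."""
    # The loop bound is computed by hand here; deriving it from the map is
    # the problem the commit message defers to a follow-up change.
    out_len = (len(inp) - len(kernel)) // stride + 1
    out = [0] * out_len
    for i in range(out_len):
        for k in range(len(kernel)):
            (idx,) = offset_map(i, k, stride)
            out[i] += inp[idx] * kernel[k]
    return out
```

No reordering of `(i, k)` alone can produce the input index `i * s0 + k`, which is why permutation-only maps cannot express this access pattern.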

Differential Revision: https://reviews.llvm.org/D83158

f9c8febc

[LLVMgold.so] -plugin-opt=save-temps: save combined module to .lto.o instead of .o · 55fa315b

Fangrui Song authored Jul 20, 2020

This matches LLD and fixes https://sourceware.org/bugzilla/show_bug.cgi?id=26262#c1

.o is a bad choice for save-temps output because it is easy to overwrite the bitcode file (*.o).

```
# Use bfd for the example; -fuse-ld=gold is similar.
clang -flto -c a.c  # generate bitcode file a.o
clang -fuse-ld=bfd -flto a.o -o a -Wl,-plugin-opt=save-temps  # overwrites a.o

# The user repeats the command but gets surprised, because a.o is now a combined module.
clang -fuse-ld=bfd -flto a.o -o a -Wl,-plugin-opt=save-temps
```

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D84132

55fa315b

[OPENMP50]Perform data mapping analysis only for explicitly mapped data. · 2875df0d

Alexey Bataev authored Jul 07, 2020

Summary:
According to OpenMP 5.0, the restrictions for mapping of overlapped data
apply only for explicitly mapped data, there is no restriction for
implicitly mapped data just like in OpenMP 4.5.

Reviewers: jdoerfert

Subscribers: yaxunl, guansong, sstefan1, cfe-commits, caomhin

Tags: #clang

Differential Revision: https://reviews.llvm.org/D83398

2875df0d

[libcxx] Skip tests on GCC · be2267ba

Vy Nguyen authored Jul 20, 2020

Summary: These don't work with GCC

Reviewers: jyknight, #libc!

Subscribers: libcxx-commits

Tags: #libc

Differential Revision: https://reviews.llvm.org/D84183

be2267ba

[ThinLTO] parse flags and blockcount summaries · b3031593

Nick Desaulniers authored Jul 20, 2020

Forked from pr/46523: we were having a hard time running llvm-extract on
IR from a ThinLTO build of the Linux kernel.

$ llvm-extract --func jeq_imm jit-42f488b6.ll
llvm-extract: jit-42f488b6.ll:47463:8:
error: Expected 'gv', 'module', or 'typeid' at the start of summary
entry
^209 = flags: 8
       ^
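
The failure above comes from the summary parser rejecting an entry kind it did not know. As a rough illustration only, the Python sketch below recognises the `^<id> = <kind>: <payload>` line shape seen in the error output and accepts the three kinds from the diagnostic plus the two named in the commit title. The function name `parse_summary_entry` and the regular expression are assumptions; this is not how the LLVM assembly parser is implemented.

```python
import re

# Entry kinds named in the commit: the three from the old diagnostic,
# plus the two this change teaches the parser to accept.
KNOWN_KINDS = {"gv", "module", "typeid", "flags", "blockcount"}

ENTRY_RE = re.compile(r"^\^(\d+)\s*=\s*(\w+):\s*(.*)$")


def parse_summary_entry(line):
    """Return (id, kind, payload) for one summary line, or raise ValueError."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a summary entry: {line!r}")
    entry_id, kind, payload = int(m.group(1)), m.group(2), m.group(3)
    if kind not in KNOWN_KINDS:
        raise ValueError(f"unexpected summary entry kind: {kind!r}")
    return entry_id, kind, payload
```

With `flags` in the accepted set, the line from the error message parses instead of aborting the whole file.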

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D82917

b3031593

[Driver] Add --ld-path= and deprecate -fuse-ld=/abs/path and -fuse-ld=rel/path · 1bc5c847

Fangrui Song authored Jul 20, 2020

Supersedes D80225. Add --ld-path= to avoid strange target specific
prefixes and make -fuse-ld= focus on its intended job: "linker flavor".
(-f* affects generated code or language features. --ld-path does not
affect codegen, so it is not named -f*)

The way --ld-path= works is similar to "Command Search and Execution" in POSIX.1-2017 2.9.1 Simple Commands.

If --ld-path= specifies

* an absolute path, the value specifies the linker.
* a relative path without a path component separator (/), the value is searched for using -B, COMPILER_PATH, then PATH.
* a relative path with a path component separator, the linker is found relative to the current working directory.

-fuse-ld= and --ld-path= can be composed, e.g. `-fuse-ld=lld --ld-path=/usr/bin/ld.lld`

The driver can base its linker option decision on the flavor -fuse-ld=, but it should not do fragile
flavor checking with --ld-path=.
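
The three lookup rules listed above can be sketched as follows. This is a Python model of the described behaviour, not the driver's code. The function name `resolve_ld_path` and its parameters are hypothetical: `prefix_dirs` stands in for the `-B` directories, while `compiler_path` and `path` stand in for the `COMPILER_PATH` and `PATH` environment variables.

```python
import os


def resolve_ld_path(value, prefix_dirs, compiler_path, path, cwd):
    """Return the linker chosen for --ld-path=value, or None if not found."""
    if os.path.isabs(value):
        return value  # absolute path: the value names the linker directly
    if "/" in value:
        # relative path with a separator: resolved against the working directory
        return os.path.normpath(os.path.join(cwd, value))
    # bare name: search -B directories, then COMPILER_PATH, then PATH
    search = list(prefix_dirs)
    search += [d for d in compiler_path.split(os.pathsep) if d]
    search += [d for d in path.split(os.pathsep) if d]
    for directory in search:
        candidate = os.path.join(directory, value)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

Note that the linker flavor never enters this lookup; under the design described in the message it would come from `-fuse-ld=` alone.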

Reviewed By: whitequark, keith

Differential Revision: https://reviews.llvm.org/D83015

1bc5c847

Reland [libcxx]Put clang::trivial_abi on smart pointers · 76887bc4

Vy Nguyen authored Jul 13, 2020

    Reviewed By: ldionne,EricWF

    Tags: #libcxx

    Differential Revision: https://reviews.llvm.org/D82490

76887bc4

Require shell for lld/test/ELF/arm-exidx-range.s · 8a197e0b

Hans Wennborg authored Jul 20, 2020

The test fails in 32-bit Windows builds for unclear reasons:

ld.lld: error: failed to open
C:\src\llvm_package_1100-rc1\build32_stage0\tools\lld\test\ELF\Output\arm-exidx-range.s.tmp:
The parameter is incorrect.

8a197e0b

Fix issue in typo handling which could lead clang to hang · dde98c82

David Goldman authored Jul 17, 2020

Summary:
We need to detect when certain TypoExprs are not being transformed
due to invalid trees, otherwise we risk endlessly trying to fix it.

Reviewers: rsmith

Subscribers: cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D84067

dde98c82

AMDGPU: Remove outdated fixme · 21ef01b7
Matt Arsenault authored Jul 20, 2020

21ef01b7

AMDGPU: Fix not accounting for constantexpr uses of LDS globals · 84704d98

Matt Arsenault authored Jul 17, 2020

This was failing to add the size of LDS globals that weren't directly
used by an instruction. They could be used by constant expressions
which are transitively used by the function. This requires a better
search, but just abort on this for now for correctness.
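
For illustration, the kind of transitive search the message alludes to could look like the Python sketch below. It is an assumption about what a "better search" might be, not what the patch does (the patch aborts instead). The use graph is modelled as a dictionary, and `functions_using` and `is_function` are hypothetical names.

```python
def functions_using(global_name, users, is_function):
    """Collect functions that reach global_name through any chain of users.

    users maps a value name to the names of the values that use it.
    Constant expressions appear as intermediate nodes between the global
    and the function whose instruction finally uses them.
    """
    found, seen, worklist = set(), set(), [global_name]
    while worklist:
        value = worklist.pop()
        if value in seen:
            continue
        seen.add(value)
        for user in users.get(value, ()):
            if is_function(user):
                found.add(user)
            else:
                worklist.append(user)  # e.g. a constant expression: keep walking
    return found
```

A search that only inspects direct users would miss any function reached solely through the intermediate constant-expression nodes.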

84704d98

[Sema] Promote SmallSet of enum to bitset · 177e5acb
Benjamin Kramer authored Jul 20, 2020
```
Even with 300 elements, this still consumes less stack space than the
SmallSet. NFCI.
```
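
As a rough model of the space argument, the Python sketch below stores one bit per enumerator and reports the storage needed when those bits are packed into 64-bit words. The `Kind` enum and the `EnumBitset` class are hypothetical stand-ins, not the types used in Sema.

```python
from enum import IntEnum


class Kind(IntEnum):
    """Hypothetical enum standing in for the one used in Sema."""
    A = 0
    B = 1
    C = 299  # the commit message mentions about 300 elements


class EnumBitset:
    """Set of enum values stored as one bit per enumerator."""

    def __init__(self, size):
        self.size = size
        self.bits = 0

    def insert(self, value):
        self.bits |= 1 << int(value)

    def __contains__(self, value):
        return (self.bits >> int(value)) & 1 == 1

    def storage_bytes(self, word_bits=64):
        """Bytes needed when the bits are packed into fixed-size words."""
        words = (self.size + word_bits - 1) // word_bits
        return words * word_bits // 8
```

For 300 enumerators this comes to five 64-bit words, that is 40 bytes, independent of how many values are actually inserted.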
177e5acb
AMDGPU/GlobalISel: Initial Implementation of calls · 61f1f2a2
Matt Arsenault authored Jul 03, 2020
```
Return values and tail calls are not yet handled.
```
61f1f2a2
Verifier: Check byref address space for AMDGPU calling conventions · 780cef1f
Matt Arsenault authored Jun 29, 2020

780cef1f

Issue error on invalid arithmetic conversions in C ternary · 66aff323

Erich Keane authored Jul 20, 2020

As reported in PR46774, an invalid arithmetic conversion used in a C
ternary operator resulted in an assertion. This patch replaces that
assertion with a diagnostic stating that the conversion failed.

At the moment, I believe the only case of this happening is _ExtInt
types.

66aff323

Verifier: Disallow byval and similar for AMDGPU calling conventions · ad8e900c
Matt Arsenault authored May 07, 2020
```
These imply stack-like semantics, which doesn't make any sense for
entry points.
```
ad8e900c
[Driver] Promote SmallSet of enum to a bitset. NFCI. · f3f1ce4f
Benjamin Kramer authored Jul 20, 2020

f3f1ce4f
Upgrade SmallSets of pointer-like types to SmallPtrSet · 33c9d032
Benjamin Kramer authored Jul 20, 2020
```
This is slightly more efficient. NFC.
```
33c9d032
[MLIR][Shape] Allow `shape.rank` to accept extent tensors `tensor<?xindex>` · 71e7a37e
Frederik Gossen authored Jul 20, 2020
```
Differential Revision: https://reviews.llvm.org/D84156
```
71e7a37e
[MLIR][Shape] Allow `cstr_broadcastable` to accept extent tensors · ccb40c84
Frederik Gossen authored Jul 20, 2020
```
Differential Revision: https://reviews.llvm.org/D84155
```
ccb40c84

[DebugInfo] Support for DW_AT_associated and DW_AT_allocated. · 2d10258a

Alok Kumar Sharma authored Jul 20, 2020

Summary:
This support is needed for Fortran array variables with the pointer or
allocatable attribute. It enables the debugger to identify whether such a
variable is currently allocated or associated.

  for pointer array (before allocation/association)
  without DW_AT_associated

(gdb) pt ptr
type = integer (140737345375288:140737354129776)
(gdb) p ptr
value requires 35017956 bytes, which is more than max-value-size

  with DW_AT_associated

(gdb) pt ptr
type = integer (:)
(gdb) p ptr
$1 = <not associated>

  for allocatable array (before allocation)

  without DW_AT_allocated

(gdb) pt arr
type = integer (140737345375288:140737354129776)
(gdb) p arr
value requires 35017956 bytes, which is more than max-value-size

  with DW_AT_allocated

(gdb) pt arr
type = integer, allocatable (:)
(gdb) p arr
$1 = <not allocated>

    Testing
- unit test cases added
- check-llvm
- check-debuginfo

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D83544

2d10258a

IR: Define byref parameter attribute · 5e999cbe

Matt Arsenault authored Jun 05, 2020

This allows tracking the in-memory type of a pointer argument to a
function for ABI purposes. This is essentially a stripped down version
of byval to remove some of the stack-copy implications in its
definition.

This includes the base IR changes, and some tests for places where it
should be treated similarly to byval. Codegen support will be in a
future patch.

My original attempt at solving some of these problems was to repurpose
byval with a different address space from the stack. However, it is
technically permitted for the callee to introduce a write to the
argument, although nothing does this in reality. There is also talk of
removing and replacing the byval attribute, so a new attribute would
need to take its place anyway.

This is intended to avoid some optimization issues with the current
handling of aggregate arguments, as well as to fix inflexibility in how
frontends can specify the kernel ABI. The most honest representation
of the amdgpu_kernel convention is to expose all kernel arguments as
loads from constant memory. Today, these are raw, SSA Argument values
and codegen is responsible for turning these into loads.

Background:

There currently isn't a satisfactory way to represent how arguments
for the amdgpu_kernel calling convention are passed. In reality,
arguments are passed in a single, flat, constant memory buffer
implicitly passed to the function. It is also illegal to call this
function in the IR, and this is only ever invoked by a driver of some
kind.

It does not make sense to have a stack passed parameter in this
context as is implied by byval. It is never valid to write to the
kernel arguments, as this would corrupt the inputs seen by other
dispatches of the kernel. These arguments are also not in the same
address space as the stack, so a copy is needed to an alloca. From a
source C-like language, the kernel parameters are invisible.
Semantically, a copy is always required from the constant argument
memory to a mutable variable.

The current clang calling convention lowering emits raw values,
including aggregates into the function argument list, since using
byval would not make sense. This has some unfortunate consequences for
the optimizer. In the aggregate case, we end up with an aggregate
store to alloca, which both SROA and instcombine turn into a store of
each aggregate field. The optimizer never pieces this back together to
see that this is really just a copy from constant memory, so we end up
stuck with expensive stack usage.

This also means the backend dictates the alignment of arguments, and
arbitrarily picks the LLVM IR ABI type alignment. By allowing an
explicit alignment, frontends can make better decisions. For example,
there's no real advantage to an alignment higher than 4, so a frontend
could choose to compact the argument layout. Similarly, there is a
high penalty to using an alignment lower than 4, so a frontend could
opt into more padding for small arguments.
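
The effect of alignment on the argument buffer can be worked through with a small Python sketch. The helpers `align_to` and `layout` are hypothetical, and the sizes and alignments in the usage note are illustrative values rather than figures from the patch.

```python
def align_to(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def layout(args):
    """Return (offsets, total_size) for a list of (size, alignment) pairs."""
    offsets, offset = [], 0
    for size, alignment in args:
        offset = align_to(offset, alignment)
        offsets.append(offset)
        offset += size
    return offsets, offset
```

For a 4-byte, an 8-byte, and a 4-byte argument, `layout([(4, 4), (8, 8), (4, 4)])` places them at offsets 0, 8 and 16 for a total of 20 bytes. Capping the alignment at 4, as in `layout([(4, 4), (8, 4), (4, 4)])`, gives offsets 0, 4 and 12 and a total of 16 bytes. In the other direction, two 1-byte arguments packed at alignment 1 sit at offsets 0 and 1, while requesting alignment 4 moves the second to offset 4.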

Another design consideration is when it is appropriate to expose the
fact that these arguments are all really passed in adjacent
memory. Currently we have a late IR optimization pass in codegen to
rewrite the kernel argument values into explicit loads to enable
vectorization. In most programs, unrelated argument loads can be
merged together. However, exposing this property directly from the
frontend has some disadvantages. We still need a way to track the
original argument sizes and alignments to report to the driver. I find
using some side-channel, metadata mechanism to track this
unappealing. If the kernel arguments were exposed as a single buffer
to begin with, alias analysis would be unaware that the padding bits
between arguments are meaningless. Another family of problems is there
are still some gaps in replacing all of the available parameter
attributes with metadata equivalents once lowered to loads.

The immediate plan is to start using this new attribute to handle all
aggregate arguments for kernels. Long term, it makes sense to migrate
all kernel arguments, including scalars, to be passed indirectly in
the same manner.

Additional context is in D79744.

5e999cbe

MCFixup.h - remove unnecessary MCExpr.h include. NFCI. · 017e5c94

Simon Pilgrim authored Jul 20, 2020

Move the include down to files that actually depend on MCExpr definitions.

Also exposes an implicit dependency on MCContext in AVRAsmBackend.h

017e5c94

CodeGenDAGPatterns.h - remove unnecessary ComplexPattern forward declaration. NFCI. · a0ed0e3f
Simon Pilgrim authored Jul 20, 2020
```
This is defined in CodeGenTarget.h which we have to explicitly include already.
```
a0ed0e3f
CodeGenDAGPatterns.h - remove unused CodeGenHwModes.h include. NFCI. · 93c338fd
Simon Pilgrim authored Jul 20, 2020

93c338fd

AMDGPU/GlobalISel: Legalize s16->s64 G_FPEXT · 6a1030aa

Petar Avramovic authored Jul 20, 2020

Legalize using narrowScalar as s16->s32 G_FPEXT
followed by s32->s64 G_FPEXT.
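
Splitting the extension into two steps does not change the result, because every half-precision value is exactly representable in single precision and every single-precision value in double precision. The Python sketch below checks this on raw bit patterns using the standard `struct` module; the function names are hypothetical and the code only models the arithmetic, not the legalizer.

```python
import struct


def fpext_f16_to_f32(bits16):
    """Extend an IEEE half, given as a 16-bit pattern, to a 32-bit pattern."""
    (value,) = struct.unpack("<e", struct.pack("<H", bits16))
    (bits32,) = struct.unpack("<I", struct.pack("<f", value))
    return bits32


def fpext_f32_to_f64(bits32):
    """Extend an IEEE single, given as a 32-bit pattern, to a 64-bit pattern."""
    (value,) = struct.unpack("<f", struct.pack("<I", bits32))
    (bits64,) = struct.unpack("<Q", struct.pack("<d", value))
    return bits64


def fpext_f16_to_f64(bits16):
    """Two-step extension: s16 -> s32 followed by s32 -> s64."""
    return fpext_f32_to_f64(fpext_f16_to_f32(bits16))
```

For example, the half pattern `0x3C00` (1.0) extends to the double pattern `0x3FF0000000000000` through the intermediate single pattern `0x3F800000`.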

Differential Revision: https://reviews.llvm.org/D84030

6a1030aa

AMDGPU/GlobalISel: Remove outdated comment · 100564bd
Matt Arsenault authored Jul 18, 2020

100564bd
GlobalISel: Don't handle widenScalar for vector G_INSERT · 5cbd4e41
Matt Arsenault authored Jul 18, 2020
```
This handling didn't make any sense for vectors.
```
5cbd4e41
AMDGPU/GlobalISel: Fix custom lowering of llvm.trunc.f64 for SI · 93311a98
Matt Arsenault authored Jul 19, 2020
```
This was missing an operand from BFE and not erasing the original
instruction.
```
93311a98
AArch64/GlobalISel: Fix hardcoded registers in error message checks · 57aae470
Matt Arsenault authored Jul 19, 2020

57aae470
GlobalISel: Consistently get TII from MIRBuilder · a679f27e
Matt Arsenault authored Jul 19, 2020

a679f27e

[lldb/Utility] Simplify Scalar::SetValueFromData · 7fadd700

Pavel Labath authored Jul 13, 2020

The function was fairly complicated and didn't support new bigger
integer sizes. Use llvm function for loading an APInt from memory to
write a unified implementation for all sizes.
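
The idea of one size-independent load can be illustrated with a Python analogue; this is not LLDB code. The helper `load_integer` is hypothetical, and Python's arbitrary-precision `int` plays the role that an arbitrary-width integer type plays in the C++ implementation.

```python
def load_integer(data, byte_size, byteorder, signed=False):
    """Read one integer of byte_size bytes from the start of data."""
    if byte_size <= 0 or byte_size > len(data):
        raise ValueError("byte_size out of range for the supplied data")
    return int.from_bytes(data[:byte_size], byteorder=byteorder, signed=signed)
```

The same call handles 1, 4, 16 or any other number of bytes, so no per-size branches are needed.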

7fadd700

[lldb/test] Simplify Makefile rules for .d files · 9decf040

Pavel Labath authored Jul 15, 2020

The sed line in the rules was adding the .d file as a target to the
dependency rules -- to ensure the file gets rebuilt when the sources
change. The same thing can be achieved more elegantly with some -M
flags.

9decf040

[LLE] std::inserter doesn't work with SmallSet, so don't use it. · e88b6ed7
Benjamin Kramer authored Jul 20, 2020

e88b6ed7
[AST][RecoveryExpr] Add recovery-ast tests for C language, NFC. · 70e2c7ad
Haojian Wu authored Jul 20, 2020
```
Some examples are working already.

Differential Revision: https://reviews.llvm.org/D84146
```
70e2c7ad
[AST][RecoveryExpr] Fix a crash on opencl C++. · 4b5b7c75
Haojian Wu authored Jul 20, 2020
```
Differential Revision: https://reviews.llvm.org/D84145
```
4b5b7c75
Fix clangd build, NFC · 61d664c9
Haojian Wu authored Jul 20, 2020

61d664c9
[LoopSimplify] Use SmallPtrSet and range for loops more. NFCI. · 44ab60f7
Benjamin Kramer authored Jul 20, 2020

44ab60f7

[AST][RecoveryExpr] Preserve the AST for invalid conditions. · 684e416e

Haojian Wu authored Jul 20, 2020

Adjust an existing diagnostic test; the change improves the secondary diagnostic.

Differential Revision: https://reviews.llvm.org/D81163

684e416e

[LLDB/test] Simplify result formatter code · 9199457b

Pavel Labath authored Jul 20, 2020

Now that the main test results are reported through lit, and we only
have one formatter class, this code is unnecessarily baroque.

9199457b