- Mar 19, 2021
-
-
Andrew Young authored
When deleting operations in DCE, the algorithm uses a post-order walk of the IR to ensure that value uses are erased before value defs. Graph regions do not have the same structural invariants as SSA CFG, and this post-order walk could delete value defs before uses. This problem is guaranteed to occur when there is a cycle in the use-def graph. This change stops DCE from visiting the operations and blocks in any meaningful order. Instead, we rely on explicitly dropping all uses of a value before deleting it. Reviewed By: mehdi_amini, rriddle Differential Revision: https://reviews.llvm.org/D98919
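The following standalone C++ toy (not MLIR's actual code; the Node type and its use list are invented for illustration) shows why severing all uses first makes the deletion order irrelevant even when the use-def graph contains a cycle:
```
#include <memory>
#include <vector>

// Toy model of the idea (not MLIR's API): each node tracks which nodes use it.
// With a cycle a<->b there is no valid post-order, so erasing in any fixed
// order would delete a def whose uses still exist; dropping all uses first
// makes the erase order irrelevant.
struct Node {
  std::vector<Node *> users; // nodes that consume this node's result
  void dropAllUses() { users.clear(); }
};

int main() {
  auto a = std::make_unique<Node>();
  auto b = std::make_unique<Node>();
  a->users.push_back(b.get()); // b uses a
  b->users.push_back(a.get()); // a uses b -> cycle in the use-def graph

  // Dead-code elimination over a graph region: first sever every edge...
  a->dropAllUses();
  b->dropAllUses();
  // ...then delete in any order without dangling user references.
  a.reset();
  b.reset();
  return 0;
}
```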
-
Vladislav Vinogradov authored
Add an extra `type.isa<FloatType>()` check to the `FloatAttr::get(Type, double)` method. Otherwise it tries to call `type.cast<FloatType>()`, which fails with an assertion in Debug mode. The `!type.isa<FloatType>()` case just redirects the call to `FloatAttr::get(Type, APFloat)`, which will perform the actual check and emit an appropriate error. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D98764
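A minimal sketch of the guard-before-cast pattern described here, using a hypothetical standalone type hierarchy rather than MLIR's real Type/FloatType classes: the checked path asserts the way `cast<>` does in Debug builds, while the guarded entry point routes non-float types to a path that can report a proper error instead.
```
#include <cassert>
#include <iostream>

// Hypothetical stand-ins for MLIR's Type/FloatType; not the real API.
struct Type { virtual ~Type() = default; };
struct FloatType : Type {};
struct IntegerType : Type {};

// Models cast<FloatType>(): asserts when the type is not a float.
FloatType &castToFloat(Type &t) {
  auto *f = dynamic_cast<FloatType *>(&t);
  assert(f && "cast to FloatType on a non-float type");
  return *f;
}

// Models the fixed entry point: check isa<FloatType> first and divert
// everything else to the path that can emit a real diagnostic.
bool buildFloatAttr(Type &t, double value) {
  if (dynamic_cast<FloatType *>(&t) == nullptr) {
    std::cerr << "error: expected floating point type\n"; // checked path
    return false;
  }
  (void)castToFloat(t); // now guaranteed not to assert
  std::cout << "built float attribute with value " << value << "\n";
  return true;
}

int main() {
  FloatType f32;
  IntegerType i32;
  buildFloatAttr(f32, 1.0); // ok
  buildFloatAttr(i32, 1.0); // diagnosed instead of asserting
  return 0;
}
```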
-
Max Kazantsev authored
We can prove more predicates when we have a context while eliminating an ICmp. As a first (and very obvious) approximation, we can use the ICmp instruction itself, though in the future we are going to use a common dominator of all its users. Some refactoring is needed before that. Observed ~0.5% negative compile time impact. Differential Revision: https://reviews.llvm.org/D98697 Reviewed By: lebedev.ri
-
Hongtao Yu authored
C functions may be declared and defined with different prototypes, like below. This patch unifies the checks for mangling names in symbol linkage name emission and debug linkage name emission so that the two names are consistent.
```
static int go(int);

static int go(a) int a;
{
  return a;
}
```
Differential Revision: https://reviews.llvm.org/D98799
-
Wenlei He authored
This change adds an attribute field to the metadata of a context profile. Currently we have an inline attribute that indicates whether the leaf frame corresponding to a context profile was inlined in the previous build. This will be used to help estimate inlining and will be taken into account when trimming context. Changes for that in llvm-profgen will follow. It will also help tuning. Differential Revision: https://reviews.llvm.org/D98823
-
Max Kazantsev authored
By the definition of the implication operator, `false -> true` and `false -> false`. This means that `false` implies any predicate, whether true or false. We don't need to go any further trying to prove the statement we need; we can just always say that `false` implies it in this case. In practice it means that we are trying to prove something guarded by a `false` condition, which means that this code is unreachable, and we can safely prove any fact or perform any transform in this code. Differential Revision: https://reviews.llvm.org/D98706 Reviewed By: lebedev.ri
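A tiny illustration of the shortcut in plain C++ (not the actual LLVM code; the function name and three-valued condition are made up): once the guarding premise is known to be false, the query can report "implied" immediately, because the guarded code is unreachable.
```
#include <iostream>
#include <optional>

// Hypothetical three-valued condition: known true, known false, or unknown.
using KnownBool = std::optional<bool>;

// Models the early exit described above: `false -> P` holds for any P, so a
// premise that is known false implies every conclusion without further work.
bool isImplied(KnownBool premise, KnownBool conclusion) {
  if (premise.has_value() && !*premise)
    return true; // unreachable guard: any fact may be assumed
  // Otherwise fall back to whatever real reasoning the pass performs; here we
  // only handle the trivial "conclusion is known true" case.
  return conclusion.has_value() && *conclusion;
}

int main() {
  std::cout << isImplied(false, false) << "\n"; // 1: false implies anything
  std::cout << isImplied(true, false) << "\n";  // 0: cannot prove
  return 0;
}
```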
-
Richard Smith authored
-
Richard Smith authored
-
Jim Ingham authored
For instance, some recent clang emits this code on x86_64:
```
   0x100002b99 <+57>: callq  0x100002b40    ; step_out_of_here at main.cpp:11
-> 0x100002b9e <+62>: xorl   %eax, %eax
   0x100002ba0 <+64>: popq   %rbp
   0x100002ba1 <+65>: retq
```
and the `xorl %eax, %eax` is attributed to the same line as the callq. Since step out is supposed to stop just after returning from the function, you can't guarantee it will end up on the next line. I changed the test to check that we are either on the call line or on the next line, since either would be right depending on the debug information.
-
Philip Reames authored
-
Thomas Lively authored
These experimental builtin functions and the feature macro they were gated behind have been removed. Reviewed By: aheejin Differential Revision: https://reviews.llvm.org/D98907
-
Hsiangkai Wang authored
For Zvlsseg, we create several tuple register classes. When spilling for these tuple register classes, we need to iterate NF times to load/store these tuple registers. Differential Revision: https://reviews.llvm.org/D98629
-
Fangrui Song authored
On ELF, we place the metadata sections (`__sancov_guards`, `__sancov_cntrs`, `__sancov_bools`, `__sancov_pcs`) in section groups (either `comdat any` or `comdat noduplicates`). With `--gc-sections`, LLD since D96753 and GNU ld `-z start-stop-gc` may garbage collect such sections. If all `__sancov_bools` are discarded, LLD will report `error: undefined hidden symbol: __start___sancov_cntrs` (other sections are similar).
```
% cat a.c
void discarded() {}
% clang -fsanitize-coverage=func,trace-pc-guard -fpic -fvisibility=hidden a.c -shared -fuse-ld=lld -Wl,--gc-sections
...
ld.lld: error: undefined hidden symbol: __start___sancov_guards
>>> referenced by a.c
>>>               /tmp/a-456662.o:(sancov.module_ctor_trace_pc_guard)
```
Use the `extern_weak` linkage (lowered to undefined weak symbols) to avoid the undefined symbol error. Differential Revision: https://reviews.llvm.org/D98903
-
Craig Topper authored
We returned the input chain instead of the output chain from the new load. This bypasses the load in the chain. I haven't found a good way to test this yet. IR order prevents my initial attempts at causing reordering.
-
George Balatsouras authored
This is only adding support to the dfsan instrumentation pass but not to the runtime. Added more RUN lines for testing: for each instrumentation test that had a -dfsan-fast-16-labels invocation, a new invocation was added using fast8. Reviewed By: stephan.yichao.zhao Differential Revision: https://reviews.llvm.org/D98734
-
Rob Suderman authored
This adds a tosa.apply_scale operation that handles the scaling operation common to quantized operations. This scalar operation is lowered in TosaToStandard. We use a separate ApplyScale factorization as this is a replicable pattern within TOSA. ApplyScale can be reused within pool/convolution/mul/matmul for their quantized variants. Tests are added to both tosa-to-standard and tosa-to-linalg-on-tensors that verify each pass is correct. Reviewed By: silvas Differential Revision: https://reviews.llvm.org/D98753
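For context, a generic fixed-point rescale of the kind quantized kernels commonly share looks roughly like the C++ sketch below. This is an assumption about the general pattern (multiply by an integer multiplier, add a rounding term, shift right), not the exact semantics of tosa.apply_scale as defined by the TOSA specification.
```
#include <cstdint>
#include <iostream>

// Generic quantized rescale: value * multiplier / 2^shift with
// round-to-nearest, computed in 64-bit to avoid intermediate overflow.
// Illustration only; the real tosa.apply_scale semantics (rounding mode,
// bit widths, clamping) are defined by the TOSA specification.
int32_t applyScaleApprox(int32_t value, int32_t multiplier, int8_t shift) {
  int64_t product = static_cast<int64_t>(value) * multiplier;
  int64_t round = int64_t{1} << (shift - 1); // assumes shift >= 1
  return static_cast<int32_t>((product + round) >> shift);
}

int main() {
  // Scale 200 by ~0.75 expressed as multiplier 3 with shift 2 (3 / 4 = 0.75).
  std::cout << applyScaleApprox(200, 3, 2) << "\n"; // prints 150
  return 0;
}
```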
-
Jessica Paquette authored
This reverts commit 962b73dd. This commit was reverted because of some internal SPEC test failures. It turns out that this wasn't actually relevant to anything in open source, so it's safe to recommit this.
-
- Mar 18, 2021
-
-
Rob Suderman authored
Includes lowering for tosa.concat, with index computation using subtensor insert operations. Includes tests along two different indices. Differential Revision: https://reviews.llvm.org/D98813
-
Yuanfang Chen authored
-
Craig Topper authored
[DAGCombiner][RISCV] Teach visitMGATHER/MSCATTER to remove gather/scatters with all zeros masks that use SPLAT_VECTOR. Previously only all zeros BUILD_VECTOR was recognized.
-
Yuanfang Chen authored
This is the alternative approach to D96931. In LTO, for each module with an inline asm block, prepend the directive ".lto_discard <sym>, <sym>*" to the beginning of the inline asm. ".lto_discard" is both a module inline asm block marker and (optionally) provides a list of symbols to be discarded. In MC, while emitting the inline asm, discard symbol binding & symbol definitions according to ".lto_discard". Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D98762
-
Shilei Tian authored
It is reported that after enabling the hidden helper thread, the program can sometimes hit the assertion `new_gtid < __kmp_threads_capacity`. The root cause is explained as follows. Let's say the default `__kmp_threads_capacity` is `N`. If the hidden helper thread is enabled, `__kmp_threads_capacity` will be offset to `N+8` by default. If the number of threads we need exceeds `N+8`, e.g. via the `num_threads` clause, we need to expand `__kmp_threads`. In `__kmp_expand_threads`, the expansion starts from `__kmp_threads_capacity` and repeatedly doubles it until the new capacity meets the requirement. Let's assume the new requirement is `Y`. If `Y` happens to meet the constraint `(N+8)*2^X=Y`, where `X` is the number of iterations, the new capacity is not enough because we have 8 slots for hidden helper threads. Here is an example.
```
#include <vector>

int main(int argc, char *argv[]) {
  constexpr const size_t N = 1344;
  std::vector<int> data(N);

#pragma omp parallel for
  for (unsigned i = 0; i < N; ++i) {
    data[i] = i;
  }

#pragma omp parallel for num_threads(N)
  for (unsigned i = 0; i < N; ++i) {
    data[i] += i;
  }

  return 0;
}
```
My CPU is 20C40T, so `__kmp_threads_capacity` is 160. After the offset, `__kmp_threads_capacity` becomes 168. Since `1344 = (160+8)*2^3`, the assertion hits. Reviewed By: protze.joachim Differential Revision: https://reviews.llvm.org/D98838
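A small standalone C++ check of the arithmetic above (the names mirror the description; this is not the libomp code itself): doubling from the offset capacity of 168 stops at exactly 1344, yet 8 of those slots are set aside for hidden helper threads, so fewer than 1344 regular slots remain and the assertion can fire.
```
#include <iostream>

int main() {
  const int hiddenHelperSlots = 8;        // slots reserved for hidden helper threads
  int capacity = 160 + hiddenHelperSlots; // N = 160 on a 20C40T machine, offset to 168
  const int required = 1344;              // num_threads(1344) in the reproducer

  // Mirrors the doubling loop described above: stop once capacity >= required.
  while (capacity < required)
    capacity *= 2;

  std::cout << "capacity after doubling: " << capacity << "\n"; // 1344
  std::cout << "slots left for regular threads: "
            << (capacity - hiddenHelperSlots) << "\n"; // 1336 < 1344
  return 0;
}
```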
-
Craig Topper authored
Suppresses an implicit TypeSize to uint64_t conversion warning. We might be able to just not offset it since we're writing to a Fixed stack object, but I wasn't sure so I just did what DAGTypeLegalizer::IncrementPointer does. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D98736
-
Stefan Gränitz authored
In the existing OrcLazy mode, modules go through partitioning and outgoing calls are replaced by reexport stubs that resolve on call-through. In the greedy mode that this patch unlocks for lli, modules materialize as a whole and trigger materialization for all required symbols recursively. This is useful for testing (e.g. D98785) and it's more similar to the way MCJIT works.
-
thomasraoux authored
-
Stanislav Mekhanoshin authored
These are always selected as 0 anyway. Differential Revision: https://reviews.llvm.org/D98663
-
Mehdi Amini authored
This reverts commit 32a744ab. CI is broken:
```
test/Dialect/Linalg/bufferize.mlir:274:12: error: CHECK: expected string not found in input
// CHECK: %[[MEMREF:.*]] = tensor_to_memref %[[IN]] : memref<?xf32>
           ^
```
-
Daniel Kiss authored
-mbranch-protection protects the LR on the stack with PAC. When the frames are walked, the LR needs to be cleared. This inline assembly will later be replaced with a new builtin. Test: build with -DCMAKE_C_FLAGS="-mbranch-protection=standard". Reviewed By: kubamracek Differential Revision: https://reviews.llvm.org/D98008
-
Daniel Kiss authored
This reverts commit ad40453f.
-
Jonas Devlieghere authored
Move the Apple simulators test targets as they only matter for the API tests. Differential revision: https://reviews.llvm.org/D98880
-
Eugene Zhulenev authored
`BufferizeAnyLinalgOp` fails because `FillOp` is not a `LinalgGenericOp`, and it fails while reading the operand sizes attribute. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D98671
-
Zequan Wu authored
Let clang-cl accept the `-ffile-compilation-dir` flag. Differential Revision: https://reviews.llvm.org/D98887
-
thomasraoux authored
Differential Revision: https://reviews.llvm.org/D97346
-
Pavel Labath authored
The cause is the non-async-signal-safe printf function (et al.). If the test managed to interrupt the process and inject a signal before the printf("@started") call returned (but after it had actually written the output), that string could end up being printed twice (presumably because the function did not manage to clear the userspace buffer, and so the print call in the signal handler would print it once again). This patch fixes the issue by replacing the printf call in the signal handler with a sprintf+write combo, which should not suffer from that problem (though I wouldn't go as far as to call it async-signal-safe).
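A hedged sketch of the sprintf+write combination the patch describes (standalone C++, not the actual lldb test source; snprintf is used instead of sprintf for buffer safety, and all names are illustrative). As the author notes, this is still not formally async-signal-safe, but it avoids the stdio buffering that produced the duplicated output.
```
#include <csignal>
#include <cstdio>
#include <unistd.h>

// Format into a local buffer and hand the bytes straight to write(2),
// bypassing stdio's userspace buffer that printf would have used.
static void handler(int signo) {
  char buf[64];
  int len = std::snprintf(buf, sizeof(buf), "\nreceived signal %d\n", signo);
  if (len > 0)
    (void)write(STDOUT_FILENO, buf, static_cast<size_t>(len));
}

int main() {
  std::signal(SIGINT, handler);
  std::printf("@started\n"); // the line the test waits for
  std::fflush(stdout);
  for (;;)
    pause();                 // wait for signals
  return 0;
}
```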
-
Sanjay Patel authored
See PR49336.
-
thomasraoux authored
This propagates the affine map to transfer_read op in case it is not a minor identity map. Differential Revision: https://reviews.llvm.org/D98523
-
Mehdi Amini authored
This reverts commit 6b053c98. The build is broken:
```
ld.lld: error: undefined symbol: llvm::VPlan::printDOT(llvm::raw_ostream&) const
>>> referenced by LoopVectorize.cpp
>>>               LoopVectorize.cpp.o:(llvm::LoopVectorizationPlanner::printPlans(llvm::raw_ostream&)) in archive lib/libLLVMVectorize.a
```
-
Muiez Ahmed authored
The aim is to use the correct vasprintf implementation for z/OS libc++, where a copy of va_list ap is needed. In particular, it avoids the possibility that the initial internal call to vsnprintf modifies ap and the subsequent call to vsnprintf then uses that modified ap. Differential Revision: https://reviews.llvm.org/D97473
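A minimal sketch of a vasprintf that copies the va_list before the sizing call, along the lines described above (illustrative C++, not the actual z/OS libc++ implementation):
```
#include <cstdarg>
#include <cstdio>
#include <cstdlib>

// Sketch: size the output with a copy of `ap`, so the caller's va_list is
// still valid for the second vsnprintf call that actually writes the string.
int vasprintf_sketch(char **strp, const char *fmt, va_list ap) {
  va_list ap_copy;
  va_copy(ap_copy, ap);
  int needed = std::vsnprintf(nullptr, 0, fmt, ap_copy); // may consume ap_copy
  va_end(ap_copy);
  if (needed < 0)
    return -1;

  char *buf = static_cast<char *>(std::malloc(static_cast<size_t>(needed) + 1));
  if (!buf)
    return -1;
  int written = std::vsnprintf(buf, static_cast<size_t>(needed) + 1, fmt, ap);
  if (written < 0) {
    std::free(buf);
    return -1;
  }
  *strp = buf;
  return written;
}

// Small usage example of the sketch above.
static void demo(const char *fmt, ...) {
  va_list ap;
  va_start(ap, fmt);
  char *s = nullptr;
  int n = vasprintf_sketch(&s, fmt, ap);
  va_end(ap);
  if (n >= 0) {
    std::printf("%s\n", s);
    std::free(s);
  }
}

int main() {
  demo("pi is about %.2f", 3.14159);
  return 0;
}
```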
-
Andrei Elovikov authored
I foresee two uses for this:
1) It's easier to use these in a debugger.
2) Once we start implementing more VPlan-to-VPlan transformations (especially inner loop massaging stuff), using the vectorized LLVM IR as CHECK targets in LIT tests would become too obscure. I can imagine that we'd want to CHECK against VPlan dumps after multiple transformations instead. That would be easier with plain-text dumps than with the DOT format.
Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D96628
-