- Mar 19, 2021
-
-
Andrew Young authored
When deleting operations in DCE, the algorithm uses a post-order walk of the IR to ensure that value uses are erased before value defs. Graph regions do not have the same structural invariants as SSA CFG, and this post-order walk could delete value defs before uses. This problem is guaranteed to occur when there is a cycle in the use-def graph. This change stops DCE from visiting the operations and blocks in any meaningful order. Instead, we rely on explicitly dropping all uses of a value before deleting it. Reviewed By: mehdi_amini, rriddle Differential Revision: https://reviews.llvm.org/D98919
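The following standalone C++ toy (not MLIR's actual code; the Node type and its use list are invented for illustration) shows why severing all uses first makes the deletion order irrelevant even when the use-def graph contains a cycle:
```
#include <memory>
#include <vector>

// Toy model of the idea (not MLIR's API): each node tracks which nodes use it.
// With a cycle a<->b there is no valid post-order, so erasing in any fixed
// order would delete a def whose uses still exist; dropping all uses first
// makes the erase order irrelevant.
struct Node {
  std::vector<Node *> users; // nodes that consume this node's result
  void dropAllUses() { users.clear(); }
};

int main() {
  auto a = std::make_unique<Node>();
  auto b = std::make_unique<Node>();
  a->users.push_back(b.get()); // b uses a
  b->users.push_back(a.get()); // a uses b -> cycle in the use-def graph

  // Dead-code elimination over a graph region: first sever every edge...
  a->dropAllUses();
  b->dropAllUses();
  // ...then delete in any order without dangling user references.
  a.reset();
  b.reset();
  return 0;
}
```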
-
Vladislav Vinogradov authored
Add an extra `type.isa<FloatType>()` check to the `FloatAttr::get(Type, double)` method. Otherwise it tries to call `type.cast<FloatType>()`, which fails with an assertion in Debug mode. The `!type.isa<FloatType>()` case just redirects the call to `FloatAttr::get(Type, APFloat)`, which will perform the actual check and emit an appropriate error. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D98764
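A minimal sketch of the guard-before-cast pattern described here, using a hypothetical standalone type hierarchy rather than MLIR's real Type/FloatType classes: the checked path asserts the way `cast<>` does in Debug builds, while the guarded entry point routes non-float types to a path that can report a proper error instead.
```
#include <cassert>
#include <iostream>

// Hypothetical stand-ins for MLIR's Type/FloatType; not the real API.
struct Type { virtual ~Type() = default; };
struct FloatType : Type {};
struct IntegerType : Type {};

// Models cast<FloatType>(): asserts when the type is not a float.
FloatType &castToFloat(Type &t) {
  auto *f = dynamic_cast<FloatType *>(&t);
  assert(f && "cast to FloatType on a non-float type");
  return *f;
}

// Models the fixed entry point: check isa<FloatType> first and divert
// everything else to the path that can emit a real diagnostic.
bool buildFloatAttr(Type &t, double value) {
  if (dynamic_cast<FloatType *>(&t) == nullptr) {
    std::cerr << "error: expected floating point type\n"; // checked path
    return false;
  }
  (void)castToFloat(t); // now guaranteed not to assert
  std::cout << "built float attribute with value " << value << "\n";
  return true;
}

int main() {
  FloatType f32;
  IntegerType i32;
  buildFloatAttr(f32, 1.0); // ok
  buildFloatAttr(i32, 1.0); // diagnosed instead of asserting
  return 0;
}
```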
-
Max Kazantsev authored
We can prove more predicates when we have a context while eliminating an ICmp. As a first (and very obvious) approximation, we can use the ICmp instruction itself, though in the future we are going to use a common dominator of all its users. Some refactoring is needed before that. Observed ~0.5% negative compile time impact. Differential Revision: https://reviews.llvm.org/D98697 Reviewed By: lebedev.ri
-
Hongtao Yu authored
C functions may be declared and defined with different prototypes, like below. This patch unifies the checks for mangling names in symbol linkage name emission and debug linkage name emission so that the two names are consistent.
```
static int go(int);

static int go(a) int a;
{
  return a;
}
```
Differential Revision: https://reviews.llvm.org/D98799
-
Wenlei He authored
This change adds an attribute field to the metadata of a context profile. Currently we have an inline attribute that indicates whether the leaf frame corresponding to a context profile was inlined in the previous build. This will be used to help estimate inlining and will be taken into account when trimming context. Changes for that in llvm-profgen will follow. It will also help tuning. Differential Revision: https://reviews.llvm.org/D98823
-
Max Kazantsev authored
By the definition of the implication operator, `false -> true` and `false -> false`. This means that `false` implies any predicate, whether true or false. We don't need to go any further trying to prove the statement we need; we can just always say that `false` implies it in this case. In practice it means that we are trying to prove something guarded by a `false` condition, which means that this code is unreachable, and we can safely prove any fact or perform any transform in this code. Differential Revision: https://reviews.llvm.org/D98706 Reviewed By: lebedev.ri
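A tiny illustration of the shortcut in plain C++ (not the actual LLVM code; the function name and three-valued condition are made up): once the guarding premise is known to be false, the query can report "implied" immediately, because the guarded code is unreachable.
```
#include <iostream>
#include <optional>

// Hypothetical three-valued condition: known true, known false, or unknown.
using KnownBool = std::optional<bool>;

// Models the early exit described above: `false -> P` holds for any P, so a
// premise that is known false implies every conclusion without further work.
bool isImplied(KnownBool premise, KnownBool conclusion) {
  if (premise.has_value() && !*premise)
    return true; // unreachable guard: any fact may be assumed
  // Otherwise fall back to whatever real reasoning the pass performs; here we
  // only handle the trivial "conclusion is known true" case.
  return conclusion.has_value() && *conclusion;
}

int main() {
  std::cout << isImplied(false, false) << "\n"; // 1: false implies anything
  std::cout << isImplied(true, false) << "\n";  // 0: cannot prove
  return 0;
}
```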
-
Richard Smith authored
-
Richard Smith authored
-
Jim Ingham authored
For instance, some recent clang emits this code on x86_64:
```
   0x100002b99 <+57>: callq  0x100002b40    ; step_out_of_here at main.cpp:11
-> 0x100002b9e <+62>: xorl   %eax, %eax
   0x100002ba0 <+64>: popq   %rbp
   0x100002ba1 <+65>: retq
```
and the `xorl %eax, %eax` is attributed to the same line as the callq. Since step out is supposed to stop just after returning from the function, you can't guarantee it will end up on the next line. I changed the test to check that we are either on the call line or on the next line, since either would be right depending on the debug information.
-
Philip Reames authored
-
Thomas Lively authored
These experimental builtin functions and the feature macro they were gated behind have been removed. Reviewed By: aheejin Differential Revision: https://reviews.llvm.org/D98907
-
Hsiangkai Wang authored
For Zvlsseg, we create several tuple register classes. When spilling for these tuple register classes, we need to iterate NF times to load/store these tuple registers. Differential Revision: https://reviews.llvm.org/D98629
-
Fangrui Song authored
On ELF, we place the metadata sections (`__sancov_guards`, `__sancov_cntrs`, `__sancov_bools`, `__sancov_pcs`) in section groups (either `comdat any` or `comdat noduplicates`). With `--gc-sections`, LLD since D96753 and GNU ld `-z start-stop-gc` may garbage collect such sections. If all `__sancov_bools` are discarded, LLD will report `error: undefined hidden symbol: __start___sancov_cntrs` (other sections are similar).
```
% cat a.c
void discarded() {}
% clang -fsanitize-coverage=func,trace-pc-guard -fpic -fvisibility=hidden a.c -shared -fuse-ld=lld -Wl,--gc-sections
...
ld.lld: error: undefined hidden symbol: __start___sancov_guards
>>> referenced by a.c
>>>               /tmp/a-456662.o:(sancov.module_ctor_trace_pc_guard)
```
Use the `extern_weak` linkage (lowered to undefined weak symbols) to avoid the undefined symbol error. Differential Revision: https://reviews.llvm.org/D98903
-
Craig Topper authored
We returned the input chain instead of the output chain from the new load. This bypasses the load in the chain. I haven't found a good way to test this yet. IR order prevents my initial attempts at causing reordering.
-
George Balatsouras authored
This is only adding support to the dfsan instrumentation pass but not to the runtime. Added more RUN lines for testing: for each instrumentation test that had a -dfsan-fast-16-labels invocation, a new invocation was added using fast8. Reviewed By: stephan.yichao.zhao Differential Revision: https://reviews.llvm.org/D98734
-
Rob Suderman authored
This adds a tosa.apply_scale operation that handles the scaling operation common to quantized operations. This scalar operation is lowered in TosaToStandard. We use a separate ApplyScale factorization as this is a replicable pattern within TOSA. ApplyScale can be reused within pool/convolution/mul/matmul for their quantized variants. Tests are added to both tosa-to-standard and tosa-to-linalg-on-tensors that verify each pass is correct. Reviewed By: silvas Differential Revision: https://reviews.llvm.org/D98753
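For context, a generic fixed-point rescale of the kind quantized kernels commonly share looks roughly like the C++ sketch below. This is an assumption about the general pattern (multiply by an integer multiplier, add a rounding term, shift right), not the exact semantics of tosa.apply_scale as defined by the TOSA specification.
```
#include <cstdint>
#include <iostream>

// Generic quantized rescale: value * multiplier / 2^shift with
// round-to-nearest, computed in 64-bit to avoid intermediate overflow.
// Illustration only; the real tosa.apply_scale semantics (rounding mode,
// bit widths, clamping) are defined by the TOSA specification.
int32_t applyScaleApprox(int32_t value, int32_t multiplier, int8_t shift) {
  int64_t product = static_cast<int64_t>(value) * multiplier;
  int64_t round = int64_t{1} << (shift - 1); // assumes shift >= 1
  return static_cast<int32_t>((product + round) >> shift);
}

int main() {
  // Scale 200 by ~0.75 expressed as multiplier 3 with shift 2 (3 / 4 = 0.75).
  std::cout << applyScaleApprox(200, 3, 2) << "\n"; // prints 150
  return 0;
}
```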
-
Jessica Paquette authored
This reverts commit 962b73dd. This commit was reverted because of some internal SPEC test failures. It turns out that this wasn't actually relevant to anything in open source, so it's safe to recommit this.
-
- Mar 18, 2021
-
-
Rob Suderman authored
Includes lowering for tosa.concat, with index computation using subtensor insert operations. Includes tests along two different indices. Differential Revision: https://reviews.llvm.org/D98813
-
Yuanfang Chen authored
-
Craig Topper authored
[DAGCombiner][RISCV] Teach visitMGATHER/MSCATTER to remove gather/scatters with all zeros masks that use SPLAT_VECTOR. Previously only all zeros BUILD_VECTOR was recognized.
-
Yuanfang Chen authored
This is the alternative approach to D96931. In LTO, for each module with an inline asm block, prepend the directive ".lto_discard <sym>, <sym>*" to the beginning of the inline asm. ".lto_discard" is both a module inline asm block marker and (optionally) provides a list of symbols to be discarded. In MC, while emitting the inline asm, discard symbol binding & symbol definitions according to ".lto_discard". Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D98762
-
Shilei Tian authored
It is reported that after enabling the hidden helper thread, the program can sometimes hit the assertion `new_gtid < __kmp_threads_capacity`. The root cause is explained as follows. Let's say the default `__kmp_threads_capacity` is `N`. If the hidden helper thread is enabled, `__kmp_threads_capacity` will be offset to `N+8` by default. If the number of threads we need exceeds `N+8`, e.g. via the `num_threads` clause, we need to expand `__kmp_threads`. In `__kmp_expand_threads`, the expansion starts from `__kmp_threads_capacity` and repeatedly doubles it until the new capacity meets the requirement. Let's assume the new requirement is `Y`. If `Y` happens to meet the constraint `(N+8)*2^X=Y`, where `X` is the number of iterations, the new capacity is not enough because we have 8 slots for hidden helper threads. Here is an example.
```
#include <vector>

int main(int argc, char *argv[]) {
  constexpr const size_t N = 1344;
  std::vector<int> data(N);

#pragma omp parallel for
  for (unsigned i = 0; i < N; ++i) {
    data[i] = i;
  }

#pragma omp parallel for num_threads(N)
  for (unsigned i = 0; i < N; ++i) {
    data[i] += i;
  }

  return 0;
}
```
My CPU is 20C40T, so `__kmp_threads_capacity` is 160. After the offset, `__kmp_threads_capacity` becomes 168. Since `1344 = (160+8)*2^3`, the assertion hits. Reviewed By: protze.joachim Differential Revision: https://reviews.llvm.org/D98838
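A small standalone C++ check of the arithmetic above (the names mirror the description; this is not the libomp code itself): doubling from the offset capacity of 168 stops at exactly 1344, yet 8 of those slots are set aside for hidden helper threads, so fewer than 1344 regular slots remain and the assertion can fire.
```
#include <iostream>

int main() {
  const int hiddenHelperSlots = 8;        // slots reserved for hidden helper threads
  int capacity = 160 + hiddenHelperSlots; // N = 160 on a 20C40T machine, offset to 168
  const int required = 1344;              // num_threads(1344) in the reproducer

  // Mirrors the doubling loop described above: stop once capacity >= required.
  while (capacity < required)
    capacity *= 2;

  std::cout << "capacity after doubling: " << capacity << "\n"; // 1344
  std::cout << "slots left for regular threads: "
            << (capacity - hiddenHelperSlots) << "\n"; // 1336 < 1344
  return 0;
}
```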
-
Craig Topper authored
Suppresses an implicit TypeSize to uint64_t conversion warning. We might be able to just not offset it since we're writing to a Fixed stack object, but I wasn't sure so I just did what DAGTypeLegalizer::IncrementPointer does. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D98736
-
Stefan Gränitz authored
In the existing OrcLazy mode, modules go through partitioning and outgoing calls are replaced by reexport stubs that resolve on call-through. In the greedy mode that this patch unlocks for lli, modules materialize as a whole and trigger materialization for all required symbols recursively. This is useful for testing (e.g. D98785) and it's more similar to the way MCJIT works.
-
thomasraoux authored
-
Stanislav Mekhanoshin authored
These are always selected as 0 anyway. Differential Revision: https://reviews.llvm.org/D98663
-
Mehdi Amini authored
This reverts commit 32a744ab. CI is broken:
```
test/Dialect/Linalg/bufferize.mlir:274:12: error: CHECK: expected string not found in input
// CHECK: %[[MEMREF:.*]] = tensor_to_memref %[[IN]] : memref<?xf32>
           ^
```
-
Daniel Kiss authored
-mbranch-protection protects the LR on the stack with PAC. When the frames are walked, the LR needs to be cleared. This inline assembly will later be replaced with a new builtin. Test: build with -DCMAKE_C_FLAGS="-mbranch-protection=standard". Reviewed By: kubamracek Differential Revision: https://reviews.llvm.org/D98008
-
Daniel Kiss authored
This reverts commit ad40453f.
-
Jonas Devlieghere authored
Move the Apple simulators test targets as they only matter for the API tests. Differential revision: https://reviews.llvm.org/D98880
-
Eugene Zhulenev authored
`BufferizeAnyLinalgOp` fails because `FillOp` is not a `LinalgGenericOp`, and it fails while reading the operand sizes attribute. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D98671
-
Zequan Wu authored
Let clang-cl accept the `-ffile-compilation-dir` flag. Differential Revision: https://reviews.llvm.org/D98887
-
thomasraoux authored
Differential Revision: https://reviews.llvm.org/D97346
-
Pavel Labath authored
The cause is the non-async-signal-safe printf function (et al.). If the test managed to interrupt the process and inject a signal before the printf("@started") call returned (but after it had actually written the output), that string could end up being printed twice (presumably because the function did not manage to clear the userspace buffer, and so the print call in the signal handler would print it once again). This patch fixes the issue by replacing the printf call in the signal handler with a sprintf+write combo, which should not suffer from that problem (though I wouldn't go as far as to call it async-signal-safe).
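A hedged sketch of the sprintf+write combination the patch describes (standalone C++, not the actual lldb test source; snprintf is used instead of sprintf for buffer safety, and all names are illustrative). As the author notes, this is still not formally async-signal-safe, but it avoids the stdio buffering that produced the duplicated output.
```
#include <csignal>
#include <cstdio>
#include <unistd.h>

// Format into a local buffer and hand the bytes straight to write(2),
// bypassing stdio's userspace buffer that printf would have used.
static void handler(int signo) {
  char buf[64];
  int len = std::snprintf(buf, sizeof(buf), "\nreceived signal %d\n", signo);
  if (len > 0)
    (void)write(STDOUT_FILENO, buf, static_cast<size_t>(len));
}

int main() {
  std::signal(SIGINT, handler);
  std::printf("@started\n"); // the line the test waits for
  std::fflush(stdout);
  for (;;)
    pause();                 // wait for signals
  return 0;
}
```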
-
Sanjay Patel authored
See PR49336.
-
thomasraoux authored
This propagates the affine map to transfer_read op in case it is not a minor identity map. Differential Revision: https://reviews.llvm.org/D98523
-
Mehdi Amini authored
This reverts commit 6b053c98. The build is broken:
```
ld.lld: error: undefined symbol: llvm::VPlan::printDOT(llvm::raw_ostream&) const
>>> referenced by LoopVectorize.cpp
>>>               LoopVectorize.cpp.o:(llvm::LoopVectorizationPlanner::printPlans(llvm::raw_ostream&)) in archive lib/libLLVMVectorize.a
```
-
Muiez Ahmed authored
The aim is to use the correct vasprintf implementation for z/OS libc++, where a copy of va_list ap is needed. In particular, it avoids the possibility that the initial internal call to vsnprintf modifies ap and the subsequent call to vsnprintf then uses that modified ap. Differential Revision: https://reviews.llvm.org/D97473
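A minimal sketch of a vasprintf that copies the va_list before the sizing call, along the lines described above (illustrative C++, not the actual z/OS libc++ implementation):
```
#include <cstdarg>
#include <cstdio>
#include <cstdlib>

// Sketch: size the output with a copy of `ap`, so the caller's va_list is
// still valid for the second vsnprintf call that actually writes the string.
int vasprintf_sketch(char **strp, const char *fmt, va_list ap) {
  va_list ap_copy;
  va_copy(ap_copy, ap);
  int needed = std::vsnprintf(nullptr, 0, fmt, ap_copy); // may consume ap_copy
  va_end(ap_copy);
  if (needed < 0)
    return -1;

  char *buf = static_cast<char *>(std::malloc(static_cast<size_t>(needed) + 1));
  if (!buf)
    return -1;
  int written = std::vsnprintf(buf, static_cast<size_t>(needed) + 1, fmt, ap);
  if (written < 0) {
    std::free(buf);
    return -1;
  }
  *strp = buf;
  return written;
}

// Small usage example of the sketch above.
static void demo(const char *fmt, ...) {
  va_list ap;
  va_start(ap, fmt);
  char *s = nullptr;
  int n = vasprintf_sketch(&s, fmt, ap);
  va_end(ap);
  if (n >= 0) {
    std::printf("%s\n", s);
    std::free(s);
  }
}

int main() {
  demo("pi is about %.2f", 3.14159);
  return 0;
}
```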
-
Andrei Elovikov authored
I foresee two uses for this:
1) It's easier to use these in a debugger.
2) Once we start implementing more VPlan-to-VPlan transformations (especially inner loop massaging stuff), using the vectorized LLVM IR as CHECK targets in LIT tests would become too obscure. I can imagine that we'd want to CHECK against VPlan dumps after multiple transformations instead. That would be easier with plain-text dumps than with the DOT format.
Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D96628
-