- Mar 18, 2022
-
-
Vasileios Porpodas authored
Splat loads are inexpensive in X86. For a 2-lane vector we need just one instruction: `movddup (%reg), xmm0`. Using the standard Splat score leads to worse code. This patch adds a new score dedicated for splat loads.

Please note that a splat is usually three IR instructions:
- It is usually a load and 2 inserts:
    %ld = load double, double* %gep
    %ins1 = insertelement <2 x double> poison, double %ld, i32 0
    %ins2 = insertelement <2 x double> %ins1, double %ld, i32 1
- But it can also be a load, an insert and a shuffle:
    %ld = load double, double* %gep
    %ins = insertelement <2 x double> poison, double %ld, i32 0
    %shf = shufflevector <2 x double> %ins, <2 x double> poison, <2 x i32> zeroinitializer

Because of this some of the lit tests contain more IR instructions.

Differential Revision: https://reviews.llvm.org/D121354
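For readers less familiar with the term, here is a minimal C++ sketch of what a 2-lane splat load computes. `Vec2d` and `splatLoad` are hypothetical names used only for illustration; they are not part of the patch, and the sketch says nothing about how the vectorizer scores the pattern.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for the IR type <2 x double>.
using Vec2d = std::array<double, 2>;

// One scalar load whose value is placed in both lanes. This mirrors the
// "load and 2 inserts" form of the splat described in the commit message.
inline Vec2d splatLoad(const double *Ptr) {
  assert(Ptr && "splatLoad needs a valid address");
  const double Ld = *Ptr; // %ld = load double, double* %gep
  Vec2d V{};
  V[0] = Ld;              // insertelement ..., double %ld, i32 0
  V[1] = Ld;              // insertelement ..., double %ld, i32 1
  return V;
}
```

Calling `splatLoad(&X)` yields a vector whose two lanes both hold `X`, which is the result the single `movddup` instruction mentioned above produces on X86.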
-
Vasileios Porpodas authored
Differential Revision: https://reviews.llvm.org/D121353
-
Benjamin Kramer authored
-
Paul Kirth authored
This reverts commit 6cf560d6.
-
Nico Weber authored
-
Paul Kirth authored
I mistakenly reverted my commit, so I'm relanding it. This reverts commit 10866a1d.
-
Paul Kirth authored
This reverts commit e7749d47.
-
Paul Kirth authored
Reimplements MisExpect diagnostics from D66324 to reconstruct its original checking methodology only using MD_prof branch_weights metadata.

New checks rely on 2 invariants:
1) For frontend instrumentation, MD_prof branch_weights will always be populated before llvm.expect intrinsics are lowered.
2) For IR and sample profiling, llvm.expect intrinsics will always be lowered before branch_weights are populated from the IR profiles.

These invariants allow the checking to assume how the existing branch weights are populated depending on the profiling method used, and emit the correct diagnostics. If these invariants are ever invalidated, the MisExpect related checks would need to be updated, potentially by re-introducing MD_misexpect metadata, and ensuring it always will be transformed the same way as branch_weights in other optimization passes.

Frontend based profiling is now enabled without using LLVM Args, by introducing a new CodeGen option, and checking if the -Wmisexpect flag has been passed on the command line.

Differential Revision: https://reviews.llvm.org/D115907
-
Yonghong Song authored
When investigating an issue with bcc tool inject.py, I found a verifier failure with the latest clang. The portion of code can be illustrated as below:

    struct pid_struct {
      u64 curr_call;
      u64 conds_met;
      u64 stack[2];
    };
    struct pid_struct *bpf_map_lookup_elem();
    int foo() {
      struct pid_struct *p = bpf_map_lookup_elem();
      if (!p)
        return 0;
      p->curr_call--;
      if (p->conds_met < 1 || p->conds_met >= 3)
        return 0;
      if (p->stack[p->conds_met - 1] == p->curr_call)
        p->conds_met--;
      ...
    }

The verifier failure looks like:

    ...
    8: (79) r1 = *(u64 *)(r0 +0)
     R0_w=map_value(id=0,off=0,ks=4,vs=32,imm=0) R10=fp0 fp-8=mmmm????
    9: (07) r1 += -1
    10: (7b) *(u64 *)(r0 +0) = r1
     R0_w=map_value(id=0,off=0,ks=4,vs=32,imm=0) R1_w=inv(id=0) R10=fp0 fp-8=mmmm????
    11: (79) r2 = *(u64 *)(r0 +8)
     R0_w=map_value(id=0,off=0,ks=4,vs=32,imm=0) R1_w=inv(id=0) R10=fp0 fp-8=mmmm????
    12: (bf) r3 = r2
    13: (07) r3 += -3
    14: (b7) r4 = -2
    15: (2d) if r4 > r3 goto pc+13
     R0=map_value(id=0,off=0,ks=4,vs=32,imm=0) R1=inv(id=0) R2=inv(id=2) R3=inv(id=0,umin_value=18446744073709551614,var_off=(0xffffffff00000000; 0xffffffff)) R4=inv-2 R10=fp0 fp-8=mmmm????
    16: (07) r2 += -1
    17: (bf) r3 = r2
    18: (67) r3 <<= 3
    19: (bf) r4 = r0
    20: (0f) r4 += r3
    math between map_value pointer and register with unbounded min value is not allowed

Here the compiler optimized "p->conds_met < 1 || p->conds_met >= 3" to

    r2 = p->conds_met
    r3 = r2
    r3 += -3
    r4 = -2
    if (r3 < r4) return 0
    r2 += -1
    r3 = r2
    ...

In the above, r3 is initially equal to r2, but it is modified before being used by the comparison, so the comparison bounds r3 rather than r2. Later on r2, for which the verifier has no bounds, is used again. This caused the verification failure.

The BPF backend has a pass, AdjustOpt, to prevent such transformations, but it only focused on signed integers since typical bpf helpers return signed integers. To fix this case, let us handle unsigned integers as well.

Differential Revision: https://reviews.llvm.org/D121937
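The folded comparison described above can be checked in isolation. The following sketch assumes 64-bit unsigned arithmetic as in the BPF registers; `rejectOriginal` and `rejectFolded` are hypothetical names introduced here, and the sketch only demonstrates that the two forms of the range check agree. It does not model the verifier or the AdjustOpt pass.

```cpp
#include <cassert>
#include <cstdint>

// The source-level check: return early when conds_met is outside [1, 2].
constexpr bool rejectOriginal(std::uint64_t CondsMet) {
  return CondsMet < 1 || CondsMet >= 3;
}

// The shape the compiler produced: r3 = r2; r3 += -3; r4 = -2;
// then an unsigned "r3 < r4" test. Unsigned wraparound maps the
// accepted values 1 and 2 to the two largest 64-bit values.
constexpr bool rejectFolded(std::uint64_t CondsMet) {
  return CondsMet - 3 < static_cast<std::uint64_t>(-2);
}

// Spot checks evaluated at compile time.
static_assert(rejectFolded(0) && rejectOriginal(0), "0 is rejected");
static_assert(!rejectFolded(1) && !rejectOriginal(1), "1 is accepted");
static_assert(!rejectFolded(2) && !rejectOriginal(2), "2 is accepted");
static_assert(rejectFolded(3) && rejectOriginal(3), "3 is rejected");
```

The two functions are equivalent for every input, which is why the optimization is legal. The trouble for the verifier is that the range information ends up attached to the temporary (`CondsMet - 3`, i.e. r3) instead of the value that is used afterwards (r2).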
-
- Mar 17, 2022
-
-
Alina Sbirlea authored
-
Mehdi Amini authored
This reverts commit dad80e97. The build is broken with some configurations (gcc-5 and gcc-8):

    mlir/lib/Analysis/Presburger/PresburgerRelation.cpp:402:32: error: qualified name does not name a class before '{' token
     class presburger::SetCoalescer {
-
Stanislav Mekhanoshin authored
-
Johannes Doerfert authored
-
Johannes Doerfert authored
-
Johannes Doerfert authored
The reference was taken and the map was modified after. This can (and did) lead to dangling pointers and all sorts of problems afterwards.
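The commit message does not say which map or which code path was involved, so the following is only a generic illustration of the hazard. `FlatMap` is a hypothetical container written for this sketch; like many vector-backed or open-addressed maps, it may move its elements when a new key is inserted, which invalidates references taken earlier.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical flat map: inserting a new key can reallocate the storage,
// so references obtained before the insertion may dangle afterwards.
class FlatMap {
  std::vector<std::pair<int, std::string>> Entries;

public:
  std::string &operator[](int Key) {
    for (auto &E : Entries)
      if (E.first == Key)
        return E.second;
    Entries.emplace_back(Key, std::string());
    return Entries.back().second;
  }
};

// Risky order (shown as a comment only, never executed):
//   std::string &Ref = Map[1];
//   Map[2] = "b"; // may reallocate; Ref can now dangle
//   Ref += "!";   // use after free
//
// Safer order: finish modifying the map first, then take the reference.
inline std::string appendAfterInsert(FlatMap &Map) {
  Map[2] = "b";
  std::string &Ref = Map[1];
  Ref += "!";
  return Ref;
}
```

An alternative with the same effect is to copy the value out of the map before modifying it, or to look the key up again after the modification.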
-
Ellis Hoag authored
Failures in `InlineFunction()` are caught after D121722, but `emitInlinedIntoBasedOnCost()` should only be called when inlining is successful. This also removes an unnecessary call to `shouldInline()` which always returned `InlineCost::getAlways()`. Reviewed By: kyulee, nikic Differential Revision: https://reviews.llvm.org/D121946
-
Alina Sbirlea authored
Add more details to the docs regarding optimized accesses for Uses and Defs. Include incoming changes from https://reviews.llvm.org/D121381. Differential Revision: https://reviews.llvm.org/D121740
-
Thomas Lively authored
Add a test checking that each SIMD intrinsic produces the expected instruction. Since this test spans both clang and LLVM, place it in a new intrinsic-header-tests subdirectory of cross-project-tests. This revives D101684 now that cross-project-tests exists. In practice, the tests of lowering from wasm_simd128.h to LLVM IR were not as useful as this end-to-end test. Updates the version check of gdb in cross-project-tests/lit.cfg.py so that unexpected version formats do not prevent the new tests from running. Depends on D121661. Differential Revision: https://reviews.llvm.org/D121662
-
Mehdi Amini authored
Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D121750
-
Jonas Devlieghere authored
-
Kyungwoo Lee authored
We often hit the following assertion, non-deterministically, with a large IR:
```
Assertion `notDifferentParent(LocA.Ptr, LocB.Ptr) && "BasicAliasAnalysis doesn't support interprocedural queries."
```
Looking at the comment in https://reviews.llvm.org/D87806, it appears the pass is actually a module pass under the new PM, while under the legacy PM it still works as a function pass. The fix is to make the new PM behave like the legacy PM, which initializes ObjCARCContract for each function.

Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D121949
-
Nikolas Klauser authored
This tests the same QoI issue as the existing STL Classic test, but for the Ranges algorithms. Also, do the same thing for all the algorithms that take projections. I found a few missing algorithms and added them to the existing test, too. `std::find_first_of` currently fails; I should look at why that is (and in particular, what is it doing weird that makes it inconsistent with the entire rest of libc++?). Reviewed By: ldionne, #libc Spies: libcxx-commits Differential Revision: https://reviews.llvm.org/D121265
-
Louis Dionne authored
This will make it possible to add a timeout when running the tests.
-
Changpeng Fang authored
Summary: Specifically, for trap handling, for targets that do not support getDoorbellID, we load the queue_ptr from the implicit kernarg, and move queue_ptr to s[0:1]. To get aperture bases when targets do not have aperture registers, we load private_base or shared_base directly from the implicit kernarg. In clang, we use implicitarg_ptr + offsets to implement __builtin_amdgcn_workgroup_size_{xyz}. Reviewers: arsenm, sameerds, yaxunl Differential Revision: https://reviews.llvm.org/D120265
-
Louis Dionne authored
-
Sam Clegg authored
In programs that don't otherwise depend on `__tls_base` it won't be marked as live. However, this symbol is used internally in a couple of places, so we need to mark it as live explicitly in those places. Fixes: #54386 Differential Revision: https://reviews.llvm.org/D121931
-
Eli Friedman authored
-
Andrew Litteken authored
As pointed out in https://github.com/llvm/llvm-project/issues/54155#issuecomment-1057465479, there was a crash when loop info was being outlined. It was not being properly stripped and adjusted, so would point to the wrong location. This uses similar logic found in the CodeExtractor to adjust the loop debug info. Reviewer: fhahn, paquette Differential Revision: https://reviews.llvm.org/D120869
-
Valentin Clement authored
This patch adds some tests for the lowering of array constructors. This patch is part of the upstreaming effort from fir-dev branch. Reviewed By: PeteSteinfeld Differential Revision: https://reviews.llvm.org/D121945
Co-authored-by: mleair <leairmark@gmail.com>
Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Co-authored-by: V Donaldson <vdonaldson@nvidia.com>
-
Stanislav Mekhanoshin authored
Old names are supported as aliases. _1k MFMA got new opcodes. Differential Revision: https://reviews.llvm.org/D121741
-
Stanislav Mekhanoshin authored
Differential Revision: https://reviews.llvm.org/D120849
-
Ben Barham authored
For now most are implemented by printing out the name of the filesystem, but this can be expanded in the future. Only `OverlayFileSystem` and `RedirectingFileSystem` are properly implemented in this patch.
- `OverlayFileSystem`: Prints each filesystem in the order that any operations are actually run on them. Optionally prints recursively.
- `RedirectingFileSystem`: Prints out all mappings, as well as the `ExternalFS`. Most of this was already implemented other than the handling for the `DirectoryRemap` case and to actually print out the mapping.

Each FS should implement `printImpl` rather than `print`, where the latter just forwards to the former. This is to avoid spreading the default arguments through to the subclasses (where we may miss updating in the future).

Differential Revision: https://reviews.llvm.org/D121421
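The `print`/`printImpl` split described above can be sketched as follows. All class names, the `PrintType` enumerators, and the output format here are assumptions made for illustration, and `std::ostream` stands in for LLVM's stream type; this is not the actual VFS interface.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical base class: the public, non-virtual print() owns the default
// arguments and forwards to the virtual printImpl().
class FileSystemSketch {
public:
  enum class PrintType { Summary, Recursive };
  virtual ~FileSystemSketch() = default;

  void print(std::ostream &OS, PrintType Type = PrintType::Summary,
             unsigned IndentLevel = 0) const {
    printImpl(OS, Type, IndentLevel);
  }

protected:
  // Default behaviour: print only the name of the filesystem.
  virtual void printImpl(std::ostream &OS, PrintType /*Type*/,
                         unsigned IndentLevel) const {
    OS << std::string(IndentLevel * 2, ' ') << "FileSystemSketch\n";
  }
};

// Hypothetical overlay: prints itself, then optionally each layer.
class OverlaySketch : public FileSystemSketch {
  std::vector<const FileSystemSketch *> Layers;

public:
  void add(const FileSystemSketch *FS) { Layers.push_back(FS); }

protected:
  // No default arguments here, so there is nothing to keep in sync.
  void printImpl(std::ostream &OS, PrintType Type,
                 unsigned IndentLevel) const override {
    OS << std::string(IndentLevel * 2, ' ') << "OverlaySketch\n";
    if (Type == PrintType::Recursive)
      for (const FileSystemSketch *FS : Layers)
        FS->print(OS, Type, IndentLevel + 1);
  }
};
```

Because only `print` declares default arguments, changing a default later touches one declaration instead of every subclass override.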
-
Johannes Doerfert authored
The update_cc script should really do this automatically :(
-
Michel Weber authored
This patch refactors the current coalesce implementation. It introduces the `SetCoalescer`, a class in which all coalescing functionality lives. The main advantage over the old design is the fact that the vectors of constraints do not have to be passed around, but are implemented as private fields of the SetCoalescer. This will become especially important once more inequality types are introduced. Reviewed By: arjunp Differential Revision: https://reviews.llvm.org/D121364
-
Benjamin Kramer authored
No need for an unordered_map of enum, which is also broken in GCC before 6.1. No functionality change intended.
-
Benjamin Kramer authored
So we don't end up with a copy of std::sort in every dialect definition. NFCI.
-
Benjamin Kramer authored
This isn't performance sensitive and array_pod_sort is a lot smaller. NFCI.
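As background on the size argument: `array_pod_sort` is built on the C library's `qsort` rather than on an inlined sorting template, so each call site only contributes a small comparator. The sketch below shows the same idea with the standard library alone; `compareInts` and `podSort` are hypothetical names, not LLVM's API.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Comparator in qsort's C signature. The sorting loop itself lives in the
// C library, so it is not instantiated again for each element type.
inline int compareInts(const void *LHS, const void *RHS) {
  const int L = *static_cast<const int *>(LHS);
  const int R = *static_cast<const int *>(RHS);
  return (L > R) - (L < R); // negative, zero, or positive; avoids overflow
}

// Sorts trivially copyable elements in ascending order via qsort.
inline void podSort(std::vector<int> &Values) {
  if (Values.size() < 2)
    return; // nothing to sort
  std::qsort(Values.data(), Values.size(), sizeof(int), compareInts);
}
```

The trade-off is an indirect call per comparison, which is acceptable where, as the commit message notes, the code is not performance sensitive.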
-
Stanislav Mekhanoshin authored
Differential Revision: https://reviews.llvm.org/D121843
-
LLVM GN Syncbot authored
-
Kevin P. Neal authored
[FPEnv][InstSimplify] Teach CannotBeNegativeZero() about constrained intrinsics.
-