- Sep 21, 2021
-
-
Sanjay Patel authored
-
Dmitry Preobrazhensky authored
Differential Revision: https://reviews.llvm.org/D109614
-
hyeongyu kim authored
One of the two inputs of a shufflevector is often a placeholder. Previously, the placeholder was undef in some cases and poison in others. I added these constructors to create the placeholder consistently. Switching existing code to the newly added constructors will be done in a separate patch. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D110146
-
Nico Weber authored
-
Jonas Paulsson authored
SystemZ adds the EXRL target instructions at the end of each file. This must be done before debug info emission, since that may end the text section, and therefore this is now done in emitConstantPools() (instead of in emitEndOfAsmFile). Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D109513
-
Florian Hahn authored
-
Nicholas Guy authored
Enables the FuseAddress feature in the Cortex-A55 scheduling model. Differential Revision: https://reviews.llvm.org/D109323
-
Simon Pilgrim authored
If getAggregateElement() returns null for any element, early out, as otherwise we will assert when creating a new constant vector. Fixes PR51824; OSS-Fuzz: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=38057
-
Simon Pilgrim authored
Avoid unnecessary copies, reported by MSVC static analyzer.
-
Simon Pilgrim authored
Avoid unnecessary copies, reported by MSVC static analyzer.
-
Simon Pilgrim authored
Reported by MSVC static analyzer.
-
Dmitry Vyukov authored
Currently we set thr->tctx after the OnStarted callback, taking the thread registry mutex again and searching for the context. But OnStarted already runs under the thread registry mutex and has access to the context, so set it in OnStarted. This makes the code simpler and faster. Depends on D110132. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D110133
-
Dmitry Vyukov authored
Thread state functions are split into 2 parts: tsan entry function (e.g. ThreadStart) and thread registry state change callback (e.g. OnStart). Currently these pairs of functions are located far from each other and in reverse order. This makes it hard to read and follow the logic. Reorder the code so that OnFoo directly follows ThreadFoo. No other code changes. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D110132
-
Dmitry Vyukov authored
Some of the DPrintf's currently produce -Wformat warnings if enabled. Fix these format strings. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D110131
-
Jay Foad authored
FMA_W_CHAIN is used when lowering fdiv f32. Prefer to select it to fmac if there are no source modifiers, just like we do for other mad/mac and fma/fmac cases. Differential Revision: https://reviews.llvm.org/D110074
-
Jay Foad authored
v_fmac with source modifiers forces VOP3 encoding, but it is strictly better to use the VOP3-only v_fma instead, because $dst and $src2 are not tied so it gives the register allocator more freedom and avoids a copy in some cases. This is the same strategy we already use for v_mad vs v_mac and v_fma_legacy vs v_fmac_legacy. Differential Revision: https://reviews.llvm.org/D110070
-
David Green authored
-
Max Kazantsev authored
This is what is supposed to be there.
-
Petar Avramovic authored
Add a generic helper function that matches a constant splat. It has an option to match a constant splat with undef (some elements can be undef, but not all). Add a util function and matcher for G_FCONSTANT splat. Differential Revision: https://reviews.llvm.org/D104410
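The splat-with-undef rule described above can be illustrated with a minimal sketch. This is not the GlobalISel helper itself: `match_constant_splat`, `UNDEF`, and the list-of-integers representation are hypothetical names chosen only for illustration.

```python
from typing import Optional, Sequence

UNDEF = None  # hypothetical stand-in for an undef vector element


def match_constant_splat(elements: Sequence[Optional[int]],
                         allow_undef: bool = False) -> Optional[int]:
    """Return the splat constant, or None if the vector is not a splat.

    With allow_undef=True, undef elements are skipped, but at least one
    element must be a real constant (an all-undef vector does not match).
    """
    splat = None
    for elt in elements:
        if elt is UNDEF:
            if not allow_undef:
                return None
            continue
        if splat is None:
            splat = elt          # first defined element fixes the value
        elif elt != splat:
            return None          # two different constants: not a splat
    return splat
```

For example, `match_constant_splat([7, UNDEF, 7], allow_undef=True)` matches with value 7, while the same vector is rejected when `allow_undef` is left at its default.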
-
Max Kazantsev authored
The logic in howManyLessThans is fishy. It first checks invariance of RHS, and then uses OrigRHS as the argument for isLoopEntryGuardedByCond, which is, strictly speaking, a different thing. We are seeing a very rare intermittent failure of availability checks, and it looks like this precondition is sometimes broken. Until we can figure out what's going on, add asserts that all involved values that may possibly go to isLoopEntryGuardedByCond are available at loop entry. If either of these asserts fails (OrigRHS is the most likely suspect), it means that the logic here is flawed.
-
David Stenberg authored
This fixes PR51730, a heap-use-after-free bug in replaceConditionalBranchesOnConstant().

With the attached reproducer we were left with a function looking something like this after replaceAndRecursivelySimplify():

  [...]

  cont2.i:
    br i1 %.not1.i, label %handler.type_mismatch3.i, label %cont4.i

  handler.type_mismatch3.i:
    %3 = phi i1 [ %2, %cont2.thread.i ], [ false, %cont2.i ]
    unreachable

  cont4.i:
    unreachable

  [...]

with both the branch instruction and the PHI node being in the worklist. As a result of replacing the branch instruction with an unconditional branch, the PHI node in %handler.type_mismatch3.i would be removed. This then resulted in a heap-use-after-free bug due to accessing that removed PHI node in the next worklist iteration.

This is solved by using a value handle worklist. I am unsure if this is the most idiomatic solution. Another solution could have been to produce a worklist containing just the interesting branch instructions, but I thought that it was perhaps a bit cleaner to keep all worklist filtering in the loop that does the rewrites.

Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D109221
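The idea behind a value handle worklist can be sketched in a few lines. This is a toy model, not LLVM code: `Node`, `Handle`, and `process` are hypothetical names, and the handle here simply nulls itself when the node it tracks is erased, so a later worklist iteration skips it instead of touching freed memory.

```python
class Node:
    """Hypothetical IR node that notifies its handles when erased."""

    def __init__(self, name):
        self.name = name
        self.handles = []

    def erase(self):
        for handle in self.handles:
            handle.node = None   # every handle now observes "erased"
        self.handles.clear()


class Handle:
    """Hypothetical handle: tracks a node, becomes null on erase."""

    def __init__(self, node):
        self.node = node
        node.handles.append(self)


def process(worklist, rewrite):
    visited = []
    for handle in worklist:
        node = handle.node
        if node is None:
            continue             # erased by an earlier rewrite: skip
        visited.append(node.name)
        rewrite(node)
    return visited


branch, phi = Node("br"), Node("phi")
worklist = [Handle(branch), Handle(phi)]


def rewrite(node):
    # Rewriting the branch removes the PHI, as in the scenario above.
    if node is branch:
        phi.erase()


visited = process(worklist, rewrite)
```

In this run only the branch is visited; the PHI entry is skipped because its handle was nulled when the PHI was erased.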
-
Justas Janickas authored
Adds a RUN line for C++ for OpenCL 2021 to the test. This should have been done as part of the earlier commit fb321c2e but was missed during rebasing. Differential Revision: https://reviews.llvm.org/D109492
-
Uday Bondhugula authored
NFC updates to gpu.launch op argument const folder. Differential Revision: https://reviews.llvm.org/D110136
-
Andrzej Warzynski authored
This was extracted from the discussion on https://reviews.llvm.org/D108283. Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com> Differential Revision: https://reviews.llvm.org/D109871
-
Sylvestre Ledru authored
When building clang in stage2 with -DCMAKE_BUILD_TYPE=RelWithDebInfo set, the developer can expect the stage2 clang to be built in the same mode, especially as performance is much worse in debug mode. (Principle of least astonishment) Differential Revision: https://reviews.llvm.org/D53014
-
Cullen Rhodes authored
Identified in D109359. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D109715
-
Morten Borup Petersen authored
This pass transforms SCF.ForOp operations to SCF.WhileOp. The for-loop condition is placed in the 'before' region of the while operation, and the induction variable incrementation plus the loop body in the 'after' region. The loop-carried values of the while op are the induction variable (IV) of the for-loop plus any iter_args specified for the for-loop. Any 'yield' ops in the for-loop are rewritten to additionally yield the (incremented) induction variable. This transformation is useful for passes where we want to consider structured control flow solely on the basis of a loop body and the computation of a loop condition. As an example, when doing high-level synthesis in CIRCT, the incrementation of an IV in a for-loop is "just another part" of a circuit datapath, and what we really care about is the distinction between our datapath and our control logic (the condition variable). Differential Revision: https://reviews.llvm.org/D108454
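The shape of the rewrite can be sketched with ordinary Python loops. This is only an analogy for the MLIR transformation, assuming a positive step and a single iter_arg; `run_for` and `run_as_while` are hypothetical names.

```python
def run_for(lb, ub, step, init, body):
    """Reference semantics: a for-loop with one iter_arg (acc)."""
    acc = init
    for iv in range(lb, ub, step):
        acc = body(iv, acc)
    return acc


def run_as_while(lb, ub, step, init, body):
    """The same loop in while form.

    Loop-carried values: the induction variable plus the iter_arg.
    """
    iv, acc = lb, init
    while True:
        # 'before' region: nothing but the loop condition.
        if not (iv < ub):
            break
        # 'after' region: the loop body, then the IV increment;
        # both the new iter_arg and the incremented IV are carried on.
        acc = body(iv, acc)
        iv = iv + step
    return acc
```

Both forms compute the same result, e.g. summing the induction variable over `0, 3, 6, 9` gives 18 either way; the while form just makes the IV increment an explicit part of the loop's data flow.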
-
Pavel Labath authored
This test relies on being able to unwind from an arbitrary place inside libc. While I am not sure this is the cause of the observed flakiness, it is known that we are not able to unwind correctly from some places in (linux) libc. This patch adds additional synchronization to ensure that the inferior is in the main function (instead of pthread guts) when lldb tries to unwind it. At the very least, it should make the test runs more predictable/repeatable.
-
Kunwar Shaanjeet Singh Grover authored
This patch adds mergeLocalIds and mergeSymbolIds as public functions for FlatAffineConstraints and FlatAffineValueConstraints respectively. mergeLocalIds is also required to support divisions in intersection, subtraction, equality checks, and complement for PresburgerSet. This patch is part of a series of patches aimed at generalizing affine dependence analysis. Reviewed By: bondhugula Differential Revision: https://reviews.llvm.org/D110045
-
Nathan Ridge authored
Duplicates can sometimes appear due to e.g. explicit template instantiations. Differential Revision: https://reviews.llvm.org/D110051
-
Amara Emerson authored
This is motivated by a pathological compile-time issue during unmerge combining. We should be able to use the AVF to do simplification. However, AMDGPU has a lot of codegen changes which I'm not sure how to evaluate. Differential Revision: https://reviews.llvm.org/D109748
-
Evgeniy Brevnov authored
The first (and biggest) change is to use "Killing/Dead" in place of "Later/Earlier" as the base for names in DSE. For example, [Maybe]DeadLoc is a location killed by the KillingI instruction. I believe such names are more descriptive and easier to understand than the current ones. Second, there are inconsistencies in naming where different names are used for the same thing. Fixed that too. Third, reordered the parameters of isPartialOverwrite, tryToMergePartialOverlappingStores, and isOverwrite to make them consistent with each other. This greatly reduces the potential for mistakes. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D106947
-
Amara Emerson authored
For artifacts excluding G_TRUNC/G_SEXT, which have IR counterparts, we don't seem to have debug users of defs. However, in the legalizer we're always calling MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval() which is expensive. In some rare cases, this contributes significantly to unreasonably long compile times when we have lots of artifact combiner activity. To verify this, I added asserts to that function when it actually replaced a debug use operand with undef for these artifacts. On CTMark with both -O0 and -Os and debug info enabled, I didn't see a single case where it triggered. In my measurements I saw around a 0.5% geomean compile-time improvement on -g -O0 for AArch64 with this change. Differential Revision: https://reviews.llvm.org/D109750
-
Max Kazantsev authored
The implication logic for two values that are both negative or both non-negative says that it doesn't matter whether their predicate is signed or unsigned, but it only flips unsigned into signed for further inference. This patch adds support for flipping a signed predicate into unsigned as well. Differential Revision: https://reviews.llvm.org/D109959 Reviewed By: nikic
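The underlying fact, that signed and unsigned comparisons agree when both operands have the same sign, can be checked exhaustively on a small bit width. This is a standalone sketch, not ScalarEvolution code; `to_signed` and `signed_and_unsigned_agree` are hypothetical helper names.

```python
def to_signed(x: int, bits: int = 8) -> int:
    """Interpret an unsigned bit pattern as a two's-complement value."""
    sign_bit = 1 << (bits - 1)
    return x - (1 << bits) if x & sign_bit else x


def signed_and_unsigned_agree(bits: int = 8) -> bool:
    """Check that ult and slt agree whenever both operands share a sign."""
    for a in range(1 << bits):
        for b in range(1 << bits):
            same_sign = (to_signed(a, bits) < 0) == (to_signed(b, bits) < 0)
            if not same_sign:
                continue
            unsigned_lt = a < b
            signed_lt = to_signed(a, bits) < to_signed(b, bits)
            if unsigned_lt != signed_lt:
                return False
    return True
```

With mixed signs the two predicates can disagree: the patterns `0x01` and `0xFF` compare as `1 < 255` unsigned but `1 > -1` signed, which is why the same-sign precondition is needed in either direction of the flip.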
-
Yonghong Song authored
In llvm, for non-alu32 mode, the stack alignment is 64bit, so there is only one 64bit spill per 64bit slot. For alu32 mode, the stack alignment is 32bit, so it is possible to have two 32bit spills per 64bit slot.

Currently, the bpf kernel verifier does not preserve register states for 32bit spills. That is, one 32bit register may hold a constant value or a bounded range before the spill. After reload from the stack, the information is lost, and sometimes this may cause verifier failure. For a 64bit register spill, the verifier does try to preserve the register state for reloading.

The current verifier can be modestly changed to handle one 32bit spill per 64bit stack slot with state-preserving reload. Handling two 32bit spills per 64bit stack slot would require substantial changes.

This patch changes the stack alignment for alu32 to be 64bit. This way, for any 64bit slot in alu32 mode, only one 32bit or 64bit register value can be saved. Together with the previously mentioned verifier enhancement, 32bit spills can be handled with state preserving.

Note that llvm stack slot coalescing seems to do only adjacent packing, which may leave some holes in the stack. For example,
  stack slot 8 <== 8 bytes
  stack slot 4 <== 8 bytes with 4 byte hole
  stack slot 8 <== 8 bytes
  stack slot 4 <== 4 bytes

Differential Revision: https://reviews.llvm.org/D109073
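The effect of the alignment change on slot sharing can be illustrated with a toy allocator. This is a deliberately simplified model, not the BPF backend's frame lowering: it assumes spills are laid out in order at increasing offsets, each rounded up to the stack alignment, and `assign_spill_offsets` and `spills_per_slot` are hypothetical names.

```python
def assign_spill_offsets(spill_sizes, stack_align):
    """Place each spill at the next offset that is a multiple of stack_align."""
    offsets = []
    offset = 0
    for size in spill_sizes:
        offset = (offset + stack_align - 1) // stack_align * stack_align
        offsets.append(offset)
        offset += size
    return offsets


def spills_per_slot(offsets, slot_size=8):
    """Count how many spills start inside each 64bit (8-byte) slot."""
    counts = {}
    for off in offsets:
        slot = off // slot_size
        counts[slot] = counts.get(slot, 0) + 1
    return counts


spills = [4, 4, 8]                       # two 32bit spills, one 64bit spill
shared = spills_per_slot(assign_spill_offsets(spills, stack_align=4))
separate = spills_per_slot(assign_spill_offsets(spills, stack_align=8))
```

Under this model, 4-byte alignment puts both 32bit spills into the first 8-byte slot, while 8-byte alignment gives every spill its own slot, at the cost of a 4-byte hole after each 32bit spill.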
-
Chris Lattner authored
Lots of custom ops have hand-rolled comma-delimited parsing loops, as does the MLIR parser itself. Provides a standard interface for doing this that is less error prone and less boilerplate. While here, extend Delimiter to support <> and {} delimited sequences as well (I have a use for <> in CIRCT specifically). Differential Revision: https://reviews.llvm.org/D110122
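The kind of shared helper described above might look roughly like the following. This is a string-based toy, not the MLIR parser API: `parse_comma_separated_list`, the `DELIMITERS` table, and the delimiter names are hypothetical, and nested delimiters are not handled.

```python
DELIMITERS = {
    "none": ("", ""),
    "paren": ("(", ")"),
    "square": ("[", "]"),
    "angle": ("<", ">"),
    "brace": ("{", "}"),
}


def parse_comma_separated_list(text, parse_element, delimiter="none"):
    """Parse 'a, b, c' (optionally wrapped in delimiters) into a list.

    parse_element is called on each stripped element string.
    """
    open_tok, close_tok = DELIMITERS[delimiter]
    text = text.strip()
    if open_tok:
        if not (text.startswith(open_tok) and text.endswith(close_tok)):
            raise ValueError(f"expected '{open_tok}...{close_tok}' list")
        text = text[len(open_tok):len(text) - len(close_tok)]
    if not text.strip():
        return []                # empty list, e.g. '<>' or '{}'
    return [parse_element(piece.strip()) for piece in text.split(",")]
```

A caller then supplies only the per-element parser, e.g. `parse_comma_separated_list("<1, 2, 3>", int, "angle")`, instead of writing its own comma loop.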
-
Max Kazantsev authored
When following a case of a switch instruction is guaranteed to lead to UB, we can safely break these edges and redirect those cases into a newly created unreachable block. As a result, the CFG becomes simpler, and we can remove some of the Phi inputs to make further analyses easier. Patch by Dmitry Bakunevich! Differential Revision: https://reviews.llvm.org/D109428 Reviewed By: lebedev.ri
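The edge redirection can be sketched on a dictionary-based CFG. This is a toy model, not SimplifyCFG: block names, the `UNREACHABLE` label, the `ub_successors` set (successors that are known to hit UB when entered from the switch), and `redirect_ub_cases` are all hypothetical.

```python
UNREACHABLE = "unreachable.bb"  # hypothetical name for the new block


def redirect_ub_cases(switch_block, cases, ub_successors, phis):
    """Redirect UB-bound switch cases and prune the matching Phi inputs.

    cases: case value -> successor block name
    phis:  block name -> {predecessor block name: incoming value}
    """
    new_cases = {
        value: (UNREACHABLE if succ in ub_successors else succ)
        for value, succ in cases.items()
    }
    remaining = set(new_cases.values())
    new_phis = {}
    for block, incoming in phis.items():
        # Drop the switch block's input if the switch no longer targets
        # this block; all other incoming values are kept.
        new_phis[block] = {
            pred: val for pred, val in incoming.items()
            if pred != switch_block or block in remaining
        }
    return new_cases, new_phis


cases = {0: "a", 1: "b", 2: "b"}
phis = {"a": {"sw": "%x"}, "b": {"sw": "null", "other": "%p"}}
new_cases, new_phis = redirect_ub_cases("sw", cases, {"b"}, phis)
```

After the call, cases 1 and 2 target the unreachable block, and the Phi in `b` keeps only its input from `other`.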
-
Michael Kruse authored
This metadata was intended to mark all accesses within an iteration as pairwise non-aliasing, in this case because every memory location of a base pointer is touched (read or write) at most once. This is typical for 'sweeps' over all data. The stated motivation from D30606 is to ensure that unrolled iterations are considered non-aliasing.

The implementation had multiple issues:

* The structure of the noalias metadata was malformed. D110026 added a check in the verifier for this metadata, and the tests have been failing since then.

* This is not true for the outer loops of the BLIS matrix multiplication, where it was being inserted. Each element of A, B, C is accessed multiple times, as often as the loop not used as an index is iterating.

* Scopes were added to SecondLevelOtherAliasScopeList (used for the !noalias scope list) on-the-fly when another SCEV was seen. This meant that previously visited instructions would not be updated with alias scopes that are only seen later, missing out those SCEVs they should not be aliasing with.

* Since the !noalias scope list would ideally consist of all other SCEVs for this base pointer, we might quickly run into scalability issues. Especially after unrolling, there would probably be at least one SCEV per instruction and unroll instance.

* The inter-iteration noalias base pointer was not removed after leaving the loop marked with it, effectively marking everything after it as noalias as well.

A solution I considered was to mark each instruction as non-aliasing with its own scope. The instruction itself would obviously alias itself, but such a construction might also be considered invalid. Duplicating the instruction (e.g. due to speculation) would mark the instruction as non-aliasing with its clone. I don't want to go into this territory, especially since the original motivation of determining unrolled instances as noalias based on SCEV is what scev-aa does as well.

This effectively reverts D30606 and D35761.
-
Max Kazantsev authored
-
Kazu Hirata authored
-