- Sep 21, 2021
-
-
Sanjay Patel authored
-
Dmitry Preobrazhensky authored
Differential Revision: https://reviews.llvm.org/D109614
-
hyeongyu kim authored
One of the two inputs of a shufflevector is often a placeholder. Previously, the placeholder was undef in some cases and poison in others. I added these constructors to create the placeholder consistently. Switching existing code to the newly added constructors will be done in a separate patch. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D110146
-
Nico Weber authored
-
Jonas Paulsson authored
SystemZ adds the EXRL target instructions at the end of each file. This must be done before debug info emission, since that may end the text section, and therefore this is now done in emitConstantPools() (instead of in emitEndOfAsmFile). Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D109513
-
Florian Hahn authored
-
Nicholas Guy authored
Enables the FuseAddress feature in the Cortex-A55 scheduling model. Differential Revision: https://reviews.llvm.org/D109323
-
Simon Pilgrim authored
If getAggregateElement() returns null for any element, early out, as otherwise we will assert when creating a new constant vector. Fixes PR51824 + OSS-Fuzz: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=38057
-
Simon Pilgrim authored
Avoid unnecessary copies, reported by MSVC static analyzer.
-
Simon Pilgrim authored
Avoid unnecessary copies, reported by MSVC static analyzer.
-
Simon Pilgrim authored
Reported by MSVC static analyzer.
-
Jay Foad authored
FMA_W_CHAIN is used when lowering fdiv f32. Prefer to select it to fmac if there are no source modifiers, just like we do for other mad/mac and fma/fmac cases. Differential Revision: https://reviews.llvm.org/D110074
-
Jay Foad authored
v_fmac with source modifiers forces VOP3 encoding, but it is strictly better to use the VOP3-only v_fma instead, because $dst and $src2 are not tied so it gives the register allocator more freedom and avoids a copy in some cases. This is the same strategy we already use for v_mad vs v_mac and v_fma_legacy vs v_fmac_legacy. Differential Revision: https://reviews.llvm.org/D110070
-
David Green authored
-
Max Kazantsev authored
This is what is supposed to be there.
-
Petar Avramovic authored
Add a generic helper function that matches a constant splat. It has an option to match a constant splat with undef (some elements can be undef, but not all). Add a util function and matcher for G_FCONSTANT splat. Differential Revision: https://reviews.llvm.org/D104410
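The splat-with-undef rule described above can be illustrated with a small standalone sketch. It is not the GlobalISel helper from the patch: `Element` and `matchConstantSplat` are hypothetical names, and `std::nullopt` is used here as a stand-in for an undef element.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model of one vector element: a constant, or undef (nullopt).
using Element = std::optional<int64_t>;

// Returns the splat value if every defined element holds the same constant.
// With allowUndef == false, any undef element rejects the match.
// A vector with no defined element (all undef, or empty) never matches.
std::optional<int64_t> matchConstantSplat(const std::vector<Element> &elts,
                                          bool allowUndef) {
  std::optional<int64_t> splat;
  for (const Element &e : elts) {
    if (!e) {
      if (!allowUndef)
        return std::nullopt;
      continue; // Undef is compatible with any splat value.
    }
    if (splat && *splat != *e)
      return std::nullopt; // Two different constants: not a splat.
    splat = *e;
  }
  return splat;
}
```

Returning the matched constant instead of a plain bool lets a caller use the splat value directly, which is one reasonable design for a matcher of this kind.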
-
Max Kazantsev authored
The logic in howManyLessThans is fishy. It first checks invariance of RHS, and then uses OrigRHS as argument for isLoopEntryGuardedByCond, which is, strictly speaking, a different thing. We are seeing a very rare intermittent failure of availability checks, and it looks like this precondition is sometimes broken. Until we can figure out what's going on, add asserts that all involved values that may possibly go to isLoopEntryGuardedByCond are available at loop entry. If either of these asserts fails (OrigRHS is the most likely suspect), it means that the logic here is flawed.
-
David Stenberg authored
This fixes PR51730, a heap-use-after-free bug in replaceConditionalBranchesOnConstant(). With the attached reproducer we were left with a function looking something like this after replaceAndRecursivelySimplify():

    [...]
    cont2.i:
      br i1 %.not1.i, label %handler.type_mismatch3.i, label %cont4.i

    handler.type_mismatch3.i:
      %3 = phi i1 [ %2, %cont2.thread.i ], [ false, %cont2.i ]
      unreachable

    cont4.i:
      unreachable
    [...]

with both the branch instruction and PHI node being in the worklist. As a result of replacing the branch instruction with an unconditional branch, the PHI node in %handler.type_mismatch3.i would be removed. This then resulted in a heap-use-after-free bug due to accessing that removed PHI node in the next worklist iteration.

This is solved by using a value handle worklist. I am unsure if this is the most idiomatic solution. Another solution could have been to produce a worklist just containing the interesting branch instructions, but I thought that it perhaps was a bit cleaner to keep all worklist filtering in the loop that does the rewrites.

Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D109221
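As a rough illustration of why a worklist of handles avoids the stale access, here is a standalone sketch. It does not use LLVM's value handle classes: `std::weak_ptr` serves as a standard-library analogue, and `Inst`, `Function`, `Result`, and `rewriteBranches` are hypothetical names invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an IR instruction.
struct Inst {
  std::string name;
  bool isBranch;
};

// The function owns its instructions; erasing one destroys it.
using Function = std::vector<std::shared_ptr<Inst>>;

struct Result {
  int rewritten = 0;   // Branches that were rewritten.
  int skippedDead = 0; // Worklist entries that had already been deleted.
};

Result rewriteBranches(Function &fn) {
  // Snapshot the instructions as weak handles. A handle does not keep its
  // instruction alive, but it can tell whether the instruction still exists.
  std::vector<std::weak_ptr<Inst>> worklist(fn.begin(), fn.end());
  Result result;
  for (const std::weak_ptr<Inst> &handle : worklist) {
    std::shared_ptr<Inst> inst = handle.lock();
    if (!inst) {
      ++result.skippedDead; // Deleted by an earlier rewrite: skip safely.
      continue;
    }
    if (!inst->isBranch)
      continue;
    inst->name += ".folded";
    ++result.rewritten;
    // Model the side effect from the commit message: rewriting the branch
    // removes other instructions (here, every non-branch) from the function.
    fn.erase(std::remove_if(fn.begin(), fn.end(),
                            [](const std::shared_ptr<Inst> &i) {
                              return !i->isBranch;
                            }),
             fn.end());
  }
  return result;
}
```

With a worklist of raw pointers, the second iteration in a branch-then-PHI sequence would dereference freed memory; with handles it observes that the entry is gone and moves on.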
-
Cullen Rhodes authored
Identified in D109359. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D109715
-
Amara Emerson authored
This is motivated by a pathological compile time issue during unmerge combining. We should be able to use the AVF to do simplification. However AMDGPU has a lot of codegen changes which I'm not sure how to evaluate. Differential Revision: https://reviews.llvm.org/D109748
-
Evgeniy Brevnov authored
First (and biggest) change is to use "Killing/Dead" in place of "Later/Earlier" as the base for names in DSE. For example, [Maybe]DeadLoc is a location killed by the KillingI instruction. I believe such names are more descriptive and easier to understand than the current ones. Second, there are inconsistencies in naming where different names are used for the same thing. Fixed that too. Third, reordered the parameters of isPartialOverwrite, tryToMergePartialOverlappingStores, and isOverwrite to make them consistent with each other. This greatly reduces potential mistakes. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D106947
-
Amara Emerson authored
For artifacts excluding G_TRUNC/G_SEXT, which have IR counterparts, we don't seem to have debug users of defs. However, in the legalizer we're always calling MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval() which is expensive. In some rare cases, this contributes significantly to unreasonably long compile times when we have lots of artifact combiner activity. To verify this, I added asserts to that function when it actually replaced a debug use operand with undef for these artifacts. On CTMark with both -O0 and -Os and debug info enabled, I didn't see a single case where it triggered. In my measurements I saw around a 0.5% geomean compile-time improvement on -g -O0 for AArch64 with this change. Differential Revision: https://reviews.llvm.org/D109750
-
Max Kazantsev authored
The implication logic for two values that are both negative or both non-negative says that it doesn't matter whether their predicate is signed or unsigned, but it only flips unsigned into signed for further inference. This patch adds support for flipping a signed predicate into unsigned as well. Differential Revision: https://reviews.llvm.org/D109959 Reviewed By: nikic
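The underlying arithmetic fact can be checked with a small standalone sketch. This is not the ScalarEvolution code: the helper names are hypothetical, and the check is done exhaustively on 8-bit values only to keep it cheap.

```cpp
#include <cassert>
#include <cstdint>

// True when both values are negative or both are non-negative.
bool sameSign(int8_t a, int8_t b) { return (a < 0) == (b < 0); }

// Signed "less than".
bool signedLess(int8_t a, int8_t b) { return a < b; }

// Unsigned "less than" on the same bit patterns.
bool unsignedLess(int8_t a, int8_t b) {
  return static_cast<uint8_t>(a) < static_cast<uint8_t>(b);
}

// Exhaustively checks all 8-bit pairs: whenever the operands share a sign,
// the signed and unsigned comparisons give the same answer, so the predicate
// can be flipped in either direction.
bool flipIsSafeForSameSign() {
  for (int a = -128; a <= 127; ++a) {
    for (int b = -128; b <= 127; ++b) {
      int8_t x = static_cast<int8_t>(a);
      int8_t y = static_cast<int8_t>(b);
      if (sameSign(x, y) && signedLess(x, y) != unsignedLess(x, y))
        return false;
    }
  }
  return true;
}
```

When the signs differ the two predicates disagree (for example, -1 is signed-less than 1 but unsigned-greater), which is why the same-sign precondition is required.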
-
Yonghong Song authored
In llvm, for non-alu32 mode, the stack alignment is 64bit, so there is only one 64bit spill per 64bit slot. For alu32 mode, the stack alignment is 32bit, so it is possible to have two 32bit spills per 64bit slot.

Currently, the bpf kernel verifier does not preserve register states for 32bit spills. That is, one 32bit register may hold a constant value or a bounded range before the spill. After reload from the stack, the information is lost and sometimes this may cause verifier failure. For a 64bit register spill, the verifier does try to preserve the register state for reloading.

The current verifier can be modestly changed to handle one 32bit spill per 64bit stack slot with state-preserving reload. Handling two 32bit spills per 64bit stack slot will require substantial changes.

This patch changes the stack alignment for alu32 to be 64bit. This way, for any 64bit slot in alu32 mode, only one 32bit or 64bit register value can be saved. Together with the previously mentioned verifier enhancement, a 32bit spill can be handled with state preserving.

Note that llvm stack slot coalescing seems to do only adjacent packing, which may leave some holes in the stack. For example,

    stack slot 8 <== 8 bytes
    stack slot 4 <== 8 bytes with 4 byte hole
    stack slot 8 <== 8 bytes
    stack slot 4 <== 4 bytes

Differential Revision: https://reviews.llvm.org/D109073
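The effect of the alignment change can be sketched with a toy layout model. This is an assumption-laden illustration, not LLVM's frame lowering or stack slot coalescing: spills are simply laid out sequentially, each one is padded to the stack alignment, and a spill is attributed to the 64-bit slot that contains its starting offset. `alignTo` and `maxSpillsPerSlot` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Rounds size up to a multiple of align (align must be a power of two).
uint32_t alignTo(uint32_t size, uint32_t align) {
  return (size + align - 1) & ~(align - 1);
}

// Lays spills out one after another, padding each to stackAlign bytes, and
// returns the largest number of spills that start in any single 8-byte slot.
uint32_t maxSpillsPerSlot(const std::vector<uint32_t> &spillSizes,
                          uint32_t stackAlign) {
  std::map<uint32_t, uint32_t> spillsInSlot; // 8-byte slot index -> count
  uint32_t offset = 0;
  uint32_t worst = 0;
  for (uint32_t size : spillSizes) {
    worst = std::max(worst, ++spillsInSlot[offset / 8]);
    offset += alignTo(size, stackAlign);
  }
  return worst;
}
```

In this model, spills of 4, 4, and 8 bytes put two 32-bit spills into one slot under 4-byte alignment, while 8-byte alignment leaves at most one spill per slot at the cost of padding.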
-
Max Kazantsev authored
When following a case of a switch instruction is guaranteed to lead to UB, we can safely break these edges and redirect those cases into a newly created unreachable block. As a result, the CFG will become simpler and we can remove some of the Phi inputs to make further analyses easier. Patch by Dmitry Bakunevich! Differential Revision: https://reviews.llvm.org/D109428 Reviewed By: lebedev.ri
-
Max Kazantsev authored
-
Kazu Hirata authored
-
River Riddle authored
This revision refactors ElementsAttr into an Attribute Interface. This enables a common interface with which to interact with element attributes, without needing to modify the builtin dialect. It also removes a majority (if not all?) of the need for the current OpaqueElementsAttr, which was originally intended as a way to opaquely represent data that was not representable by the other builtin constructs.

The new ElementsAttr interface not only allows for users to natively represent their data in the way that best suits them, it also allows for efficient opaque access and iteration of the underlying data. Attributes using the ElementsAttr interface can directly expose support for interacting with the held elements using any C++ data type they claim to support. For example, DenseIntOrFpElementsAttr supports iteration using various native C++ integer/float data types, as well as APInt/APFloat, and more. ElementsAttr instances that refer to DenseIntOrFpElementsAttr can use all of these data types for iteration:

```c++
DenseIntOrFpElementsAttr intElementsAttr = ...;
ElementsAttr attr = intElementsAttr;

for (uint64_t value : attr.getValues<uint64_t>())
  ...;
for (APInt value : attr.getValues<APInt>())
  ...;
for (IntegerAttr value : attr.getValues<IntegerAttr>())
  ...;
```

ElementsAttr also supports failable range/iterator access, allowing for selective code paths depending on data type support:

```c++
ElementsAttr attr = ...;
if (auto range = attr.tryGetValues<uint64_t>()) {
  for (uint64_t value : *range)
    ...;
}
```

Differential Revision: https://reviews.llvm.org/D109190
-
Usman Nadeem authored
Differential Revision: https://reviews.llvm.org/D109808 Change-Id: I1a10d2bc33acbe0ea353c6cb3d077851391fe73e
-
Amara Emerson authored
For x86 Darwin, we have a stack checking feature which re-uses some of this machinery around stack probing on Windows. Renaming this to be more appropriate for a generic feature. Differential Revision: https://reviews.llvm.org/D109993
-
- Sep 20, 2021
-
-
Jacob Lambert authored
[AMDGPU][NFC] Correct typos in lib/Target/AMDGPU/AMDGPU*.cpp files. Test commit for new contributor.
-
Amara Emerson authored
This attribute calls a function instead of emitting a trap instruction. Differential Revision: https://reviews.llvm.org/D110098
-
Florian Mayer authored
Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D110067
-
Craig Topper authored
These are cases where the splat is in another basic block. CGP needs to sink it to expose the opportunity to SelectionDAG.
-
Paul Robinson authored
-
Nico Weber authored
This reverts commit 6d7b3d6b. Breaks running cmake with `-DCLANG_ENABLE_STATIC_ANALYZER=OFF` without turning off CLANG_TIDY_ENABLE_STATIC_ANALYZER. See comments on https://reviews.llvm.org/D109611 for details.
-
Nico Weber authored
See discussion on https://reviews.llvm.org/D110016 for details.
-
Craig Topper authored
If either of the multiplicands is a splat, we can sink it to use vfmacc.vf or similar.
-
Craig Topper authored
This is another case of a splat being in another basic block preventing SelectionDAG from optimizing it.
-
Arthur Eubanks authored
-Wl,-z,defs doesn't work with sanitizers. See https://clang.llvm.org/docs/AddressSanitizer.html Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D110086
-