- Oct 24, 2020
-
-
Arthur Eubanks authored
Fixes noalias-calls.ll under NPM. Differential Revision: https://reviews.llvm.org/D89592
-
- Oct 23, 2020
-
-
Evandro Menezes authored
Use the commercial name for the scheduling model for the SiFive 7 Series.
-
Cameron McInally authored
Differential Revision: https://reviews.llvm.org/D89162
-
Artur Pilipenko authored
This change introduces a GC parseable lowering for element atomic memcpy/memmove intrinsics. This way the runtime can provide an implementation which can take a safepoint during the copy operation. See "GC-parseable element atomic memcpy/memmove" thread on llvm-dev for the background and details: https://groups.google.com/g/llvm-dev/c/NnENHzmX-b8/m/3PyN8Y2pCAAJ Differential Revision: https://reviews.llvm.org/D88861
-
Geoffrey Martin-Noble authored
This unbreaks building with `LLVM_ENABLE_THREADS=0`. Since https://github.com/llvm/llvm-project/commit/069919c9ba33 usage of `std::promise` is not guarded by `LLVM_ENABLE_THREADS`, so this header must be unconditionally included. Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D89758
-
Nick Desaulniers authored
It's currently ambiguous in IR whether the source language explicitly did not want a stack protector (in C, via the function attribute no_stack_protector) or simply doesn't care, for any given function. Code that manipulates the stack via inline assembly or that has to set up its own stack canary (such as the Linux kernel) commonly wants to avoid stack protectors in certain functions. We've been bitten by numerous bugs where a callee with a stack protector is inlined into an __attribute__((__no_stack_protector__)) caller, which generally breaks the caller's assumptions about not having a stack protector. LTO exacerbates the issue. While developers can avoid this by putting all no_stack_protector functions in one translation unit together and compiling those with -fno-stack-protector, that is far less ergonomic than a function attribute and still doesn't work for LTO. See also: https://lore.kernel.org/linux-pm/20200915172658.1432732-1-rkir@google.com/ https://lore.kernel.org/lkml/20200918201436.2932360-30-samitolvanen@google.com/T/#u Typically, when inlining a callee into a caller, the caller is upgraded in its level of stack protection (see adjustCallerSSPLevel()). By adding an explicit attribute in the IR when the function attribute is used in the source language, we can now identify such cases and prevent inlining. Block inlining when the callee and caller differ in the case that one contains `nossp` while the other has `ssp`, `sspstrong`, or `sspreq`. Fixes pr/47479. Reviewed By: void Differential Revision: https://reviews.llvm.org/D87956
-
Stanislav Mekhanoshin authored
This does not change anything at the moment, but is needed for D89170. In that change I am probing a physical SGPR to see if it is legal. The RC is SReg_32, but the DRC for scratch instructions is SReg_32_XEXEC_HI, and the test fails. For a physreg it is sufficient just to check whether the DRC contains the register. Physregs also do not use subregs, so the subreg handling below is irrelevant for them. Differential Revision: https://reviews.llvm.org/D90064
-
Mircea Trofin authored
This was initiated from the uses of MCRegUnitIterator, so while likely not exhaustive, it's a step forward. Differential Revision: https://reviews.llvm.org/D89975
-
Baptiste Saleil authored
This patch adds support for MMA intrinsics. Authored by: Baptiste Saleil Reviewed By: #powerpc, bsaleil, amyk Differential Revision: https://reviews.llvm.org/D89345
-
Nikita Popov authored
I'm not sure whether this can cause actual non-determinism in the compiler output, but at least it causes non-determinism in the statistics collected by BasicAA. Use SetVector to have a predictable iteration order.
-
Amara Emerson authored
There are two optimizations here:

1. Consider the following code:

   FCMPSrr %0, %1, implicit-def $nzcv
   %sel1:gpr32 = CSELWr %_, %_, 12, implicit $nzcv
   %sub:gpr32 = SUBSWrr %_, %_, implicit-def $nzcv
   FCMPSrr %0, %1, implicit-def $nzcv
   %sel2:gpr32 = CSELWr %_, %_, 12, implicit $nzcv

This kind of code, where two FCMPs each feed a CSEL, can happen when a single IR fcmp is used by two selects. During selection, to ensure that nzcv cannot be clobbered between the fcmp and the csel, we have to generate an fcmp immediately before each csel is selected. However, we can often essentially CSE these together later in MachineCSE. This doesn't work, though, if there are unrelated flag-setting instructions between the two FCMPs. In this case, the SUBS defines NZCV but it doesn't have any users, being overwritten by the second FCMP. Our solution here is to try to convert flag-setting operations within an interval of identical FCMPs, so that CSE will be able to eliminate one.

2. SelectionDAG-imported patterns for arithmetic ops currently select the flag-setting ops for CSE reasons, and add the implicit-def $nzcv operand to those instructions. However, if those imp-def operands are not marked as dead, the peephole optimizations are not able to optimize them into non-flag-setting variants. The optimization here is to find these dead imp-defs and mark them as such.

This pass is only enabled when optimizations are enabled. Differential Revision: https://reviews.llvm.org/D89415
-
Arthur Eubanks authored
This reverts commit 3024fe5b. Causes major compile time regressions: https://llvm-compile-time-tracker.com/compare.php?from=3b8d8954bf2c192502d757019b9fe434864068e9&to=3024fe5b55ed72633915f613bd5e2826583c396f&stat=instructions
-
Lang Hames authored
This re-applies e2fceec2 with fixes. Apparently we already *do* support relaxation for ELF, so we need to make sure the test case allocates a slab at a fixed address, and that the R_X86_64_REX_GOTPCRELX test references an external that is guaranteed to be out of range.
-
Huihui Zhang authored
The immediate must be in the integer range [0,255] for the umin/umax instruction. Extend the pattern-matching helper SelectSVEArithImm() to take the value type's bit width when checking whether the immediate value is in range. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D89831
-
Victor Huang authored
This patch fixes predicates as follows: * disabling prefix-instrs now disables pcrelative-memops * the two predicates PairedVectorMemops and PrefixInstrs are set for the PLXVP/PSTXVP definitions Differential Revision: https://reviews.llvm.org/D89727 Reviewed by: amyk, steven.zhang
-
vpykhtin authored
I was wrong in thinking that MRI.use_instructions returns unique instructions, and misled Jay in his previous patch D64393. The first loop counted more instructions than there were in reality, and the second loop went beyond the basic block with that counter. I used Jay's previous code that relied on MRI.use_operands to constrain the number of instructions to check among. modifiesRegister is inlined to reduce the number of passes over instruction operands, and an assert on the BB end boundary was added. Differential Revision: https://reviews.llvm.org/D89386
-
Paulo Matos authored
Implementation of instructions table.get, table.set, table.grow, table.size, table.fill, table.copy. Missing instructions are table.init and elem.drop as they deal with element sections which are not yet implemented. Added more tests to tables.s Differential Revision: https://reviews.llvm.org/D89797
-
Jeremy Morse authored
Deciding where to place debugging instructions when normal instructions sink between blocks is difficult -- see PR44117. Dealing with this with instruction-referencing variable locations is simple: we just tolerate DBG_INSTR_REFs referring to values that haven't been computed yet. This patch adds support into InstrRefBasedLDV to record when a variable value appears in the middle of a block, and should have a DBG_VALUE added when it appears (a debug use before def). While described simply, this relies heavily on the value-propagation algorithm in InstrRefBasedLDV. The implementation doesn't attempt to verify the location of a value unless something non-trivial occurs to merge variable values in vlocJoin. This means that a variable with a value that has no location can retain it across all control flow (including loops). It's only when another debug instruction specifies a different variable value that we have to check, and find there's no location. This property means that if a machine value is defined in a block dominated by a DBG_INSTR_REF that refers to it, all the successor blocks can automatically find a location for that value (if it's not clobbered). Thus in a sense, InstrRefBasedLDV is already supporting and implementing use-before-defs. This patch allows us to specify a variable location in the block where it's defined. When loading live-in variable locations, TransferTracker currently discards those where it can't find a location for the variable value. However, we can tell from the machine value number whether the value is defined in this block. If it is, add it to a set of use-before-def records. Then, once the relevant instruction has been processed, emit a DBG_VALUE immediately after it. Differential Revision: https://reviews.llvm.org/D85775
-
Jay Foad authored
This follows on from D89558 which added the new intrinsic and D88955 which added similar combines for llvm.amdgcn.fmul.legacy. Differential Revision: https://reviews.llvm.org/D90028
-
Denis Antrushin authored
Downstream testing revealed some problems with this patch. Reverting while investigating. This reverts commit 2b96dceb.
-
Paul C. Anagnostopoulos authored
Differential Revision: https://reviews.llvm.org/D89814
-
Matt Arsenault authored
-
Matt Arsenault authored
This will be relaxed to insert a nop if the offset hits the bad value, so overestimate branch instruction sizes.
-
Jeremy Morse authored
Handle DBG_INSTR_REF instructions in LiveDebugValues, to determine and propagate variable locations. The logic is fairly straightforward: collect a map of debug-instruction-number to the machine value numbers generated in the first walk through the function. When building the variable-value transfer function and we see a DBG_INSTR_REF, look up the instruction it refers to and pick the machine value number it generates. That's it; the rest of LiveDebugValues continues as normal. Awkwardly, there are two kinds of instruction numbering happening here: the offset into the block (which is how machine value numbers are determined), and the numbers that we label instructions with when generating DBG_INSTR_REFs. I've also restructured the TransferTracker redefVar code a little, to separate some DBG_VALUE-specific operations into their own method. The changes around redefVar should be largely NFC, while allowing DBG_INSTR_REFs to specify a value number rather than just a location. Differential Revision: https://reviews.llvm.org/D85771
-
Chen Zheng authored
Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D89665
-
Sanjay Patel authored
As discussed in D89952, instcombine can sometimes find a way to reduce similar patterns, but it is incomplete. InstSimplify uses the computeConstantRange() ValueTracking analysis via simplifyICmpWithConstant(), so we just need to fill in the max value of cttz to process any "icmp pred cttz(X), C" pattern (the min value is initialized to zero automatically). https://alive2.llvm.org/ce/z/Z_SLWZ Follow-up to D89976.
-
Sanjay Patel authored
As discussed in D89952, instcombine can sometimes find a way to reduce similar patterns, but it is incomplete. InstSimplify uses the computeConstantRange() ValueTracking analysis via simplifyICmpWithConstant(), so we just need to fill in the max value of ctlz to process any "icmp pred ctlz(X), C" pattern (the min value is initialized to zero automatically). Follow-up to D89976.
-
Sanjay Patel authored
As discussed in D89952, instcombine can sometimes find a way to reduce similar patterns, but it is incomplete. InstSimplify uses the computeConstantRange() ValueTracking analysis via simplifyICmpWithConstant(), so we just need to fill in the max value of ctpop to process any "icmp pred ctpop(X), C" pattern (the min value is initialized to zero automatically). Differential Revision: https://reviews.llvm.org/D89976
-
Simon Pilgrim authored
matchBSwapOrBitReverse was hardcoded to just match bswaps - we're going to need to expose the ability to match bitreverse as well, so make this part of the function call.
-
Simon Pilgrim authored
This matches bswap and bitreverse intrinsics, so we should make that clear in the function name.
-
Simon Pilgrim authored
-
Evgeny Leviant authored
Differential revision: https://reviews.llvm.org/D90017
-
Florian Hahn authored
This patch adds a specialized implementation of getIntrinsicInstrCost and add initial cost-modeling for min/max vector intrinsics. AArch64 NEON support umin/smin/umax/smax for vectors <8 x i8>, <16 x i8>, <4 x i16>, <8 x i16>, <2 x i32> and <4 x i32>. Notably, it does not support vectors with i64 elements. This change by itself should have very little impact on codegen, but in follow-up patches I plan to teach the vectorizers to consider using those intrinsics on platforms where it is profitable, e.g. because there is no general 'select'-like instruction. The current cost returned should be better for throughput, latency and size. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D89953
-
Jeremy Morse authored
This patch adjusts _when_ something happens in LiveDebugValues / InstrRefBasedLDV, to make it more amenable to dealing with DBG_INSTR_REF instructions. There's no functional change. In the current InstrRefBasedLDV implementation, we collect the machine value-number transfer function for blocks at the same time as the variable-value transfer function. After solving machine value numbers, the variable-value transfer function is updated so that DBG_VALUEs of live-in registers have the correct value. The same would need to be done for DBG_INSTR_REFs, to connect instruction-references with machine value numbers. Rather than writing more code for that, this patch separates the two: we collect the (machine-value-number) transfer function and solve for machine value numbers, then step through the MachineInstrs again collecting the variable value transfer function. This simplifies things for the next few patches. Differential Revision: https://reviews.llvm.org/D85760
-
OCHyams authored
This patch copies @vsk's fix to instcombine from D85555 over to mem2reg. The motivation and rationale are exactly the same: When mem2reg removes an alloca, it erases the dbg.{addr,declare} instructions which refer to the alloca. It would be better to instead remove all debug intrinsics which describe the contents of the dead alloca, namely all dbg.value(<dead alloca>, ..., DW_OP_deref)'s. As far as I can tell, prior to D80264 these `dbg.value+deref`s would have been silently dropped instead of being made `undef`, so we're just returning to previous behaviour with these patches. Testing: `llvm-lit llvm/test` and `ninja check-clang` gave no unexpected failures. Added 3 tests, each of which covers a dbg.value deletion path in mem2reg: mem2reg-promote-alloca-1.ll mem2reg-promote-alloca-2.ll mem2reg-promote-alloca-3.ll The first is based on the dexter test inlining.c from D89543. This patch also improves the debugging experience for loop.c from D89543, which suffers similarly after arg promotion instead of inlining.
-
Jay Foad authored
Differential Revision: https://reviews.llvm.org/D88955
-
Caroline Concatto authored
Use the isKnownXY comparators when one of the operands can be a scalable vector, and getFixedSize() in all other cases. This patch also fixes bugs around getPrimitiveSizeInBits by using getFixedSize() near the places with TypeSize comparisons. Differential Revision: https://reviews.llvm.org/D89703
-
Evgeny Leviant authored
Differential revision: https://reviews.llvm.org/D89957
-
Lang Hames authored
This reverts commit e2fceec2. This commit broke one of the bots. Reverting while I investigate.
-
Lang Hames authored
No support for relaxation yet -- this will always use the GOT entry.
-