- Oct 14, 2021
-
-
Brian Cain authored
This commit adds the system reg/regpair definitions and the corresponding register transfer instructions.
-
Andrew Savonichev authored
-
Andrew Savonichev authored
These registers are used as operands for instructions that expect an integer register, so they should be added to Int32Regs or Int64Regs register classes. Otherwise the machine verifier emits an error for the following LIT tests when LLVM_ENABLE_MACHINE_VERIFIER=1 environment variable is set:

    *** Bad machine code: Illegal physical register for instruction ***
    - function:    kernel_func
    - basic block: %bb.0 entry (0x55c8903d5438)
    - instruction: %3:int64regs = LEA_ADDRi64 $vrframelocal, 0
    - operand 1:   $vrframelocal
    $vrframelocal is not a Int64Regs register.

    CodeGen/NVPTX/call-with-alloca-buffer.ll
    CodeGen/NVPTX/disable-opt.ll
    CodeGen/NVPTX/lower-alloca.ll
    CodeGen/NVPTX/lower-args.ll
    CodeGen/NVPTX/param-align.ll
    CodeGen/NVPTX/reg-types.ll
    DebugInfo/NVPTX/dbg-declare-alloca.ll
    DebugInfo/NVPTX/dbg-value-const-byref.ll

Differential Revision: https://reviews.llvm.org/D110164
-
Florian Hahn authored
Running -vector-combine early can introduce new vector operations, blocking loop/SLP vectorization. The added test case could be better optimized by the SLPVectorizer if no new vector operations are added early.
-
Jonas Paulsson authored
-
Andrew Savonichev authored
The patch attempts to optimize a sequence of SIMD loads from the same base pointer:

    %0 = gep float*, float* base, i32 4
    %1 = bitcast float* %0 to <4 x float>*
    %2 = load <4 x float>, <4 x float>* %1
    ...
    %n1 = gep float*, float* base, i32 N
    %n2 = bitcast float* %n1 to <4 x float>*
    %n3 = load <4 x float>, <4 x float>* %n2

For AArch64 the compiler generates a sequence of LDR Qt, [Xn, #16]. However, 32-bit NEON VLD1/VST1 lack the [Wn, #imm] addressing mode, so the address is computed before every ld/st instruction:

    add r2, r0, #32
    add r0, r0, #16
    vld1.32 {d18, d19}, [r2]
    vld1.32 {d22, d23}, [r0]

This can be improved by computing address for the first load, and then using a post-indexed form of VLD1/VST1 to load the rest:

    add r0, r0, #16
    vld1.32 {d18, d19}, [r0]!
    vld1.32 {d22, d23}, [r0]

In order to do that, the patch adds more patterns to DAGCombine:

  - (load (add ptr inc1)) and (add ptr inc2) are now folded if inc1 and inc2 are constants.
  - (or ptr inc) is now recognized as a pointer increment if ptr is sufficiently aligned.

In addition to that, we now search for all possible base updates and then pick the best one.

Differential Revision: https://reviews.llvm.org/D108988
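The second pattern relies on a simple bit-level fact: if ptr is a multiple of a power-of-two alignment and the increment is smaller than that alignment, the two values share no set bits, so OR and ADD give the same address. The sketch below only illustrates that arithmetic on plain integers. The helper name `orIsPointerIncrement` and its parameters are hypothetical and are not taken from the patch.

```cpp
#include <cstdint>

// Hypothetical helper, not the DAGCombine code from the patch.
// Assumes `alignment` is a non-zero power of two.
// Returns true when `ptr | inc` is guaranteed to equal `ptr + inc`:
// ptr has its low log2(alignment) bits clear and inc fits in those bits.
constexpr bool orIsPointerIncrement(std::uint64_t ptr, std::uint64_t alignment,
                                    std::uint64_t inc) {
  return (ptr % alignment == 0) && (inc < alignment);
}

// A 32-byte aligned base: OR-ing in 16 is the same as adding 16.
static_assert(orIsPointerIncrement(0x1000, 32, 16), "aligned base qualifies");
static_assert((0x1000u | 16u) == (0x1000u + 16u), "OR matches ADD");

// A base that is only 16-byte aligned: bit 4 is already set, so OR and ADD differ.
static_assert(!orIsPointerIncrement(0x1010, 32, 16), "misaligned base rejected");
static_assert((0x1010u | 16u) != (0x1010u + 16u), "OR does not match ADD");
```

The check is deliberately conservative: it rejects some cases where OR and ADD happen to agree, which is the safe direction for a rewrite of this kind.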
-
Simon Pilgrim authored
Avoids unused assignment scan-build warning.
-
Simon Pilgrim authored
Without SSE41 sext/zext instructions the extensions will be split, meaning that the MUL->PMADDWD fold will split the sext_i32(x) into zext_i32(sext_i16(x))
-
Simon Pilgrim authored
2 returns, one after the other - reported by coverity
-
Jeremy Morse authored
Some functions get opted out of instruction referencing if they're being compiled with no optimisations, however the LiveDebugValues pass picks one implementation and then sticks with it through the rest of compilation. This leads to a segfault if we encounter a function that doesn't use instr-ref (because it's optnone, for example), but we've already decided to use InstrRefBasedLDV which expects to be passed a DomTree. Solution: keep both implementations around in the pass, and pick whichever one is appropriate to the current function.
-
Jonas Paulsson authored
This reverts 3562076d and includes some refactoring as well. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D111733
-
Jonas Paulsson authored
This patch fixes a bug caused by treating variable-length and immediate-length mem operations (such as memcpy, memset, ...) differently. The variable-length case needs to have the length minus 1 passed due to the use of EXRL target instructions. However, the DAGCombiner can convert a register length argument into a constant one, and whenever that happened one byte too few would end up being processed. This is also a refactoring that reduces the number of opcodes and variants involved. For any opcode (variable or constant length), only the length minus one is passed on to the ISD node. The rest of the logic is now handled during isel pseudo expansion. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D111729
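The off-by-one can be modelled loosely with two conventions for the same operand. This is a sketch under stated assumptions, not the SystemZ lowering code: the function names are hypothetical, and the mismatch is simplified to "a value encoded as length minus one is later read as a full length".

```cpp
#include <cstddef>

// Hypothetical model of the single convention described above:
// the node always carries (length - 1). Assumes bytes >= 1.
constexpr std::size_t encodeLength(std::size_t bytes) { return bytes - 1; }

// Expansion decodes the operand back into a byte count.
constexpr std::size_t decodeLength(std::size_t lenMinusOne) {
  return lenMinusOne + 1;
}

// Simplified model of the old mismatch: the operand was encoded as
// (length - 1) but then interpreted as a full length after it became
// a constant, so one byte was lost.
constexpr std::size_t bytesProcessedWithMismatch(std::size_t bytes) {
  return encodeLength(bytes);
}

static_assert(decodeLength(encodeLength(16)) == 16, "round trip keeps length");
static_assert(bytesProcessedWithMismatch(16) == 15, "mismatch drops one byte");
```

Using one encoding for every opcode means there is a single place where the value is decoded, so a constant-folded length cannot be read under the wrong convention.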
-
Max Kazantsev authored
Replace the check `if ((ExitIfTrue && CI->isZero()) || (!ExitIfTrue && CI->isOne()))` with the equivalent and simpler version `if (ExitIfTrue == CI->isZero())`.
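The equivalence can be checked exhaustively with plain booleans. The sketch assumes the constant is a 1-bit value, so that "is one" is exactly "not zero". That assumption is not stated in the commit message, and the function names are illustrative only.

```cpp
// Assumption: for the constant in question, isOne == !isZero
// (true for a 1-bit integer constant).
constexpr bool originalCheck(bool exitIfTrue, bool isZero) {
  return (exitIfTrue && isZero) || (!exitIfTrue && !isZero);
}

constexpr bool simplifiedCheck(bool exitIfTrue, bool isZero) {
  return exitIfTrue == isZero;
}

// All four input combinations agree.
static_assert(originalCheck(false, false) == simplifiedCheck(false, false), "");
static_assert(originalCheck(false, true) == simplifiedCheck(false, true), "");
static_assert(originalCheck(true, false) == simplifiedCheck(true, false), "");
static_assert(originalCheck(true, true) == simplifiedCheck(true, true), "");
```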
-
Max Kazantsev authored
Check lightweight getter condition before calling all_of.
-
Arthur Eubanks authored
gcc does not support __has_feature(), so this was accidentally changed in D111581 when compiling with gcc.
-
Ben Shi authored
Optimize immediate materialisation in the following way if profitable:
1. Use BCLRI for upper 32 bits if the lower 32 bits are negative int32.
2. Use BSETI for upper 32 bits if the lower 32 bits are positive int32.
Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D111508
-
Abinav Puthan Purayil authored
The 24-bit mul intrinsics yield the low-order 32 bits. We should only do the transformation if the operands are known to be not wider than 24 bits and the result is known to be not wider than 32 bits. Differential Revision: https://reviews.llvm.org/D111523
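A small model shows why both conditions matter: two 24-bit operands can produce a product of up to 48 bits, and a multiply that returns only the low 32 bits silently drops the rest. The sketch works on concrete unsigned values rather than on known-bits information, and `mulU24` and `safeToUseMulU24` are hypothetical names, not the intrinsics or the helper used in the patch.

```cpp
#include <cstdint>

// Hypothetical model of an unsigned 24-bit multiply that yields the
// low-order 32 bits of the product of the low 24 bits of each operand.
constexpr std::uint32_t mulU24(std::uint32_t a, std::uint32_t b) {
  return static_cast<std::uint32_t>(
      static_cast<std::uint64_t>(a & 0xFFFFFFu) * (b & 0xFFFFFFu));
}

// The wider multiply may be replaced only when both operands fit in
// 24 bits and the full product fits in 32 bits.
constexpr bool safeToUseMulU24(std::uint64_t a, std::uint64_t b) {
  return a <= 0xFFFFFFu && b <= 0xFFFFFFu && a * b <= 0xFFFFFFFFu;
}

// 16-bit operands: the product fits in 32 bits.
static_assert(safeToUseMulU24(0xFFFFu, 0xFFFFu), "product fits");
// Maximal 24-bit operands: the 48-bit product does not fit.
static_assert(!safeToUseMulU24(0xFFFFFFu, 0xFFFFFFu), "product too wide");
// The model keeps only the low 32 bits of 0xFFFFFE000001.
static_assert(mulU24(0xFFFFFFu, 0xFFFFFFu) == 0xFE000001u, "truncated");
```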
-
Tom Stellard authored
Reviewed By: smeenai Differential Revision: https://reviews.llvm.org/D110976
-
Ben Shi authored
Use LUI+SLLI.UW to compose the upper bits instead of LUI+SLLI. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D111705
-
Ben Shi authored
Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D111704
-
Lang Hames authored
-
wlei authored
The first LBR entry can be an external branch; when it is, we should ignore the whole trace.
```
7f7448e889e4 0x7f7448e889e4/0x7f7448e88826/P/-/-/1 0x7f7448e8899f/0x7f7448e889d8/P/-/-/4 ...
```
Reviewed By: wenlei, hoy Differential Revision: https://reviews.llvm.org/D111749
-
wlei authored
With `ignore-stack-samples`, we can ignore the call stack before sample aggregation, which avoids some redundant computation. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D111577
-
Lang Hames authored
SimpleRemoteEPC notionally allowed subclasses to override the createMemoryManager and createMemoryAccess methods to use custom objects, but could not actually be subclassed in practice (the construction process in SimpleRemoteEPC::Create could not be reused). Instead of subclassing, this commit adds a SimpleRemoteEPC::Setup class that can be used by clients to set up the memory manager and memory access members. A default-constructed Setup object results in no change from previous behavior (EPCGeneric* memory manager and memory access objects used by default).
-
Lang Hames authored
-
Nico Weber authored
-
Shoaib Meenai authored
If the parameter had been annotated as nonnull because of the null check, we want to remove the attribute, since it may no longer apply and could result in miscompiles if left. Similarly, we also want to remove undef-implying attributes, since they may not apply anymore either. Fixes PR52110. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D111515
-
- Oct 13, 2021
-
-
Mircea Trofin authored
The tests that exercise the 'release' mode, where the model is AOT-ed, check the output has certain properties, to validate that, indeed, a different policy from the default one was exercised. For determinism, we can't reliably check that output for an arbitrary learned policy, since it could be that policy happens to mimic the default one in that particular case. This patch adds a requirement that those tests run only when the model is autogenerated (e.g. on build bots). Differential Revision: https://reviews.llvm.org/D111747
-
Philip Reames authored
This extends the foldOpIntoPhi code used when visiting a freeze user of a phi to allow any non-undef/poison operand as opposed to only non-undef/poison constants. This lets us hoist a freeze in the increment of an IV into the preheader in many cases. Differential Revision: https://reviews.llvm.org/D111744
-
Martin Storsjö authored
They were in the doxygen group Observers, while they are about mutating paths. Differential Revision: https://reviews.llvm.org/D111732
-
Martin Storsjö authored
After 8fc7a907, this loop does the same as a plain `std::replace`. Also clarify the comment about what this function does. Differential Revision: https://reviews.llvm.org/D111730
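As a generic illustration of the claim, a hand-written character replacement loop and `std::replace` produce the same string. The function names and the choice of separator characters are assumptions for the example and are not taken from the changed code.

```cpp
#include <algorithm>
#include <string>

// Hand-written loop: replace every occurrence of `from` with `to`.
std::string replaceWithLoop(std::string s, char from, char to) {
  for (char &c : s)
    if (c == from)
      c = to;
  return s;
}

// Same behavior expressed with the standard algorithm.
std::string replaceWithAlgorithm(std::string s, char from, char to) {
  std::replace(s.begin(), s.end(), from, to);
  return s;
}
```

For example, both functions turn `"a/b/c"` into `"a\b\c"` when asked to replace `'/'` with `'\\'`.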
-
Roman Lebedev authored
-
Roman Lebedev authored
-
Roman Lebedev authored
`X86TTIImpl::getGSScalarCost()` has (at least) two issues:
* it naively computes the cost of a sequence of `insertelement`/`extractelement`. If we are operating not on XMM (but on YMM/ZMM), this wildly overestimates the cost of subvector insertions/extractions.
* Gather/scatter takes a vector of pointers, and scalarization results in us performing a scalar memory operation for each of these pointers, but we never account for the cost of extracting these pointers out of the vector of pointers.
Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111222
-
Sjoerd Meijer authored
Even if there are no interesting functions, the SCCP solver would still run before bailing. Now bail earlier to avoid running the solver for nothing. Differential Revision: https://reviews.llvm.org/D111645
-
Noah Shutty authored
This finds the curl libraries if LLVM_ENABLE_CURL is set. This is needed to implement the debuginfod client library in LLVM. Patch By: noajshu Differential Revision: https://reviews.llvm.org/D111238
-
Arthur Eubanks authored
Following D110451, we need to make sure to support 64 bit values.
-
Lang Hames authored
This should fix compile errors in llvm-jitlink.cpp in LLVM_ENABLE_THREADS=Off builds due to f3411616.
-
Joe Nash authored
NFC. This check does not verify any functional property since size 8 was added. Remove it for simplicity. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D111737 Change-Id: Ifd7cbd324a137f939d8dc04acb8fbd54c9527a42
-
Kai Nacke authored
This PR implements saving of the XPLINK callee-saved registers on z/OS. Reviewed By: uweigand, Kai Differential Revision: https://reviews.llvm.org/D111653
-