- Jul 13, 2021
-
Qiu Chaofan authored
Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D105789
-
Stanislav Mekhanoshin authored
This is a pilot change to verify the logic. The rest will be done in the same way, at least the rest of VOP1. Differential Revision: https://reviews.llvm.org/D105742
-
Qiu Chaofan authored
a22ecb45 fixed a crash on big-endian subtargets. This commit fixes a typo in that change that could cause a miscompile.
-
hyeongyu kim authored
This patch fixes a problem in SimplifyBranchOnICmpChain that occurs when the extra values are undef or poison. Suppose %mode is 51 and %Cond is poison, and consider the case below.
```
%A = icmp ne i32 %mode, 0
%B = icmp ne i32 %mode, 51
%C = select i1 %A, i1 %B, i1 false
%D = select i1 %C, i1 %Cond, i1 false
br i1 %D, label %T, label %F
=>
br i1 %Cond, label %switch.early.test, label %F
switch.early.test:
  switch i32 %mode, label %T [
    i32 51, label %F
    i32 0, label %F
  ]
```
incorrectness: https://alive2.llvm.org/ce/z/BWScX Before the transformation the code does not raise UB, because %C and %D are false and %Cond is never used. After the transformation %Cond is used immediately, and it raises UB. This problem can be solved by adding a freeze instruction. correctness: https://alive2.llvm.org/ce/z/x9x4oY Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D104569
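As an illustration only (not text from the patch), a minimal IR sketch of the fixed transform, assuming the freeze is applied to the extra condition before the early-test branch uses it:
```
%Cond.fr = freeze i1 %Cond
br i1 %Cond.fr, label %switch.early.test, label %F
switch.early.test:
  switch i32 %mode, label %T [
    i32 51, label %F
    i32 0, label %F
  ]
```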
-
David Green authored
Similar to D91921 (and D104515) this introduces two MVESEXT and MVEZEXT nodes that larger-than-legal sext and zext are lowered to. These either get optimized away or end up becoming a series of stack loads/stores, in order to perform the extending whilst keeping the order of the lanes correct. They are generated from v8i16->v8i32, v16i8->v16i16 and v16i8->v16i32 extends, potentially with an intermediate extend for the larger v16i8->v16i32 extend. A number of combines have been added for obvious cases that come up in tests, notably MVEEXT of shuffles. More may be needed in the future, but this seems to cover most of the cases that come up in the tests. Differential Revision: https://reviews.llvm.org/D105090
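For reference, a minimal sketch of the kinds of larger-than-legal extends this affects (function names are illustrative, not from the patch):
```
define <8 x i32> @sext_v8i16_v8i32(<8 x i16> %x) {
  %e = sext <8 x i16> %x to <8 x i32>
  ret <8 x i32> %e
}

define <16 x i32> @zext_v16i8_v16i32(<16 x i8> %x) {
  %e = zext <16 x i8> %x to <16 x i32>
  ret <16 x i32> %e
}
```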
-
hyeongyu kim authored
-
Vitaly Buka authored
Fails here: https://lab.llvm.org/buildbot/#/builders/37/builds/5267 This reverts commit e4aa6ad1.
-
Jessica Paquette authored
Generalize the existing eq/ne case using `extractParts`. The original code only handled narrowings for types of width 2n->n. This generalization allows for any type that can be broken down by `extractParts`. The general overview is:
- Loop over each narrow-sized part and do exactly what the 2-register case did.
- Loop over the leftover-sized parts and do the same thing.
- Widen the leftover-sized XOR results to the desired narrow size.
- OR that all together and then do the comparison against 0 (just like the old code).
This shows up a lot when building clang for AArch64 using GlobalISel, so it's worth fixing. For the sake of simplicity, this doesn't handle the non-eq/ne case yet. Also remove the code in this case that notifies the observer; we're just going to delete MI anyway so talking to the observer shouldn't be necessary. Differential Revision: https://reviews.llvm.org/D105161
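At the IR level the expansion is conceptually the following, shown here as a hand-written sketch for a wide compare already split into i64 parts (not output from the patch):
```
define i1 @wide_eq(i64 %a.lo, i64 %a.hi, i64 %b.lo, i64 %b.hi) {
  ; XOR each pair of parts, OR the results together, then compare against 0.
  %x.lo = xor i64 %a.lo, %b.lo
  %x.hi = xor i64 %a.hi, %b.hi
  %or = or i64 %x.lo, %x.hi
  %cmp = icmp eq i64 %or, 0
  ret i1 %cmp
}
```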
-
Arthur Eubanks authored
-
Arthur Eubanks authored
Reviewed By: #opaque-pointers, dblaikie Differential Revision: https://reviews.llvm.org/D105710
-
David Blaikie authored
This call would incorrectly overwrite the rnglists section with the .debug_rnglists.dwo from the executable (if there was one), instead of the correct value (the .debug_rnglists.dwo from the .dwo file) that is applied in DWARFUnit::tryExtractDIEsIfNeeded.
-
Jon Roelofs authored
-
Eli Friedman authored
Saves one instruction for signed, uses a cheaper instruction for unsigned. Differential Revision: https://reviews.llvm.org/D105770
-
- Jul 12, 2021
-
Eli Friedman authored
Not really useful on its own, but D105673 depends on it. Differential Revision: https://reviews.llvm.org/D105840
-
Amy Kwan authored
[PowerPC] Fix the splat immediate in PPCMIPeephole depending on whether we have an Altivec or a VSX splat instruction. An assertion like the following can occur because Altivec and VSX splats use a different operand number for the immediate:
```
int64_t llvm::MachineOperand::getImm() const: Assertion `isImm() && "Wrong MachineOperand accessor"' failed.
```
This patch updates PPCMIPeephole.cpp to assign the correct splat immediate. Differential Revision: https://reviews.llvm.org/D105790
-
Nikita Popov authored
Continuing from D105763, this allows placing certain properties about attributes in the TableGen definition. In particular, we store whether an attribute applies to fn/param/ret (or a combination thereof). This information is used by the Verifier, as well as the ForceFunctionAttrs pass. I also plan to use this in LLParser, which also duplicates info on which attributes are valid where. This keeps metadata about attributes in one place, and makes it more likely that it stays in sync, rather than in various functions spread across the codebase. Differential Revision: https://reviews.llvm.org/D105780
-
Wouter van Oortmerssen authored
-
Nikita Popov authored
InAlloca was listed twice, once as a normal attribute, once as a type attribute.
-
Nikita Popov authored
This is now the same as isIntAttrKind(), so use that instead, as it does not require manual maintenance. The naming is also more accurate in that both int and type attributes have an argument, but this method was only targeting int attributes. I initially wanted to tighten the AttrBuilder assertion, but we have some in-tree uses that would violate it.
-
Simon Pilgrim authored
Update (mainly) vXf32/vXf64 -> vXi8/vXi16 fptosi/fptoui costs based on the worst case costs from the script in D103695. Move to using legalized types wherever possible, which allows us to prune the cost tables.
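The costs being updated are for conversions of this shape (an illustrative example, not taken from the patch):
```
define <8 x i8> @cvt_v8f32_v8i8(<8 x float> %x) {
  %r = fptosi <8 x float> %x to <8 x i8>
  ret <8 x i8> %r
}
```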
-
Thomas Johnson authored
This change is a step towards implementing codegen for __builtin_clz(). Full support for CLZ with a regression test will follow shortly. Differential Revision: https://reviews.llvm.org/D105560
-
Nikita Popov authored
It's not necessary to explicitly sort by enum/int/type attribute, as the attribute kinds are already sorted this way. We can directly sort by kind.
-
Nikita Popov authored
Assert that enum/int/type attributes go through the constructor they are supposed to use. To make sure this can't happen via invalid bitcode, explicitly verify that the attribute kind is correct there.
-
Nikita Popov authored
Followup to D105658 to make AttrBuilder automatically work with new type attributes. TableGen is tweaked to emit First/LastTypeAttr markers, based on which we can handle type attributes programmatically. Differential Revision: https://reviews.llvm.org/D105763
-
Jinsong Ji authored
The lowering for v2i64 is currently guarded with hasDirectMove; however, the lowering can handle the pattern correctly, only lowering it when there are efficient patterns and corresponding instructions. The original guard was added in D21135 and was for the Legal action. The code has evolved since then, and this guard is no longer necessary. Reviewed By: #powerpc, nemanjai Differential Revision: https://reviews.llvm.org/D105596
-
Thomas Lively authored
Replace the clang builtin function and LLVM intrinsic for f32x4.demote_zero_f64x2 with combines from normal SDNodes. Also add missing combines for i32x4.trunc_sat_zero_f64x2_{s,u}, which share the same pattern. Differential Revision: https://reviews.llvm.org/D105755
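Assuming f32x4.demote_zero_f64x2 truncates the two double lanes and zero-fills the upper lanes, the combined pattern corresponds roughly to IR like this sketch (names are illustrative):
```
define <4 x float> @demote_zero(<2 x double> %x) {
  ; truncate the two double lanes, then pad the result with zero lanes
  %t = fptrunc <2 x double> %x to <2 x float>
  %r = shufflevector <2 x float> %t, <2 x float> zeroinitializer,
                     <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  ret <4 x float> %r
}
```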
-
Craig Topper authored
[X86] Teach X86FloatingPoint's handleCall to only erase the FP stack if there is a regmask operand that clobbers the FP stack. There are some calls to functions like `__alloca` that are missing a regmask operand. Lack of a regmask operand means that all registers that aren't mentioned by def operands are preserved. __alloca only updates EAX and ESP and has def operands for them, so this is ok. Because there is no regmask, the register allocator won't spill the FP registers across the call. Assuming we want to keep the FP stack untouched across these calls, we need to handle this in the FP stackifier. We might want to add a proper regmask operand to the code that creates these calls to indicate all registers are preserved, but we'd still need this change to the FP stackifier to know to preserve the FP stack for such a regmask. The test is kind of long, but bugpoint wasn't able to reduce it any further. Fixes PR50782 Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D105762
-
Jinsong Ji authored
AIX .file directive support including the compiler version string. https://www.ibm.com/docs/en/aix/7.2?topic=ops-file-pseudo-op This patch adds the support so that it will be easier to identify the build compiler in objects. Reviewed By: #powerpc, shchenz Differential Revision: https://reviews.llvm.org/D105743
-
Albion Fung authored
This patch implements trap and FP to and from double conversions. The builtins generate code that mirror what is generated from the XL compiler. Intrinsics are named conventionally with builtin_ppc, but are aliased to provide the same builtin names as the XL compiler. Differential Revision: https://reviews.llvm.org/D103668
-
Bradley Smith authored
Let other parts of legalization handle the rest of the node, this allows re-use of existing optimizations elsewhere. Differential Revision: https://reviews.llvm.org/D105624
-
Simon Tatham authored
No implementation uses the `LocCookie` parameter at all. Errors are reported from inside that function by `llvm::SourceMgr`, and the instance of that at the clang call site arranges to pass the error messages back to a `ClangAsmParserCallback`, which is where the clang SourceLocation for the error is computed. (This is part of a patch series working towards the ability to make SourceLocation into a 64-bit type to handle larger translation units. But this particular change seems beneficial in its own right.) Reviewed By: miyuki Differential Revision: https://reviews.llvm.org/D105490
-
Benjamin Kramer authored
AArch64ISelLowering.cpp:15167:8: warning: unused variable 'OpCode' [-Wunused-variable]
  auto OpCode = N->getOpcode();
       ^
-
Cullen Rhodes authored
First patch in a series adding MC layer support for the Arm Scalable Matrix Extension. This patch adds the following features: sme, sme-i64, sme-f64. The sme-i64 and sme-f64 flags are for the optional I16I64 and F64F64 features. If a target supports I16I64 then the following instructions are implemented:
* 64-bit integer ADDHA and ADDVA variants (D105570).
* SMOPA, SMOPS, SUMOPA, SUMOPS, UMOPA, UMOPS, USMOPA, and USMOPS instructions that accumulate 16-bit integer outer products into 64-bit integer tiles.
If a target supports F64F64 then the FMOPA and FMOPS instructions that accumulate double-precision floating-point outer products into double-precision tiles are implemented. Outer products are implemented in D105571. The reference can be found here: https://developer.arm.com/documentation/ddi0602/2021-06 Reviewed By: CarolineConcatto Differential Revision: https://reviews.llvm.org/D105569
-
Michael Liao authored
-
Jonas Paulsson authored
Don't use a local MachineOperand copy in SystemZAsmPrinter::PrintAsmOperand() and change the register as it may break the MRI tracking of register uses. Use an MCOperand instead. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D105757
-
Sanjay Patel authored
This is the pattern from the description of https://llvm.org/PR50816. There might be a way to generalize this to a smaller or more generic pattern, but I have not found it yet. https://alive2.llvm.org/ce/z/ShzJoF

define i1 @src(i8 %x) {
  %add = add i8 %x, -1
  %xor = xor i8 %x, -1
  %and = and i8 %add, %xor
  %r = icmp slt i8 %and, 0
  ret i1 %r
}

define i1 @tgt(i8 %x) {
  %r = icmp eq i8 %x, 0
  ret i1 %r
}
-
Simon Pilgrim authored
Update truncation costs based on the worst case costs from the script in D103695. Move to using legalized types wherever possible, which allows us to prune the cost tables.
-
David Green authored
This sets the latency of stores to 1 in the Cortex-A55 scheduling model, to better match the values given in the software optimization guide. The latency of a store in normal llvm scheduling does not appear to have a lot of uses. If the store has no outputs then the latency is somewhat meaningless (and pre/post increment update operands use the WriteAdr write for those operands instead). The one place it does alter things is the latency between a store and the end of the scheduling region, which can in turn have an effect on the critical path length. As a result a latency of 1 is more correct and offers ever-so-slightly better scheduling of instructions near the end of the block. They are marked as RetireOOO to keep llvm-mca from introducing stalls where none would exist. Differential Revision: https://reviews.llvm.org/D105541
-
Yevgeny Rouban authored
This new test demonstrates a case where a base ptr is generated twice for the same value: the first one is generated while the gc.get.pointer.base() is inlined, the second is generated for the statepoint. This happens because the methods inlineGetBaseAndOffset() and insertParsePoints() do not share their defining value cache used by the findBasePointer() method. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D103240
-
David Truby authored
This adds custom lowering for truncating stores when operating on fixed length vectors in SVE. It also includes a DAG combine to fold extends followed by truncating stores into non-truncating stores, in order to prevent this pattern from appearing once truncating stores are supported. Currently truncating stores are not used in certain cases where the size of the vector is larger than the target vector width. Differential Revision: https://reviews.llvm.org/D104471
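A rough sketch of the pattern that can now be lowered to a truncating store (illustrative only; the interesting cases are fixed-length vectors wider than the target vector width):
```
define void @store_trunc(<8 x i32> %v, <8 x i16>* %p) {
  ; the trunc plus store can become a single truncating store
  %t = trunc <8 x i32> %v to <8 x i16>
  store <8 x i16> %t, <8 x i16>* %p
  ret void
}
```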
-