- Jul 14, 2021
-
-
Louis Dionne authored
This commit reverts 5099e015 and 77396bbc, which broke the build in various ways. I'm reverting until I can investigate, since that change appears to be way more subtle than it seemed.
-
Roman Lebedev authored
-
Nikita Popov authored
While it is nice to have separate methods in the public AttributeSet API, we can fetch the type from the internal AttributeSetNode using a generic API for all type attribute kinds.
-
Roman Lebedev authored
-
David Green authored
For i64 reductions we currently try to convert add(VMLALV(X, Y), B) to VMLALVA(B, X, Y), incorporating the addition into the VMLALVA. If we have an add of an existing VMLALVA, this patch pushes the add up above the VMLALVA so that it may potentially be simplified further, for example by being folded into another VMLALV. Differential Revision: https://reviews.llvm.org/D105686
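A toy model of the two reduction nodes (plain Python, not the actual ISel nodes) shows why the add can be hoisted without changing the result:

```python
# VMLALV is a multiply-accumulate reduction; VMLALVA is the same
# reduction with an extra scalar accumulator input.
def vmlalv(x, y):
    return sum(a * b for a, b in zip(x, y))

def vmlalva(acc, x, y):
    return acc + vmlalv(x, y)

x, y = [1, 2, 3, 4], [5, 6, 7, 8]
b, c = 100, 10
# add(VMLALVA(B, X, Y), C) can be rewritten as VMLALVA(add(B, C), X, Y),
# exposing the hoisted add(B, C) to further simplification.
assert vmlalva(b, x, y) + c == vmlalva(b + c, x, y)
```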
-
Vitaly Buka authored
Differential Revision: https://reviews.llvm.org/D105954
-
Mehdi Amini authored
It was an alias for a long time.
-
Nikita Popov authored
A couple of attributes had explicit checks for incompatibility with pointer types. However, this is already handled generically by the typeIncompatible() check, so we can drop these after adding SwiftError to typeIncompatible(). The previous implementation of that check printed all attributes that are incompatible with a given type, even though most of those attributes aren't actually used. This had the annoying result that the error message changed every time a new attribute was added to the list. Improve this by explicitly finding which attribute isn't compatible and printing just that.
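A hypothetical sketch of the verifier change (names invented for illustration, not the LLVM API): rather than printing the whole incompatibility list for a type, find and report only the attribute that is actually present and incompatible, so the message stays stable when new attributes join the list.

```python
# Illustrative only: a made-up set of attributes incompatible with
# non-pointer types, standing in for typeIncompatible().
INCOMPATIBLE_WITH_NON_POINTER = {"byval", "nest", "noalias", "swifterror"}

def find_incompatible(attrs, is_pointer):
    if is_pointer:
        return None
    present = attrs & INCOMPATIBLE_WITH_NON_POINTER
    # Report just one offending attribute instead of the full list.
    return min(present) if present else None

assert find_incompatible({"zeroext", "swifterror"}, is_pointer=False) == "swifterror"
assert find_incompatible({"zeroext"}, is_pointer=False) is None
```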
-
Saleem Abdulrasool authored
The emission was corrected for the swift_async calling convention but the demangling support was not. This repairs the demangling support as well.
-
Eli Friedman authored
This is mostly a minor convenience, but the pattern seems frequent enough to be worthwhile (and we'll probably add more uses in the future). Differential Revision: https://reviews.llvm.org/D105850
-
Thomas Lively authored
Replace the experimental clang builtin and LLVM intrinsics for these instructions with normal codegen patterns. Resolves PR50433. Differential Revision: https://reviews.llvm.org/D105950
-
Louis Dionne authored
-
Thomas Lively authored
The data layout strings do not have any effect on llc tests and will become misleadingly out of date as we continue to update the canonical data layout, so remove them from the tests. Differential Revision: https://reviews.llvm.org/D105842
-
Fangrui Song authored
The ELF specification says "The link editor honors the common definition and ignores the weak ones." GNU ld and our Symbol::compare follow this, but the --fortran-common code (D86142) made a mistake on the precedence. Fixes https://bugs.llvm.org/show_bug.cgi?id=51082 Reviewed By: peter.smith, sfertile Differential Revision: https://reviews.llvm.org/D105945
-
David Green authored
MVE does not have a VMLALV instruction that can perform v16i8 -> i64 reductions, like it does for v8i16 -> i64 and v4i32 -> i64 reductions. That means that the pattern to create them will be split up by type legalization, creating a lot of instructions. This extends the patterns for matching i64 reductions a little to handle the v16i8 -> i64 case. We need to turn them into a pair of v8i16 -> i64 VMLALVs that each perform half of the reduction and are summed together (so the latter is a VMLALVA). The order of the lanes does not matter for the reduction, so we generate an MVEEXT for the extension, which will either be folded into an extending load or can be optimized to a VREV/VMOVL. Some of the resulting codegen isn't optimal, but will be improved in a later patch. Differential Revision: https://reviews.llvm.org/D105680
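A toy model (plain Python, not the ISel patterns) of splitting the v16i8 -> i64 multiply-accumulate reduction into two v8i16 -> i64 halves, where the second half accumulates onto the first via a VMLALVA:

```python
def vmlalv(x, y):
    return sum(a * b for a, b in zip(x, y))

def vmlalva(acc, x, y):  # VMLALV with an accumulator input
    return acc + vmlalv(x, y)

x = list(range(16))
y = list(range(16, 32))
# Lane order does not matter for a sum, so the full reduction equals one
# VMLALV on the first half plus a VMLALVA on the second half.
assert vmlalv(x, y) == vmlalva(vmlalv(x[:8], y[:8]), x[8:], y[8:])
```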
-
Sanjay Patel authored
This set of folds was added recently with c7b658ae, 0c400e89, and 40b752d2, and I noted that this wasn't likely to fire in code derived from C/C++ source because of nsw in particular. But I didn't notice that I had placed the code above the no-wrap block of transforms. This is likely the cause of regressions noted from the previous commit because, as shown in the test diffs, we may have transformed into a compare with an arbitrary constant rather than a simpler signbit test.
-
Sanjay Patel authored
-
Sander de Smalen authored
This patch emits remarks for instructions that have invalid costs for a given set of vectorization factors. Some example output:
```
t.c:4:19: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): load
    dst[i] = sinf(src[i]);
                  ^
t.c:4:14: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1, vscale x 2, vscale x 4): call to llvm.sin.f32
    dst[i] = sinf(src[i]);
             ^
t.c:4:12: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): store
    dst[i] = sinf(src[i]);
           ^
```
Reviewed By: fhahn, kmclaughlin Differential Revision: https://reviews.llvm.org/D105806
-
Matt Arsenault authored
-
Sander de Smalen authored
At the moment, <vscale x 1 x eltty> is not yet fully handled by the code-generator, so to avoid vectorizing loops with that VF, we mark the cost for these types as invalid. The reason for not adding a new "TTI::getMinimumScalableVF" is that the type is supposed to be a type that can be legalized. It partially is, although the support for these types needs some more work. Reviewed By: paulwalker-arm, dmgreen Differential Revision: https://reviews.llvm.org/D103882
-
Aaron Ballman authored
The anonymous and non-anonymous bit-field diagnostics are easily combined into one diagnostic. However, the diagnostic was missing a "the" that is present in the almost-identically worded warn_bitfield_width_exceeds_type_width diagnostic, hence the changes to test cases.
-
Jay Foad authored
This prevents breaking the indentation that shows the structure of the pass managers. Differential Revision: https://reviews.llvm.org/D105891
-
Alexey Bataev authored
The cost of the InsertSubvector shuffle kind is not complete and may end up as just extracts + inserts costs in many cases. Added a workaround to represent it as a generic PermuteSingleSrc, which is still pessimistic but better than InsertSubvector. Differential Revision: https://reviews.llvm.org/D105827
-
Louis Dionne authored
We might as well use the various XXX_TARGET_TRIPLE variables directly.
-
Yitzhak Mandelbaum authored
When the end loc of the specified range is a split token, `makeFileCharRange` does not process it correctly. This patch adds proper support for split tokens. Differential Revision: https://reviews.llvm.org/D105365
-
Peixin Qiao authored
The following semantic check is removed in OpenMP Version 5.0:
```
Taskloop simd construct restrictions:
No reduction clause can be specified.
```
Also fix several typos. Reviewed By: kiranchandramohan Differential Revision: https://reviews.llvm.org/D105874
-
Jinsong Ji authored
$ is used as the PC for PowerPC inline asm. ELF uses it; enable it for AIX XCOFF as well. Reviewed By: #powerpc, amyk, nemanjai Differential Revision: https://reviews.llvm.org/D105956
-
Matthias Springer authored
Differential Revision: https://reviews.llvm.org/D105607
-
oToToT authored
The CMake community wiki has been moved to the [[ https://gitlab.kitware.com/cmake/community/wikis/home | Kitware GitLab Instance ]]. Also, the original anchor for the `Information how to set up various cross compiling toolchains` section might not work as expected: the original content is now collapsed, so the browser won't navigate to the right section directly. Hence, I think it is better to provide the section name instead of `this section` with a link, to help readers find the right section by themselves. Reviewed By: void Differential Revision: https://reviews.llvm.org/D104996
-
Tim Northover authored
If we try to create a new GlobalVariable on each iteration, the Module will detect the name collision and "helpfully" rename later iterations by appending ".1" etc. But "___udivsi3.1" doesn't exist and we definitely don't want to try to call it. So instead check whether there's already a global with the right name in the module and use that if so.
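A toy model of the bug and the fix (plain Python standing in for the LLVM Module API): creating a symbol whose name already exists gets it auto-renamed with a ".1" suffix, so the correct pattern is to look the name up first and reuse it.

```python
class Module:
    """Mimics the auto-renaming behaviour described above."""
    def __init__(self):
        self.globals = {}

    def create_global(self, name):
        unique, n = name, 0
        # On a name collision, append ".1", ".2", ... like the real Module.
        while unique in self.globals:
            n += 1
            unique = f"{name}.{n}"
        self.globals[unique] = unique
        return unique

    def get_or_create_global(self, name):
        # The fix: reuse an existing global with the right name if present.
        return self.globals.get(name) or self.create_global(name)

m = Module()
assert m.create_global("___udivsi3") == "___udivsi3"
assert m.create_global("___udivsi3") == "___udivsi3.1"       # the bug
assert m.get_or_create_global("___udivsi3") == "___udivsi3"  # the fix
```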
-
Sanjay Patel authored
This has been a work-in-progress for a long time...we finally have all of the pieces in place to handle vectorization of compare code as shown in: https://llvm.org/PR41312 To do this (see PhaseOrdering tests), we converted SimplifyCFG and InstCombine to the poison-safe (select) forms of the logic ops, so now we need to have SLP recognize those patterns and insert a freeze op to make a safe reduction: https://alive2.llvm.org/ce/z/NH54Ah We get the minimal patterns with this patch, but the PhaseOrdering tests show that we still need adjustments to get the ideal IR in some or all of the motivating cases. Differential Revision: https://reviews.llvm.org/D105730
-
Gabor Marton authored
This proved to be very useful during debugging. Differential Revision: https://reviews.llvm.org/D103967
-
Alexander Shaposhnikov authored
Make use of ArgList::getLastArgValue. NFC. Test plan: make check-lld-macho Differential revision: https://reviews.llvm.org/D105452
-
Djordje Todorovic authored
This new MIR pass removes redundant DBG_VALUEs. After the register allocator is done, more precisely, after the Virtual Register Rewriter, we end up having duplicated DBG_VALUEs, since some virtual registers are being rewritten into the same physical register as some existing DBG_VALUEs. Each DBG_VALUE should indicate (at least before LiveDebugValues) a variable assignment, but this gets clobbered for function parameters during SelectionDAG, since it generates new DBG_VALUEs after COPY instructions even though the parameter has no new assignment. For example, if we had a DBG_VALUE $regX as an entry debug value representing the parameter, then a COPY followed by DBG_VALUE $virt_reg, and after the virtregrewrite $virt_reg gets rewritten into $regX, we'd end up having a redundant DBG_VALUE. This breaks the definition of the DBG_VALUE, since some analysis passes might be built on top of that premise, and this patch tries to fix the MIR with respect to that. This first patch performs a backward scan, trying to detect a sequence of consecutive DBG_VALUEs and removing all DBG_VALUEs describing one variable except the last one. For example:
```
(1) DBG_VALUE $edi, !"var1", ...
(2) DBG_VALUE $esi, !"var2", ...
(3) DBG_VALUE $edi, !"var1", ...
```
In this case, we can remove (1). Combined with the forward scan that will be introduced in the next patch (from this stack), and judging by the statistics, RemoveRedundantDebugValues removes 15032 instructions using gdb-7.11 as a testbed. Differential Revision: https://reviews.llvm.org/D105279
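The backward scan described above can be sketched in a few lines of Python (a simplified model, not the MIR API): walk a consecutive run of DBG_VALUEs from the end and keep only the last DBG_VALUE per variable.

```python
def remove_redundant(run):
    """Keep only the last DBG_VALUE for each variable in a consecutive run.

    `run` is a list of (register, variable) pairs, in program order.
    """
    seen = set()
    kept = []
    for reg, var in reversed(run):
        if var not in seen:  # last occurrence of this variable wins
            seen.add(var)
            kept.append((reg, var))
    return list(reversed(kept))

run = [("$edi", "var1"), ("$esi", "var2"), ("$edi", "var1")]
# (1) is removed; (2) and (3) survive, matching the example above.
assert remove_redundant(run) == [("$esi", "var2"), ("$edi", "var1")]
```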
-
Simon Pilgrim authored
[InstCombine] Fold (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0)) (PR50183) (REAPPLIED)
As discussed on PR50183, we already fold to prefer 'select-of-idx' vs 'select-of-gep':
```
define <4 x i32>* @select0a(<4 x i32>* %a0, i64 %a1, i1 %a2, i64 %a3) {
  %gep0 = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a1
  %gep1 = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a3
  %sel = select i1 %a2, <4 x i32>* %gep0, <4 x i32>* %gep1
  ret <4 x i32>* %sel
}
-->
define <4 x i32>* @select1a(<4 x i32>* %a0, i64 %a1, i1 %a2, i64 %a3) {
  %sel = select i1 %a2, i64 %a1, i64 %a3
  %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %sel
  ret <4 x i32>* %gep
}
```
This patch adds basic handling for the 'fallthrough' cases where the gep idx == 0 has been folded away to the base address:
```
define <4 x i32>* @select0(<4 x i32>* %a0, i64 %a1, i1 %a2) {
  %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a1
  %sel = select i1 %a2, <4 x i32>* %a0, <4 x i32>* %gep
  ret <4 x i32>* %sel
}
-->
define <4 x i32>* @select1(<4 x i32>* %a0, i64 %a1, i1 %a2) {
  %sel = select i1 %a2, i64 0, i64 %a1
  %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %sel
  ret <4 x i32>* %gep
}
```
Reapplied with a fix for the bpf "-bpf-disable-avoid-speculation" tests Differential Revision: https://reviews.llvm.org/D105901
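Modelling a gep as plain pointer arithmetic (a Python sketch, not the IR semantics in full), the fallthrough fold preserves the result because selecting the base address is the same as selecting index 0:

```python
ELT_SIZE = 16  # <4 x i32> occupies 16 bytes

def gep(ptr, idx):
    return ptr + idx * ELT_SIZE

def before(ptr, idx, cond):   # select C, Ptr, (gep Ptr, Idx)
    return ptr if cond else gep(ptr, idx)

def after(ptr, idx, cond):    # gep Ptr, (select C, 0, Idx)
    return gep(ptr, 0 if cond else idx)

for cond in (True, False):
    assert before(0x1000, 3, cond) == after(0x1000, 3, cond)
```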
-
Chuanqi Xu authored
-
Bruce Mitchener authored
Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D103744
-
Simon Pilgrim authored
[X86] Implement smarter instruction lowering for FP_TO_UINT from f32/f64 to i32/i64 and vXf32/vXf64 to vXi32 for SSE2 and AVX2 by using the exact semantics of the CVTTPS2SI instruction.
We know that CVTTPS2SI returns 0x80000000 for out-of-range inputs (and for FP_TO_UINT, negative float values are undefined). We can use this to make unsigned conversions from vXf32 to vXi32 more efficient, particularly on targets without blend, using the following logic:
```
small := CVTTPS2SI(x);
fp_to_ui(x) := small | (CVTTPS2SI(x - 2^31) & ARITHMETIC_RIGHT_SHIFT(small, 31))
```
Even on targets where PBLENDVPS/PBLENDVB exist, it is often a latency-2, low-throughput instruction, so this logic is applied there too (in particular for AVX2 as well). It furthermore gets rid of one high-latency floating point comparison in the previous lowering. @TomHender checked the correctness of this for all possible floats between -1 and 2^32 (both ends excluded). Original patch by @TomHender (Tom Hender) Differential Revision: https://reviews.llvm.org/D89697
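The lowering above can be checked with a small scalar model in Python (a sketch of the logic, not the vector lowering itself), using the fact that out-of-range conversions produce 0x80000000 and that the arithmetic right shift of `small` by 31 is an all-ones mask exactly when the signed conversion overflowed:

```python
def cvttps2si(x):
    # Signed i32 truncating conversion; out-of-range or NaN inputs
    # produce the "integer indefinite" value 0x80000000 (as i32: -2^31).
    if x != x or not (-2**31 <= x < 2**31):
        return -2**31
    return int(x)

def fp_to_uint32(x):
    small = cvttps2si(x)
    big = cvttps2si(x - 2**31)
    mask = -1 if small < 0 else 0  # ARITHMETIC_RIGHT_SHIFT(small, 31)
    return (small | (big & mask)) & 0xFFFFFFFF

for v in (0.0, 1.5, 2**31 - 1, 2**31 + 5, 4_000_000_000.0):
    assert fp_to_uint32(v) == int(v) & 0xFFFFFFFF
```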
-
LLVM GN Syncbot authored
-
Simon Pilgrim authored
Revert rGb803294cf78714303db2d3647291a2308347ef23 : "[InstCombine] Fold (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0)) (PR50183)" Missed some BPF test changes that need addressing
-