- Mar 18, 2020
-
-
Florian Hahn authored
When an underlying value is available, we can use its name for printing, as discussed in D73078. Reviewers: rengolin, hsaito, Ayal, gilr Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D76200
-
Simon Tatham authored
Summary: This is another set of instructions too complicated to be sensibly expressed in IR by anything short of a target-specific intrinsic. Given input vectors a,b, the instruction generates intermediate values 2*(a[0]*b[0]+a[1]*b[1]), 2*(a[2]*b[2]+a[3]*b[3]), etc; takes the high half of each double-width value; and overwrites half the lanes in the output vector c, whose input value you therefore have to provide. Optionally you can swap the elements of b so that the terms are things like a[0]*b[1]+a[1]*b[0]; optionally you can round to nearest when taking the high half; and optionally you can take the difference rather than the sum of the two products. Finally, saturation is applied when converting back to a single-width vector lane. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: miyuki Subscribers: kristof.beyls, hiraditya, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76359
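The per-lane arithmetic described above can be sketched in scalar C++. This is only an illustration built from the prose of the commit message, assuming 16-bit lanes; the function name `dual_mul_high` and its flag parameters are hypothetical, and the order of operands in the subtracting form is an assumption rather than something the message states.

```cpp
#include <cstdint>
#include <limits>
#include <utility>

// Hypothetical scalar model of one output lane, for 16-bit input lanes.
// a0/a1 and b0/b1 are one pair of adjacent lanes from each input vector.
std::int16_t dual_mul_high(std::int16_t a0, std::int16_t a1,
                           std::int16_t b0, std::int16_t b1,
                           bool exchange, bool round, bool subtract) {
  if (exchange) std::swap(b0, b1);  // a0*b1 and a1*b0 instead of a0*b0 and a1*b1

  // Double-width (here: comfortably wider) intermediate products.
  const std::int64_t p0 = std::int64_t{a0} * b0;
  const std::int64_t p1 = std::int64_t{a1} * b1;

  // Sum or difference of the two products, doubled.
  // Assumption: the subtracting form computes p0 - p1.
  std::int64_t wide = 2 * (subtract ? p0 - p1 : p0 + p1);

  // Optional round-to-nearest before taking the high half.
  if (round) wide += std::int64_t{1} << 15;

  // High half of the double-width value (arithmetic shift; guaranteed
  // from C++20, and the behaviour of mainstream compilers before that).
  const std::int64_t high = wide >> 16;

  // Saturate back to a single-width lane.
  const std::int64_t lo = std::numeric_limits<std::int16_t>::min();
  const std::int64_t hi = std::numeric_limits<std::int16_t>::max();
  return static_cast<std::int16_t>(high < lo ? lo : (high > hi ? hi : high));
}
```

Which lanes of c are overwritten is not modelled here; the sketch only covers the value computed for one pair of input lanes.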
-
Nico Weber authored
This reverts commit dd128268. Breaks tests on Windows, see https://reviews.llvm.org/D76346#1929208
-
Guillaume Chatelet authored
Summary: The patch is not ready yet and is here to discuss a few options: - How do we customize the implementation? (i.e. how to define `kRepMovsBSize`), - How do we specify custom compilation flags? (We'd need `-fno-builtin-memcpy` to be passed in), - How do we build? We may want to test in debug but build the libc with `-march=native` for instance, - Clang has a brand new builtin `__builtin_memcpy_inline` which makes the implementation easy and efficient, but: - If we compile with `gcc` or `msvc` we can't use it, resorting to less efficient code generation, - With gcc we can use `__builtin_memcpy` but then we'd need a postprocess step to check that the final assembly does not contain calls to `memcpy` (unlikely but allowed), - For msvc we'd need to rely on the compiler optimization passes. Reviewers: sivachandra, abrachet Subscribers: mgorny, MaskRay, tschuett, libc-commits, courbet Tags: #libc-project Differential Revision: https://reviews.llvm.org/D74397
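The compiler split discussed above might look roughly like the following. This is a minimal sketch, not the code from the patch: the helper name `copy_fixed` and the macro `SKETCH_HAS_MEMCPY_INLINE` are made up for illustration, and the fallback uses plain `std::memcpy`, which the compiler is allowed to lower to a library call.

```cpp
#include <cstddef>
#include <cstring>

// Detect Clang's __builtin_memcpy_inline without breaking compilers
// that do not provide __has_builtin at all.
#if defined(__has_builtin)
#  if __has_builtin(__builtin_memcpy_inline)
#    define SKETCH_HAS_MEMCPY_INLINE 1
#  endif
#endif

// Hypothetical helper: copy exactly kSize bytes, where kSize is a
// compile-time constant (the builtin requires a constant size).
template <std::size_t kSize>
inline void copy_fixed(char* dst, const char* src) {
#if defined(SKETCH_HAS_MEMCPY_INLINE)
  // Guaranteed never to emit a call to memcpy.
  __builtin_memcpy_inline(dst, src, kSize);
#else
  // Fallback: usually inlined for small constant sizes, but the
  // compiler may still emit a call to memcpy.
  std::memcpy(dst, src, kSize);
#endif
}
```

For example, `copy_fixed<8>(dst, src)` copies eight bytes. On the fallback path, the generated assembly would still need checking if a call to `memcpy` must be ruled out, which is the postprocessing concern raised in the summary.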
-
Nico Weber authored
-
Sam Parker authored
Run the update script on one of the loop unroll tests.
-
Matt Arsenault authored
This isn't really usable, and requires using the -amdgpu-fixed-function-abi flag to work. Assumes a uniform call target, and will hit a verifier error if the call target ends up in a VGPR. Also doesn't attempt to do anything sensible for the reported register/stack usage.
-
Matt Arsenault authored
This reverts commit 9bca8fc4. Rearrange handling to avoid changing the instruction in the case where it's going to be erased and replaced with undef.
-
Piotr Sobczak authored
Summary: For the case where "done" bits on existing exports are removed by unifyReturnBlockSet(), unify all return blocks - even the uniformly reached ones. We do not want to end up with a non-unified, uniformly reached block containing a normal export with the "done" bit cleared. That case is believed to be rare - possible with infinite loops in pixel shaders. This is a fix for D71192. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76364
-
Nico Weber authored
-
Simon Pilgrim authored
-
Nico Weber authored
This reverts commit 4060016f and re-merges c5b81466.
-
Marcel Hlopko authored
Summary: Copy of https://reviews.llvm.org/D72334, submitting with Ilya's permission. Handles template declarations of all kinds. Also builds template declaration nodes for specializations and explicit instantiations of classes. Some missing things will be addressed in the follow-up patches: specializations of functions and variables, template parameters. Reviewers: gribozavr2 Reviewed By: gribozavr2 Subscribers: cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76346
-
Sander de Smalen authored
Avoid transforming: %0 = bitcast i8* %base to <vscale x 16 x i8>* %1 = getelementptr <vscale x 16 x i8>, <vscale x 16 x i8>* %0, i64 1 into: %0 = getelementptr i8, i8* %base, i64 16 %1 = bitcast i8* %0 to <vscale x 16 x i8>* Reviewers: efriedma, ctetreau Reviewed By: efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D76236
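The reason the rewrite above is wrong for scalable vectors can be shown with a little arithmetic: stepping a `<vscale x 16 x i8>` pointer by one element advances it by 16 * vscale bytes, which equals the constant 16 only when vscale is 1. A small sketch, with a hypothetical helper name:

```cpp
#include <cstdint>

// Hypothetical helper: byte offset produced by indexing a pointer to a
// scalable vector type whose known-minimum size is min_bytes.
// The real offset is only known at run time, because vscale is.
constexpr std::uint64_t scalable_gep_offset(std::uint64_t min_bytes,
                                            std::uint64_t vscale,
                                            std::uint64_t index) {
  return index * min_bytes * vscale;
}

// With vscale == 1 the folded constant 16 happens to be right...
static_assert(scalable_gep_offset(16, 1, 1) == 16, "vscale 1");
// ...but with vscale == 4 the correct offset is 64, not 16.
static_assert(scalable_gep_offset(16, 4, 1) == 64, "vscale 4");
```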
-
Chris Bowler authored
This is the first of a series of patches that add caller support for by-value arguments. This patch adds support for arguments that are passed in a single GPR. There are 3 limitation cases: - The by-value argument is larger than a single register. - There are no remaining GPRs even though the by-value argument would otherwise fit in a single GPR. - The by-value argument requires alignment greater than register width. Future patches will be required to add support for these cases as well as for the callee handling (in LowerFormalArguments_AIX) that corresponds to this work. Differential Revision: https://reviews.llvm.org/D75863
-
Jan Kratochvil authored
D63643 added these test files, but some of the %t4dwo and %t5dwo builds are the same as the corresponding %t4 and %t5 builds. Fortunately the test cases do PASS. After just adding -gsplit-dwarf, both of these skeleton files: tools/lldb/test/SymbolFile/DWARF/Output/debug-types-expressions.test.tmp4dwo tools/lldb/test/SymbolFile/DWARF/Output/debug-types-expressions.test.tmp5dwo were referencing this one non-skeleton file: tools/lldb/test/SymbolFile/DWARF/debug-types-expressions.dwo Surprisingly, this does not affect the other test, debug-types-basic.test, probably because it compiles to .o and then links it, while debug-types-expressions.test compiles directly to an executable. So this patch fixes that while keeping the direct executable compilation. Differential Revision: https://reviews.llvm.org/D76316
-
Simon Pilgrim authored
[InstCombine][X86] Add additional demandedelts style test for in-range variable per-element shift amounts (PR40391) If we've shuffled the shift amount some of the (undemanded) elements may have become undef - this should be handled by the missing support in PR36319.
-
Mehdi Amini authored
-
Mehdi Amini authored
The constructor of Expected<T> expects a T&&, but gcc-7.5 apparently does not infer an rvalue in this context.
-
Roman Lebedev authored
Summary: As noted in [[ https://bugs.llvm.org/show_bug.cgi?id=45201 | PR45201 ]], [[ https://bugs.llvm.org/show_bug.cgi?id=10090 | PR10090 ]] SCEV doesn't always avoid recursive algorithms, and that causes issues with large expression depths and/or smaller stack sizes. In `SCEVExpander::isHighCostExpansion*()` case, the refactoring to avoid recursion is rather idiomatic. We simply need to place the root expr into a vector, and iterate over vector elements accounting for the cost of each one, adding new exprs at the end of the vector, thus achieving recursion-less traversal. The order in which we will visit exprs doesn't matter here, so we will be fine with the most basic approach of using SmallVector and inserting/extracting from the back, which incidentally is the same depth-first traversal that we were doing previously recursively. Reviewers: mkazantsev, reames, wmi, ekatz Reviewed By: mkazantsev Subscribers: hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76273
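The worklist pattern described above can be sketched as follows. This is a generic illustration, not the SCEVExpander code: `Expr`, its `cost` field, and `is_high_cost` are hypothetical stand-ins, and `std::vector` is used in place of `SmallVector` to keep the example self-contained.

```cpp
#include <vector>

// Hypothetical stand-in for an expression node with a per-node cost.
struct Expr {
  int cost;
  std::vector<const Expr*> operands;
};

// Returns true once the accumulated cost exceeds the budget.
// No recursion: pending nodes live in an explicit vector instead of
// on the call stack, so deep expressions cannot overflow the stack.
bool is_high_cost(const Expr& root, int budget) {
  std::vector<const Expr*> worklist{&root};
  int total = 0;
  while (!worklist.empty()) {
    // Taking from the back gives a depth-first visiting order.
    const Expr* e = worklist.back();
    worklist.pop_back();

    total += e->cost;
    if (total > budget) return true;

    // Newly discovered expressions go at the end of the vector.
    for (const Expr* op : e->operands) worklist.push_back(op);
  }
  return false;
}
```

Because only the accumulated total matters, the visiting order is irrelevant to the result. Note that this sketch counts a shared sub-expression once per use.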
-
Oliver Stannard authored
When optimising for code size at the expense of performance, it is often worth saving and restoring some of r0-r3, if IPRA will be able to take advantage of them. This doesn't cost any extra code size if we already have a PUSH/POP pair, and increases the number of available registers across any calls to the function. We already have an optimisation which tries to fold the subtract/add of the SP into the PUSH/POP by using extra registers, which somewhat conflicts with this. I've made the new optimisation less aggressive in cases where the existing one is likely to trigger, which gives better results than either of these optimisations by themselves. Differential revision: https://reviews.llvm.org/D69936
-
Guillaume Chatelet authored
Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: jholewinski, arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76348
-
Kang Zhang authored
-
Danila Malyutin authored
-
Michael Liao authored
Summary: - https://reviews.llvm.org/D68578 revises the `GlobalDecl` constructors to ensure all GPU kernels have `ReferenceKenelKind` initialized properly with an explicit constructor and static one. But there are lots of places that use the implicit constructor, triggering the assertion on non-GPU kernels. That was found when compiling many tests and workloads. - Fixing all of them may change more code and, more importantly, all of them assume the default kernel reference kind. This patch changes that constructor to tell `CUDAGlobalAttr` and construct `GlobalDecl` properly. Reviewers: yaxunl Subscribers: cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76344
-
Oliver Stannard authored
Rather than trying to work out which instructions are part of the epilogue by examining them, we can just mark them with the FrameDestroy flag, like we do in the AArch64 backend.
-
Kazuaki Ishizaki authored
Fix trivial typos Reviewers: mravishankar, antiagainst, ftynse Reviewed By: ftynse Subscribers: ftynse, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, Joonsoo, bader, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76347
-
Med Ismail Bennani authored
This patch changes the way the StackFrame Recognizers match a certain frame. Until now, recognizers could be registered with a function name but also an alternate symbol. This change is motivated by a test failure for the Assert frame recognizer on Linux. Depending on the version of the libc, the abort function (triggered by an assertion) could have more than two signatures (i.e. `raise`, `__GI_raise` and `gsignal`). Instead of only checking the default symbol name and the alternate one, lldb will iterate over a list of symbols to match against. rdar://60386577 Differential Revision: https://reviews.llvm.org/D76188 Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>
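The matching change described above amounts to replacing a two-name comparison with a search over a list. A minimal sketch, assuming a hypothetical `Recognizer` struct and `matches` function that are not lldb's actual types:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical recognizer entry: any number of symbol names may
// identify the frame, instead of one name plus one alternate.
struct Recognizer {
  std::vector<std::string> symbols;
};

// True if the frame's function name is any of the registered symbols.
bool matches(const Recognizer& recognizer, const std::string& function) {
  return std::find(recognizer.symbols.begin(), recognizer.symbols.end(),
                   function) != recognizer.symbols.end();
}
```

With the three names from the commit message registered (`raise`, `__GI_raise`, `gsignal`), a frame in any of them would be recognized, whichever one the installed libc happens to use.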
-
Alexey Bataev authored
Implemented codegen for detach clause in task directives.
-
Francesco Petrogalli authored
Summary: This patch adds addressing mode computation for the following SVE instructions: * ldff1{s}<T1> { <Zt>.<T2> }, <Pg>/Z, [<Xn|SP>{, <Xm>{, lsl #imm}}] * ldnf1{s}<T1> { <Zt>.<T2> }, <Pg>/Z, [<Xn|SP>{, #<imm>, mul vl}] Reviewers: andwar, sdesmalen, rengolin, efriedma Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76209
-
Sander de Smalen authored
Summary: This fixes a discrepancy between the non-temporal load/store intrinsics and other SVE load intrinsics (such as nf/ff), so that Clang can use the same code to generate these intrinsics. Reviewers: andwar, kmclaughlin, rengolin, efriedma Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76237
-
Danila Malyutin authored
Skip debug instructions before calling functions not expecting them. In particular, LIS.getInstructionIndex(*mi) would fail if mi was a debug instr. Differential Revision: https://reviews.llvm.org/D76129
-
David Stenberg authored
Summary: In D67768/D67492 I added support for entry values having blocks larger than one byte, but I now noticed that the DIE implementation I added there was broken. The takeNodes() function, that moves the entry value block from a temporary buffer to the output buffer, would destroy the input iterator when transferring the first node, meaning that only that node was moved. In practice, this meant that when emitting a call site value using a DW_OP_entry_value operation with a DWARF register number larger than 31, that multi-byte DW_OP_regx expression would be truncated. Reviewers: djtodoro, aprantl, vsk Reviewed By: djtodoro Subscribers: llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D76279
-
Florian Hahn authored
-
Simon Pilgrim authored
[InstCombine][X86] simplifyX86varShift - convert variable in-range per-element shift amounts to generic shifts (PR40391) AVX2/AVX512 per-element shifts can be replaced with generic shifts if the shift amounts are guaranteed to be in-range (upper bits are known zero).
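Why the in-range guarantee matters can be seen in a scalar model of one lane. The AVX2 variable logical shifts produce 0 when the count exceeds the element width, whereas a generic shift by an out-of-range amount has no defined result, so the two only agree when the count is known to be in range. The function name below is hypothetical and the sketch models a 32-bit logical left shift only.

```cpp
#include <cstdint>

// Scalar model of one 32-bit lane of an AVX2-style variable logical
// left shift: counts of 32 or more produce 0 instead of being undefined.
std::uint32_t per_element_shl(std::uint32_t value, std::uint32_t count) {
  return count < 32 ? value << count : 0;
}
```

When the upper 27 bits of every count are known to be zero, the `count < 32` test is always true and the operation reduces to a plain `value << count`, which is the replacement the commit performs.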
-
Sander de Smalen authored
Reworked the patch to avoid sharing a header (SVETypeFlags.h) between include/clang/Basic and utils/TableGen/SveEmitter.cpp. Now the patch generates the enum/flags which is included in TargetBuiltins.h. Also renamed one of the SveEmitter options to be in line with MVE. Summary: This is the first patch in a series for the SveEmitter to generate the arm_sve.h header file and builtins. I've tried to strip down this patch as best I could, but there are still a few changes that are not necessarily exercised by the load intrinsics in this patch, mostly around the SVEType class which has some common logic to represent types from a type and prototype string. I thought it didn't make much sense to remove that from this patch and split it up.
-
Simon Tatham authored
Summary: These are complicated integer multiply+add instructions with extra saturation, taking the high half of a double-width product, and optional rounding. There's no sensible way to represent that in standard IR, so I've converted the clang builtins directly to target-specific intrinsics. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: miyuki Subscribers: kristof.beyls, hiraditya, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76123
-
Simon Tatham authored
Summary: These instructions compute multiply+add in integers, with one of the operands being a splat of a scalar. (VMLA and VMLAS differ in whether the splat operand is a multiplier or the addend.) I've represented these in IR using existing standard IR operations for the unpredicated forms. The predicated forms are done with target-specific intrinsics, as usual. When operating on n-bit vector lanes, only the bottom n bits of the i32 scalar operand are used. So we have to tell that to isel lowering, to allow it to remove a pointless sign- or zero-extension instruction on that input register. That's done in `PerformIntrinsicCombine`, but first I had to enable `PerformIntrinsicCombine` for MVE targets (previously all the intrinsics it handled were for NEON), and make it a method of `ARMTargetLowering` so that it can get at `SimplifyDemandedBits`. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76122
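The two operand roles and the "only the bottom n bits" property can be illustrated per lane. This is a sketch for 8-bit lanes with hypothetical function names; which vector operand serves as the accumulator in each instruction is an assumption here, since the commit message only says that the scalar is either the multiplier or the addend.

```cpp
#include <cstdint>

// Scalar is the multiplier: acc + lane * scalar, modulo 2^8.
// Only the bottom 8 bits of the 32-bit scalar take part.
std::uint8_t mla_scalar_multiplier(std::uint8_t acc, std::uint8_t lane,
                                   std::uint32_t scalar) {
  return static_cast<std::uint8_t>(acc +
                                   lane * static_cast<std::uint8_t>(scalar));
}

// Scalar is the addend: acc * lane + scalar, modulo 2^8.
std::uint8_t mla_scalar_addend(std::uint8_t acc, std::uint8_t lane,
                               std::uint32_t scalar) {
  return static_cast<std::uint8_t>(acc * lane +
                                   static_cast<std::uint8_t>(scalar));
}
```

Because the upper bits of the scalar never affect the result, passing 0x12345605 or plain 5 gives the same lane value, which is why an extension instruction feeding that register is redundant.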
-
serge-sans-paille authored
Correctly set RelocationModel, thanks @modocache for spotting this. Related to differential revision: https://reviews.llvm.org/D75579
-
Simon Pilgrim authored
These shifts are masked to be in range, so we should be able to replace them with generic shifts.
-