- Feb 01, 2018
Yvan Roux authored
llvm-svn: 323947
-
- Jan 31, 2018
Evgeniy Stepanov authored
Miscompiles code. Testcase pending. This reverts commit r323869. llvm-svn: 323929
-
Diana Picus authored
llvm-svn: 323878
-
Diana Picus authored
Start using the new LegalizerInfo API introduced in r323681. Keep the old API for opcodes that need Lowering in some circumstances (G_FNEG and G_UREM/G_SREM). llvm-svn: 323876
-
Pablo Barrio authored
Summary: Expressions of the form x < 0 ? 0 : x; and x < -1 ? -1 : x can be lowered using bit operations instead of branching or conditional moves. In Thumb mode this results in a two-instruction sequence, a shift followed by a bic or orr, while in ARM/Thumb2 mode, which has a flexible second operand, the shift can be folded into a single bic/orr instruction. In most cases this results in smaller code and possibly fewer branches, and in no case larger code than before. Patch by Marten Svanfeldt. Reviewers: fhahn, pbarrio Reviewed By: pbarrio Subscribers: efriedma, rogfer01, aemerson, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D42574 llvm-svn: 323869
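A minimal C++ sketch of the patterns involved (not taken from the patch; function names are illustrative, and it assumes 32-bit int and arithmetic right shifts of negative values, as on AArch32 with clang/GCC):
    int clampToZero(int x)   { return x < 0  ? 0  : x; }    // candidate for asr + bic
    int clampToNegOne(int x) { return x < -1 ? -1 : x; }    // candidate for asr + orr
    // Equivalent branchless bit-operation forms:
    int clampToZeroBits(int x)   { return x & ~(x >> 31); } // max(x, 0)
    int clampToNegOneBits(int x) { return x | (x >> 31); }  // max(x, -1)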
-
Sjoerd Meijer authored
Half-precision arguments and return values are passed as if they were an int or float for ARM. This results in truncates and bitcasts to/from i16 and f16 values, which are legalized very early to stack stores/loads. When FullFP16 is enabled, we want to avoid codegen for these bitcasts, as it is unnecessary and inefficient. Differential Revision: https://reviews.llvm.org/D42580 llvm-svn: 323861
-
Roger Ferrer Ibanez authored
In Thumb1, with the new ADDCARRY / SUBCARRY the scheduler may need to do CPSR ↔ GPR copies, but not all Thumb1 targets implement them. The scheduler can attempt, before attempting a copy, to clone the instructions, but it does not currently do that for nodes with input glue. In this patch we introduce a target hook to let the target decide if a glued machine node is still eligible for copying. In this case these are ARM::tADCS and ARM::tSBCS. As a follow-up of this change we should actually implement the copies for the Thumb1 targets that do implement them and restrict the hook to the targets that can't really do such a copy, as these clones are not ideal. This change fixes PR35836. Differential Revision: https://reviews.llvm.org/D42051 llvm-svn: 323857
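A small C++ illustration (not from the patch) of where such glued carry nodes come from: a 64-bit add on a Thumb1-only target is lowered through ADDCARRY/SUBCARRY-style nodes whose carry travels through CPSR as glue rather than through a GPR:
    unsigned long long add64(unsigned long long a, unsigned long long b) {
      return a + b;  // lowers to an adds/adcs pair; the carry lives in CPSR
    }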
-
- Jan 30, 2018
Diana Picus authored
Straightforward mapping (integer operand to GPR, floating point operand to FPR). llvm-svn: 323731
-
Diana Picus authored
Legal if we have hardware support, libcall otherwise. Also add supporting code to the legalizer helper for libcalls. llvm-svn: 323730
-
Diana Picus authored
Straightforward mapping (integer operand goes to GPR, floating point operand goes to FPR). llvm-svn: 323727
-
Diana Picus authored
Legal if we have hardware support for floating point, libcalls otherwise. Also add the necessary support for libcalls in the legalizer helper. llvm-svn: 323726
-
- Jan 29, 2018
Daniel Sanders authored
Summary: Apparently, we missed constraining the register classes of the VReg operands of all the instructions built from a destination pattern except the root (top-level) one. The issue exposed itself while selecting G_FPTOSI for armv7: the corresponding pattern generates VTOSIZS wrapped into COPY_TO_REGCLASS, so the top-level COPY_TO_REGCLASS gets properly constrained, while the nested VTOSIZS (or rather its destination virtual register, to be exact) does not. Fix this by issuing GIR_ConstrainSelectedInstOperands for every nested GIR_BuildMI. https://bugs.llvm.org/show_bug.cgi?id=35965 rdar://problem/36886530 Patch by Roman Tereshin Reviewers: dsanders, qcolombet, rovka, bogner, aditya_nandakumar, volkan Reviewed By: dsanders, qcolombet, rovka Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D42565 llvm-svn: 323692
-
Daniel Sanders authored
Summary: The improvements to the LegalizerInfo discussed in D42244 require that LegalizerInfo::LegalizeAction be available for use in other classes. As such, it needs to be moved out of LegalizerInfo. This has been done separately from the next patch to minimize the noise in that patch. llvm-svn: 323669
-
Sjoerd Meijer authored
Create and use FP16Pat and FullFP16Pat helper patterns to make the difference explicit. Differential Revision: https://reviews.llvm.org/D42634 llvm-svn: 323640
-
- Jan 26, 2018
Momchil Velikov authored
The function `Thumb1InstrInfo::loadRegFromStackSlot` accepts only the `tGPR` register class. The function serves to emit a `tLDRspi` instruction, and certainly any subset of the `tGPR` register class is a valid destination of the load. Differential revision: https://reviews.llvm.org/D42535 llvm-svn: 323514
-
Sjoerd Meijer authored
This is the groundwork for Armv8.2-A FP16 code generation. Clang passes and returns _Float16 values as floats, together with the required bitconverts and truncs etc. to implement correct AAPCS behaviour, see D42318. We will implement half-precision argument passing/returning lowering in the ARM backend soon, but for now this means that this: _Float16 sub(_Float16 a, _Float16 b) { return a + b; } gets lowered to this: define float @sub(float %a.coerce, float %b.coerce) { entry: %0 = bitcast float %a.coerce to i32 %tmp.0.extract.trunc = trunc i32 %0 to i16 %1 = bitcast i16 %tmp.0.extract.trunc to half <SNIP> %add = fadd half %1, %3 <SNIP> } When FullFP16 is *not* supported, we don't make f16 a legal type, and we get legalization for "free", i.e. nothing changes and everything works as before; f16 argument passing/returning is also handled. When FullFP16 is supported, we do make f16 a legal type, and have 2 places that we need to patch up: f16 argument passing and returning, which involves minor tweaks to avoid unnecessary code generation for some bitcasts. As a "demonstrator" that this works for the different FP16, FullFP16, softfp modes, etc., I've added match rules to the VSUB instruction description showing that we can codegen this instruction from IR, but more importantly, also to some conversion instructions. These conversions were causing issues before in the FP16 and FullFP16 cases. I've also added match rules to the VLDRH and VSTRH descriptions, so that we can actually compile the entire half-precision sub code example above. This showed that these loads and stores had the wrong addressing mode specified: AddrMode5 instead of AddrMode5FP16, which turned out not to be implemented at all, so that has also been added. This is the minimal patch that shows all the different moving parts. In patch 2/3 I will add some efficient lowering of bitcasts, and in 3/3 I will add the remaining Armv8.2-A FP16 instruction descriptions. Thanks to Sam Parker and Oliver Stannard for their help and reviews! Differential Revision: https://reviews.llvm.org/D38315 llvm-svn: 323512
-
- Jan 24, 2018
Weiming Zhao authored
Summary: For long shifts, the inlined version takes about 20 instructions on Thumb1. To avoid the code bloat, expand to __aeabi_ calls if the target is Thumb1. Reviewers: samparker Reviewed By: samparker Subscribers: samparker, aemerson, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D42401 llvm-svn: 323354
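A C++ sketch of the kind of code affected (illustrative only): a 64-bit variable shift, which with this change is emitted on Thumb1 as a call to an AEABI runtime helper (e.g. __aeabi_llsl for a left shift) instead of a long inline sequence:
    unsigned long long shiftLeft64(unsigned long long x, int amount) {
      return x << amount;  // Thumb1: call __aeabi_llsl rather than ~20 inlined instructions
    }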
-
Martin Storsjö authored
This matches what MSVC does for alloca() function calls on ARM. Even though MSVC doesn't support VLAs at the language level, it does support the alloca function. At the clang level, both _alloca() (when emulating MSVC, which is what the alloca() function expands to) and the __builtin_alloca() builtin function, and VLAs, map to the same LLVM IR "alloca" instruction - so within LLVM they're not distinguishable from each other. Differential Revision: https://reviews.llvm.org/D42292 llvm-svn: 323308
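An illustrative C++ example (not from the patch) of the spellings that all reach LLVM as the same alloca:
    #include <cstring>
    void useScratch(unsigned n) {
      void *buf = __builtin_alloca(n);  // _alloca()/alloca() and VLAs map to the same IR alloca
      std::memset(buf, 0, n);
    }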
-
- Jan 22, 2018
Joel Galenson authored
As noted in another review, this loop is confusing. This commit cleans it up somewhat. Differential Revision: https://reviews.llvm.org/D42312 llvm-svn: 323136
-
Marina Yatsina authored
This is one of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869 Most of the patches are aimed at refactoring the existing code. Additional relevant reviews: https://reviews.llvm.org/D40330 https://reviews.llvm.org/D40331 https://reviews.llvm.org/D40332 https://reviews.llvm.org/D40334 Differential Revision: https://reviews.llvm.org/D40333 Change-Id: Ie5f8eb34d98cfdfae23a3072eb69b5794f0e2d56 llvm-svn: 323095
-
Marina Yatsina authored
This is one of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869 Most of the patches are aimed at refactoring the existing code. Additional relevant reviews: https://reviews.llvm.org/D40330 https://reviews.llvm.org/D40331 https://reviews.llvm.org/D40333 https://reviews.llvm.org/D40334 Differential Revision: https://reviews.llvm.org/D40332 Change-Id: I6a048cca7fdafbfc42fb1bac94343e483befded8 llvm-svn: 323094
-
Marina Yatsina authored
1. ReachingDefsAnalysis - Allows identifying, for each instruction, the “closest” reaching def of a certain register. Used by BreakFalseDeps (for clearance calculation) and ExecutionDomainFix (for arbitrating conflicting domains). 2. ExecutionDomainFix - Changes the variant of the instructions in order to minimize domain crossings. 3. BreakFalseDeps - Breaks false dependencies. 4. LoopTraversal - Creates a traversal order of the basic blocks that is optimal for loops (introduced in revision rL293571). Both ExecutionDomainFix and ReachingDefsAnalysis use this to determine the order in which they will traverse the basic blocks. This also included the following changes to the original ExecutionDepsFix logic: 1. The BreakFalseDeps and ReachingDefsAnalysis logic is no longer restricted by a register class. 2. ReachingDefsAnalysis tracks liveness of register units instead of register indices into a given register class. Additional changes in affected files: 1. The X86 and ARM targets now inherit from ExecutionDomainFix instead of ExecutionDepsFix. BreakFalseDeps was also added to the passes they activate. 2. Comments and references to ExecutionDepsFix were replaced with ExecutionDomainFix and BreakFalseDeps, as appropriate. Additional refactoring changes will follow. This commit is (almost) NFC. The only functional change is that now BreakFalseDeps will break dependencies for all register classes. Since no additional instructions were added to the list of instructions that have false dependencies, there is no actual change yet. In a future commit several instructions (and tests) will be added. This is the first of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869 Most of the patches are aimed at refactoring the existing code. Additional relevant reviews: https://reviews.llvm.org/D40331 https://reviews.llvm.org/D40332 https://reviews.llvm.org/D40333 https://reviews.llvm.org/D40334 Differential Revision: https://reviews.llvm.org/D40330 Change-Id: Icaeb75e014eff96a8f721377783f9a3e6c679275 llvm-svn: 323087
-
- Jan 19, 2018
Joel Galenson authored
Fix a performance regression caused by r322737. While trying to make it easier to replace compares with existing adds and subtracts, I accidentally stopped it from doing so in some cases. This should fix that. I'm also fixing another potential bug in that commit. Differential Revision: https://reviews.llvm.org/D42263 llvm-svn: 322972
-
Daniel Neilson authored
Summary: This is a resurrection of work first proposed and discussed in Aug 2015: http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html and initially landed (but then backed out) in Nov 2015: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument which is required to be a constant integer. It represents the alignment of the dest (and source), and so must be the minimum of the actual alignment of the two. This change is the first in a series that allows source and dest to each have their own alignments by using the alignment attribute on their arguments. In this change we: 1) Remove the alignment argument. 2) Add alignment attributes to the source & dest arguments. We, temporarily, require that the alignments for source & dest be equal. For example, code which used to read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false) will now read call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false) Downstream users may have to update their lit tests that check for @llvm.memcpy/memmove/memset call/declaration patterns. The following extended sed script may help with updating the majority of your tests, but it does not catch all possible patterns so some manual checking and updating will be required. s~declare void @llvm\.mem(set|cpy|move)\.p([^(]*)\((.*), i32, i1\)~declare void @llvm.mem\1.p\2(\3, i1)~g s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* \3, i8 \4, i8 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* \3, i8 \4, i16 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* \3, i8 \4, i32 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* \3, i8 \4, i64 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* \3, i8 \4, i128 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* align \6 \3, i8 \4, i8 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* align \6 \3, i8 \4, i16 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* align \6 \3, i8 \4, i32 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* align \6 \3, i8 \4, i64 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* align \6 \3, i8 \4, i128 \5, i1 \7)~g s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* \4, i8\5* \6, i8 \7, i1 \8)~g s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* \4, i8\5* \6, i16 
\7, i1 \8)~g s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* \4, i8\5* \6, i32 \7, i1 \8)~g s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* \4, i8\5* \6, i64 \7, i1 \8)~g s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* \4, i8\5* \6, i128 \7, i1 \8)~g s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* align \8 \4, i8\5* align \8 \6, i8 \7, i1 \9)~g s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* align \8 \4, i8\5* align \8 \6, i16 \7, i1 \9)~g s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* align \8 \4, i8\5* align \8 \6, i32 \7, i1 \9)~g s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* align \8 \4, i8\5* align \8 \6, i64 \7, i1 \9)~g s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* align \8 \4, i8\5* align \8 \6, i128 \7, i1 \9)~g The remaining changes in the series will: Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. Step 3) Update Clang to use the new IRBuilder API. Step 4) Update Polly to use the new IRBuilder API. Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use use MemIntrinsicInst::[get|set]Alignment() to use getDestAlignment() and getSourceAlignment() instead. Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get|set]Alignment() methods. Reviewers: pete, hfinkel, lhames, reames, bollu Reviewed By: reames Subscribers: niosHD, reames, jholewinski, qcolombet, jfb, sanjoy, arsenm, dschuff, dylanmckay, mehdi_amini, sdardis, nemanjai, david2050, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, llvm-commits Differential Revision: https://reviews.llvm.org/D41675 llvm-svn: 322965
-
- Jan 18, 2018
Reid Kleckner authored
Every known PE COFF target emits /EXPORT: linker flags into a .drectve section. The AsmPrinter should handle this. While we're at it, use global_values() and emit each export flag with its own .ascii directive. This should make the .s file output more readable. llvm-svn: 322788
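A hedged C++ example of where such directives typically come from (function name made up for illustration, assuming a PE/COFF target): a dllexport'ed symbol results in a /EXPORT: linker directive being recorded in the object's .drectve section, which the AsmPrinter now emits for every PE/COFF target:
    extern "C" __declspec(dllexport) int answer() { return 42; }
    // roughly corresponds to the linker directive:  /EXPORT:answer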
-
- Jan 17, 2018
Joel Galenson authored
This extends my previous patches to also optimize overflow-checked multiplies during SelectionDAG. Differential revision: https://reviews.llvm.org/D40922 llvm-svn: 322738
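A short C++ sketch (illustrative, not from the patch) of an overflow-checked multiply of the kind this optimizes, using the clang/GCC checked-arithmetic builtin:
    bool mulChecked(int a, int b, int *out) {
      return __builtin_mul_overflow(a, b, out);  // true if the multiply overflowed
    }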
-
Joel Galenson authored
The ARM backend contains code that tries to optimize compares by replacing them with an existing instruction that sets the flags the same way. This allows it to replace a "cmp" with an "adds", generalizing the code that replaces "cmp" with "sub". It also heuristically disables sinking of instructions that could potentially be used to replace compares (currently only if they're next to each other). Differential revision: https://reviews.llvm.org/D38378 llvm-svn: 322737
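A minimal C++ sketch (names are illustrative) of why a "cmp" can be dropped when an "adds" already sets the flags:
    int sumOrZero(int a, int b) {
      int s = a + b;         // a flag-setting adds can produce both the sum and the flags
      return s < 0 ? 0 : s;  // the compare against zero can reuse those flags
    }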
-
Diana Picus authored
llvm-svn: 322667
-
Diana Picus authored
llvm-svn: 322657
-
Diana Picus authored
Mark G_FPEXT and G_FPTRUNC as legal or libcall, depending on hardware support, but only for conversions between float and double. Also add the necessary boilerplate so that the LegalizerHelper can introduce the required libcalls. This also works only for float and double, but isn't too difficult to extend when the need arises. llvm-svn: 322651
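An illustrative C++ example of the conversions G_FPEXT/G_FPTRUNC cover (legal vcvt with hardware FP support, otherwise AEABI libcalls such as __aeabi_f2d / __aeabi_d2f):
    double widen(float x)  { return x; }  // fpext   -> G_FPEXT
    float narrow(double x) { return x; }  // fptrunc -> G_FPTRUNC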
-
- Jan 12, 2018
Diana Picus authored
llvm-svn: 322367
-
Diana Picus authored
For hard float with VFP4, it is legal. Otherwise, we use libcalls. This needs a bit of support in the LegalizerHelper for soft float because we didn't handle G_FMA libcalls yet. The support is trivial, as the only difference between G_FMA and other libcalls that we already handle is that it has 3 input operands rather than just 2. llvm-svn: 322366
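A C++ sketch (illustrative only) of a fused multiply-add that ends up as G_FMA: legal (VFMA) with hard float and VFP4, otherwise legalized to a three-operand libcall such as fmaf:
    #include <cmath>
    float fusedMulAdd(float a, float b, float c) {
      return std::fma(a, b, c);  // float overload; may become a VFMA or a call to fmaf
    }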
-
Andre Vieira authored
This patch teaches the Arm back-end to generate the SMMULR, SMMLAR and SMMLSR instructions from equivalent IR patterns. Differential Revision: https://reviews.llvm.org/D41775 llvm-svn: 322361
-
Andre Vieira authored
Differential Revision: https://reviews.llvm.org/D41855 llvm-svn: 322360
-
- Jan 11, 2018
Matthias Braun authored
The PeepholeOptimizer would fail for vregs without a definition. If this was caused by an undef operand, abort to keep the code simple (so we don't need to add logic everywhere to replicate the undef flag). Differential Revision: https://reviews.llvm.org/D40763 llvm-svn: 322319
-
Evgeniy Stepanov authored
Reviewers: efriedma, pcc Subscribers: aemerson, javed.absar, kristof.beyls, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D39975 llvm-svn: 322312
-
- Jan 10, 2018
Diana Picus authored
llvm-svn: 322169
-
Diana Picus authored
For hard float, it is legal. For soft float, we need to lower to 0 - x first, and then we can use the libcall for G_FSUB. This is undoing some of the canonicalization performed by the IRTranslator (which introduces G_FNEG when it sees a 0 - x). Ideally, that canonicalization would be performed by a pre-legalizer pass that would allow targets to opt out of this behaviour rather than dance around it in the legalizer. llvm-svn: 322168
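A tiny C++ example for context (illustrative only): a plain fneg, which stays legal with hard float and, for soft float, is rewritten as 0.0 - x and handled by the existing G_FSUB libcall path (__aeabi_fsub on AEABI targets):
    float negate(float x) {
      return -x;
    }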
-
Diana Picus authored
Legal for hard float. Change to G_CONSTANT for soft float (but preserve the binary representation). llvm-svn: 322164
-
Diana Picus authored
Make G_CONSTANT narrow for any scalars larger than 32 bits. llvm-svn: 322162
-