- Jun 19, 2020
-
-
Kristof Beyls authored
A "BTI c" instruction only allows jumping/calling to using a BLR* instruction. However, the SLSBLR mitigation changes a BLR to a BR to implement the function call. Therefore, a "BTI c" check that passed before could trigger after the BLR->BL change done by the SLSBLR mitigation. However, if the register used in BR is X16 or X17, this trigger will not fire (see ArmARM for further details). Therefore, this patch simply changes the function stubs for the SLSBLR mitigation from __llvm_slsblr_thunk_x<N>: br x<N> SpeculationBarrier to __llvm_slsblr_thunk_x<N>: mov x16, x<N> br x16 SpeculationBarrier Differential Revision: https://reviews.llvm.org/D81405
-
Francesco Petrogalli authored
Reviewers: efriedma, sdesmalen Reviewed By: efriedma Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80741
-
- Jun 18, 2020
-
-
David Sherwood authored
There are now quite a few SVE tests in LLVM and Clang that do not emit warnings related to invalid use of EVT::getVectorNumElements() and VectorType::getNumElements(). For these tests I have added additional checks that there are no warnings in order to prevent any future regressions. Differential Revision: https://reviews.llvm.org/D80712
-
David Sherwood authored
This reverts commit fb495c31, which was causing test failures and broke a buildbot.
-
David Sherwood authored
There are now quite a few SVE tests in LLVM and Clang that do not emit warnings related to invalid use of EVT::getVectorNumElements() and VectorType::getNumElements(). For these tests I have added additional checks that there are no warnings in order to prevent any future regressions. Differential Revision: https://reviews.llvm.org/D80712
-
Kristof Beyls authored
This also enables running the AArch64 SLSHardening pass with GlobalISel, so add a test for that. Differential Revision: https://reviews.llvm.org/D81403
-
Kristof Beyls authored
The enum values for AArch64 registers are not all consecutive. Therefore, the computation "__llvm_slsblr_thunk_x" + utostr(Reg - AArch64::X0) is not always correct: utostr(Reg - AArch64::X0) will not generate the expected string for registers that do not have consecutive values in the enum. This happened to work for most registers, but does not for AArch64::FP (i.e. register X29). The bug can get triggered when X29 is not used as a frame pointer. Differential Revision: https://reviews.llvm.org/D81997
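The bug can be modeled in a few lines of Python (the enum values below are invented purely for illustration; the real AArch64 enum values differ): subtracting X0's enum value only yields the register number when the enum happens to be consecutive, so the fix is an explicit lookup.

```python
# Hypothetical sketch of the bug: register enum values need not be
# consecutive, so arithmetic on them produces wrong thunk names.
AARCH64_REGS = {
    # invented enum values: X0..X28 happen to be consecutive...
    **{f"X{i}": 100 + i for i in range(29)},
    # ...but FP (X29) sits elsewhere in the enum
    "FP": 50,
}

def thunk_name_buggy(reg: str) -> str:
    # mirrors "__llvm_slsblr_thunk_x" + utostr(Reg - AArch64::X0)
    return f"__llvm_slsblr_thunk_x{AARCH64_REGS[reg] - AARCH64_REGS['X0']}"

def thunk_name_fixed(reg: str) -> str:
    # look the register number up explicitly instead of assuming
    # consecutive enum values
    number = {**{f"X{i}": i for i in range(29)}, "FP": 29}[reg]
    return f"__llvm_slsblr_thunk_x{number}"
```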
-
- Jun 17, 2020
-
-
Ian Levesque authored
Summary: Add a flag to omit the xray_fn_idx to cut size overhead and relocations roughly in half, at the cost of reduced performance for single-function patching. Minor additions to compiler-rt to support per-function patching without the index.
Reviewers: dberris, MaskRay, johnislarry
Subscribers: hiraditya, arphaman, cfe-commits, #sanitizers, llvm-commits
Tags: #clang, #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D81995
-
Daniel Sanders authored
Summary: Adds two features to the generated rule disable option:
- '*' - Disable all rules
- '!<foo>' - Re-enable rule(s):
  - '!foo' - Enable the rule named 'foo'
  - '!5' - Enable rule five
  - '!4-9' - Enable rules four to nine
  - '!foo-bar' - Enable rules from 'foo' to (and including) 'bar'
(The '!' is available to the generated disable option but is not part of the underlying API; it determines whether to call setRuleDisabled() or setRuleEnabled().)
This is intended to support unit testing of combine rules so that you can do:
  GeneratedCfg.setRuleDisabled("*")
  GeneratedCfg.setRuleEnabled("foo")
to ensure only a specific rule is in effect. The rule is still required to be included in a combiner though.
Also added --...-only-enable-rule=X,Y, which is effectively an alias for --...-disable-rule=*,!X,!Y and as such interacts properly with disable-rule.
Reviewers: aditya_nandakumar, bogner, volkan, aemerson, paquette, arsenm
Subscribers: wdng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81889
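A minimal sketch of how the specifiers above could be interpreted (this is hypothetical illustration code, not the generated combiner code): specs apply left to right, `*` disables everything, a leading `!` re-enables, and names, 1-based indices, and ranges of either are all resolved against the rule list.

```python
# Sketch of the disable-rule specifier semantics described in the commit
# (hypothetical parser; the real code is generated by TableGen).
def apply_rule_specs(rules, specs):
    """rules: ordered list of rule names; specs: e.g. ["*", "!foo", "!4-9"].
    Returns the set of enabled rule names after applying specs in order."""
    enabled = set(rules)

    def resolve(tok):
        # a token is a rule name, a 1-based index, or a range of either
        if "-" in tok and tok not in rules:
            lo, hi = tok.split("-", 1)
            i = int(lo) - 1 if lo.isdigit() else rules.index(lo)
            j = int(hi) - 1 if hi.isdigit() else rules.index(hi)
            return rules[i : j + 1]
        if tok.isdigit():
            return [rules[int(tok) - 1]]
        return [tok]

    for spec in specs:
        if spec == "*":
            enabled.clear()                        # disable all rules
        elif spec.startswith("!"):
            enabled.update(resolve(spec[1:]))      # re-enable
        else:
            enabled.difference_update(resolve(spec))
    return enabled
```

Under this model, `--...-only-enable-rule=X,Y` behaving like `--...-disable-rule=*,!X,!Y` falls out naturally: clear everything, then re-enable X and Y.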
-
- Jun 16, 2020
-
-
Jessica Paquette authored
When selecting 32-bit -> 64-bit G_ZEXTs, we don't always have to emit the extend. If the instruction feeding into the G_ZEXT implicitly zero-extends into the high half of the register, we can just emit a SUBREG_TO_REG instead. Differential Revision: https://reviews.llvm.org/D81897
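The selection decision can be sketched as follows (the opcode names and the set membership below are illustrative assumptions, not an authoritative list from the backend):

```python
# Rough model of the G_ZEXT selection choice described above.
# Hypothetical set of 32-bit defs known to leave bits 63:32 zeroed.
IMPLICITLY_ZEROS_HIGH_HALF = {"LDRWui", "ADDWrr", "SUBWrr", "ANDWrr"}

def lower_zext_32_to_64(def_opcode: str) -> str:
    """Pick how to select a 32-bit -> 64-bit G_ZEXT given the opcode that
    defines its source: if the 32-bit def already zeroes the high half of
    the 64-bit register, a cheap SUBREG_TO_REG suffices; otherwise emit an
    explicit zero-extend."""
    if def_opcode in IMPLICITLY_ZEROS_HIGH_HALF:
        return "SUBREG_TO_REG"
    return "UBFMXri"  # explicit zero-extend fallback
```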
-
Luke Geeson authored
This patch upstreams support for BFloat Matrix Multiplication Intrinsics and Code Generation from __bf16 to AArch64. This includes IR intrinsics. Unittests are provided as needed. AArch32 Intrinsics + CodeGen will come after this patch.
This patch is part of a series implementing the Bfloat16 extension of the Armv8.6-a architecture, as detailed here:
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
The bfloat type, and its properties are specified in the Arm Architecture Reference Manual:
https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
The following people contributed to this patch:
- Luke Geeson
- Momchil Velikov
- Mikhail Maltsev
- Luke Cheeseman
Reviewers: SjoerdMeijer, t.p.northover, sdesmalen, labrinea, miyuki, stuij
Reviewed By: miyuki, stuij
Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits, llvm-commits, miyuki, chill, pbarrio, stuij
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D80752
Change-Id: I174f0fd0f600d04e3799b06a7da88973c6c0703f
-
Luke Geeson authored
This patch upstreams support for ld / st variants of BFloat intrinsics from __bf16 to AArch64. This includes IR intrinsics. Unittests are provided as needed.
This patch is part of a series implementing the Bfloat16 extension of the Armv8.6-a architecture, as detailed here:
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
The bfloat type, and its properties are specified in the Arm Architecture Reference Manual:
https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
The following people contributed to this patch:
- Luke Geeson
- Momchil Velikov
- Luke Cheeseman
Reviewers: fpetrogalli, SjoerdMeijer, sdesmalen, t.p.northover, stuij
Reviewed By: stuij
Subscribers: arsenm, pratlucas, simon_tatham, labrinea, kristof.beyls, hiraditya, danielkiss, cfe-commits, llvm-commits, pbarrio, stuij
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D80716
Change-Id: I22e1dca2a8a9ec25d1e4f4b200cb50ea493d2575
-
Fangrui Song authored
Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D81814
-
Amara Emerson authored
Note: don't do this for integer 64 bit materialization to match SDAG. Differential Revision: https://reviews.llvm.org/D81893
-
Jessica Paquette authored
It's possible to end up with a zext or something in the way of a G_CONSTANT, even pre-legalization. This can happen with memsets. e.g. https://godbolt.org/z/Bjc8cw To make sure we can catch these cases, use `getConstantVRegValWithLookThrough` instead of `mi_match`. Differential Revision: https://reviews.llvm.org/D81875
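The "look through" idea can be illustrated in miniature (a simplified model, not the real `getConstantVRegValWithLookThrough` implementation): instead of requiring the operand to be defined directly by a G_CONSTANT, walk through value-wrapping instructions first.

```python
# Sketch of constant look-through over a toy instruction representation:
# an instruction is ("G_CONSTANT", value) or (opcode, inner_instruction).
def get_constant_with_lookthrough(inst):
    """Return the constant reached by looking through extends/copies,
    or None if the chain doesn't end in a G_CONSTANT."""
    LOOK_THROUGH = {"G_ZEXT", "G_SEXT", "G_TRUNC", "COPY"}
    while inst[0] in LOOK_THROUGH:
        inst = inst[1]
    return inst[1] if inst[0] == "G_CONSTANT" else None
```

A direct `mi_match`-style check would miss the zext-wrapped case that, per the commit, memsets can produce.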
-
- Jun 15, 2020
-
-
Amara Emerson authored
-
Jessica Paquette authored
Apparently an x86 bot doesn't like the disabled rule in this test. http://lab.llvm.org:8011/builders/fuchsia-x86_64-linux/builds/6569 Remove the disabled rule and update the test to try to pacify the bot.
-
Jessica Paquette authored
Add selection support for ext via a new opcode, G_EXT and a post-legalizer combine which matches it. Add an `applyEXT` function, because the AArch64ext patterns require a register for the immediate. So, we have to create a G_CONSTANT to get these without writing new patterns or modifying the existing ones. Tests are the same as arm64-ext.ll. Also prevent ext from firing on the zip test. It has higher priority, so we don't want it potentially getting in the way of mask tests. Also fix up the shuffle-splat test, because ext is now selected there. The test was incorrectly regbank selected before, which could cause a verifier failure when you emit copies. Differential Revision: https://reviews.llvm.org/D81436
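The mask-matching step can be sketched as follows (a simplified model of the idea; the real `isEXTMask`-style check in AArch64ISelLowering also handles undef lanes and operand swapping): an EXT selects a contiguous, possibly wrapping window of elements out of the two concatenated inputs, and the window's start is the immediate.

```python
# Sketch of EXT-style shuffle-mask matching (simplified).
def match_ext_mask(mask, num_elts):
    """Return the EXT immediate if `mask` selects num_elts consecutive
    elements (with wraparound) from the 2*num_elts-element concatenation
    of the two shuffle inputs; otherwise return None."""
    imm = mask[0]
    for i, m in enumerate(mask):
        if m != (imm + i) % (2 * num_elts):
            return None
    return imm
```

This also shows why the combine needs a G_CONSTANT: the matched start offset becomes the instruction's immediate operand.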
-
Jessica Paquette authored
This implements the following combines: ((0-A) + B) -> B-A (A + (0-B)) -> A-B Porting over the basic algebraic combines from the DAGCombiner. There are several combines which fold adds away into subtracts. This is just the simplest one. I noticed that add combines are some of the most commonly hit across CTMark (via print statements when they fire), so I'm porting over some of the obvious ones. This gives some minor code size improvements on CTMark at -O3 on AArch64. Differential Revision: https://reviews.llvm.org/D77453
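The two folds can be sketched over a toy expression representation, purely to illustrate the algebra (this is not the combiner's actual data structures):

```python
# The algebraic folds from the commit, over ("sub", x, y) tuples / ints:
#   ((0 - A) + B) -> B - A
#   (A + (0 - B)) -> A - B
def combine_add(lhs, rhs):
    """Return the folded sub node if one of the patterns applies to
    lhs + rhs, else None."""
    if isinstance(lhs, tuple) and lhs[0] == "sub" and lhs[1] == 0:
        return ("sub", rhs, lhs[2])   # ((0 - A) + B) -> B - A
    if isinstance(rhs, tuple) and rhs[0] == "sub" and rhs[1] == 0:
        return ("sub", lhs, rhs[2])   # (A + (0 - B)) -> A - B
    return None
```

Each fold replaces a negate-then-add pair with a single subtract, which is where the code-size win comes from.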
-
Francesco Petrogalli authored
Summary: Adding intrinsics and codegen patterns for:
* trn1 <Zd>.q, <Zm>.q, <Zn>.q
* trn2 <Zd>.q, <Zm>.q, <Zn>.q
* zip1 <Zd>.q, <Zm>.q, <Zn>.q
* zip2 <Zd>.q, <Zm>.q, <Zn>.q
* uzp1 <Zd>.q, <Zm>.q, <Zn>.q
* uzp2 <Zd>.q, <Zm>.q, <Zn>.q
These instructions are defined in Armv8.6-A.
Reviewers: sdesmalen, efriedma, kmclaughlin
Reviewed By: sdesmalen
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80850
-
Daniel Kiss authored
Summary: SCTLR_EL1.BT[01] controls the PACI[AB]SP compatibility with PBYTE 11 (see [1]). This bit will be set to zero so PACI[AB]SP are equal to the BTI C instruction only.
[1] https://developer.arm.com/docs/ddi0595/b/aarch64-system-registers/sctlr_el1
Reviewers: chill, tamas.petz, pbarrio, ostannard
Reviewed By: tamas.petz, ostannard
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81746
-
Dominik Montada authored
[MachineVerifier][GlobalISel] Check that branches have a MBB operand or are declared indirect. Add missing properties to G_BRJT, G_BRINDIRECT.
Summary: Teach MachineVerifier to check branches for MBB operands if they are not declared indirect. Add `isBarrier`, `isIndirectBranch` to `G_BRINDIRECT` and `G_BRJT`. Without these, `MachineInstr.isConditionalBranch()` was giving a false positive for those instructions.
Reviewers: aemerson, qcolombet, dsanders, arsenm
Reviewed By: dsanders
Subscribers: hiraditya, wdng, simoncook, s.egerton, arsenm, rovka, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81587
-
- Jun 13, 2020
-
-
Amanieu d'Antras authored
Summary: Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=46060
I've also added the Extra_IsConvergent flag, which was missing from FastISel.
Reviewers: echristo
Reviewed By: echristo
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80759
-
- Jun 12, 2020
-
-
Amara Emerson authored
Differential Revision: https://reviews.llvm.org/D81419
-
Jessica Paquette authored
We select all of these via patterns now, so there's no reason to disallow this. Update select-dup.mir to show that we correctly select the smaller types. Differential Revision: https://reviews.llvm.org/D81322
-
Jessica Paquette authored
This was making it so that the instructions weren't eliminated in select-rev.mir and select-trn.mir despite not being used. Update the tests accordingly. Differential Revision: https://reviews.llvm.org/D81492
-
Kristof Beyls authored
To make sure that no barrier gets placed on the architectural execution path, each BLR x<N> instruction gets transformed to a BL __llvm_slsblr_thunk_x<N> instruction, with __llvm_slsblr_thunk_x<N> a thunk that contains:

  __llvm_slsblr_thunk_x<N>:
      BR x<N>
      <speculation barrier>

Therefore, the BLR instruction gets split into two: one BL and one BR. This transformation results in not inserting a speculation barrier on the architectural execution path. The mitigation is off by default and can be enabled by the harden-sls-blr subtarget feature. As a linker is allowed to clobber X16 and X17 on function calls, the above code transformation would not be correct in case a linker does so when N=16 or N=17. Therefore, when the mitigation is enabled, generation of BLR x16 or BLR x17 is avoided. As BLRA* indirect calls are not currently produced by LLVM, this patch does not aim to implement support for those. Differential Revision: https://reviews.llvm.org/D81402
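The rewrite can be sketched in Python (a toy model over instruction strings, not the pass's real MachineInstr manipulation): each `blr xN` becomes a `bl` to a per-register thunk whose body holds the original `br` followed by the barrier.

```python
# Sketch of the BLR -> BL __llvm_slsblr_thunk_x<N> rewrite described above.
def harden_blr(instr: str):
    """Rewrite 'blr xN' into a call to the per-register thunk plus the
    thunk body. X16/X17 may be clobbered by linker-inserted code on calls,
    so the mitigation never emits BLR x16/x17 in the first place."""
    op, reg = instr.split()
    assert op == "blr" and reg not in ("x16", "x17")
    thunk = f"__llvm_slsblr_thunk_{reg}"
    call = f"bl {thunk}"
    body = [f"{thunk}:", f"  br {reg}", "  <speculation barrier>"]
    return call, body
```

The barrier lives only inside the thunk, after the `br`, so it is never on the architectural path of the call.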
-
- Jun 11, 2020
-
-
Fangrui Song authored
-
Eli Friedman authored
-
Dominik Montada authored
[GlobalISel] Fix crash in IRTranslator, MachineIRBuilder when translating @llvm.dbg.value intrinsic and using -debug.
Summary: Fix a crash when using -debug, caused by the GlobalISel observer trying to print an incomplete DBG_VALUE instruction. This was caused by the MachineIRBuilder using buildInstr, which immediately inserts the instruction (triggering the print), instead of using BuildMI to first build up the instruction and using insertInstr when finished.
Add a RUN line to the existing debug-insts.ll test with the -debug flag set to make sure no crash happens. Also fixed a missing %s in the 2nd RUN line of the same test.
Reviewers: t.p.northover, aditya_nandakumar, aemerson, dsanders, arsenm
Reviewed By: arsenm
Subscribers: wdng, arsenm, rovka, hiraditya, volkan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76934
-
David Sherwood authored
Until we have a real need for computing known bits for scalable vectors I have simply changed the code to bail out for now and pretend we know nothing. I've also fixed up some simple callers of computeKnownBits too. Differential Revision: https://reviews.llvm.org/D80437
-
Kristof Beyls authored
Some processors may speculatively execute the instructions immediately following RET (returns) and BR (indirect jumps), even though control flow should change unconditionally at these instructions. To avoid a potentially mis-speculatively executed gadget after these instructions leaking secrets through side channels, this pass places a speculation barrier immediately after every RET and BR instruction. Since these barriers are never on the correct, architectural execution path, the performance overhead of this is expected to be low. On targets that implement the Armv8.0-SB Speculation Barrier extension, a single SB instruction is emitted that acts as a speculation barrier. On other targets, a DSB SYS followed by an ISB is emitted to act as a speculation barrier. These speculation barriers are implemented as pseudo instructions to prevent later passes from analyzing them and potentially removing them. Even though LLVM does not currently produce BRAA/BRAB/BRAAZ/BRABZ instructions, these are also mitigated by the pass and tested through a MIR test. The mitigation is off by default and can be enabled by the harden-sls-retbr subtarget feature. Differential Revision: https://reviews.llvm.org/D81400
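The barrier-selection choice described above can be summarized in a tiny sketch (a hypothetical helper, not the pass's actual code):

```python
# Which instruction sequence acts as the speculation barrier inserted
# after every RET / BR, per the commit message.
def speculation_barrier_seq(has_sb_extension: bool):
    """With the Armv8.0-SB extension a single SB instruction suffices;
    otherwise fall back to DSB SYS followed by ISB."""
    return ["sb"] if has_sb_extension else ["dsb sy", "isb"]
```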
-
- Jun 10, 2020
-
-
Sander de Smalen authored
Instead of loading from e.g. `<vscale x 16 x i8>*`, load from element pointer `i8*`. This is more in line with the other load/store intrinsics for SVE. Reviewers: fpetrogalli, c-rhodes, rengolin, efriedma Reviewed By: efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D81458
-
Shawn Landden authored
Halves the number of CNT instructions generated.
-
Amara Emerson authored
This ensures that we match SelectionDAG behaviour by waiting until the expand pseudos pass to generate ADRP + ADD pairs. Doing this at selection time for the G_ADD_LOW is fine because by the time we get to selecting the G_ADD_LOW, previous attempts to fold it into loads/stores must have failed. Differential Revision: https://reviews.llvm.org/D81512
-
- Jun 09, 2020
-
-
Mehdi Amini authored
Having the input dumped on failure seems like a better default: I debugged FileCheck tests for a while without knowing about this option, which really helps to understand failures. Remove `-dump-input-on-failure` and the environment variable FILECHECK_DUMP_INPUT_ON_FAILURE which are now obsolete. Differential Revision: https://reviews.llvm.org/D81422
-
David Green authored
If a resource can be held for multiple cycles in the schedule model then an instruction can be placed into the available queue, another instruction can be scheduled, but the first will not be taken back out if the two instructions hazard. To fix this make sure that we update the available queue even on the first MOp of a cycle, pushing available instructions back into the pending queue if they now conflict. This happens with some downstream schedules we have around MVE instruction scheduling where we use ResourceCycles=[2] to show the instruction executing over two beats. Apparently the test changes here are OK too. Differential Revision: https://reviews.llvm.org/D76909
-
Jessica Paquette authored
Same idea as for zip, uzp, etc. Teach the post-legalizer combiner to recognize G_SHUFFLE_VECTORs that are trn1/trn2 instructions.
- Add G_TRN1 and G_TRN2
- Port mask matching code from AArch64ISelLowering
- Produce G_TRN1 and G_TRN2 in the post-legalizer combiner
- Select via importer
Add select-trn.mir to test selection. Add postlegalizer-combiner-trn.mir to test the combine. This is similar to the existing arm64-trn test. Note that both of these tests contain things we currently don't legalize. I figured it would be easier to test these now rather than later, since once we legalize the G_SHUFFLE_VECTORs, it's not guaranteed that someone will update the tests. Differential Revision: https://reviews.llvm.org/D81182
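The mask-matching idea can be sketched as follows (a simplified model of the ported AArch64ISelLowering check; the real code also handles undef lanes): trn1 interleaves the even lanes of the two inputs, trn2 the odd lanes, which gives each a fixed shuffle-mask shape.

```python
# Sketch of trn1/trn2 shuffle-mask recognition (simplified).
def match_trn_mask(mask, num_elts):
    """Return 1 if `mask` is a trn1 mask, 2 for trn2, None otherwise.

    trn1: res[2i] = a[2i],   res[2i+1] = b[2i]
    trn2: res[2i] = a[2i+1], res[2i+1] = b[2i+1]
    so the expected mask element is (i & ~1) + which + (i % 2) * num_elts.
    """
    for which in (0, 1):  # 0 -> trn1, 1 -> trn2
        if all(
            m == (i & ~1) + which + (i % 2) * num_elts
            for i, m in enumerate(mask)
        ):
            return which + 1
    return None
```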
-
Sanjay Patel authored
If fmul and fadd are separated by an fma, we can fold them together to save an instruction:

  fadd (fma A, B, (fmul C, D)), N1 --> fma(A, B, fma(C, D, N1))

The fold implemented here is actually a specialization - we should be able to peek through >1 fma to find this pattern. That's another patch if we want to try that enhancement though. This transform was guarded by the TLI hook enableAggressiveFMAFusion(), so it was done for some in-tree targets like PowerPC, but not AArch64 or x86. The hook is protecting against forming a potentially more expensive computation when fma takes longer to execute than a single fadd. That hook may be needed for other transforms, but in this case, we are replacing fmul+fadd with fma, and the fma should never take longer than the 2 individual instructions. 'contract' FMF is all we need to allow this transform. That flag corresponds to -ffp-contract=fast in Clang, so we are allowed to form fma ops freely across expressions. Differential Revision: https://reviews.llvm.org/D80801
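A quick numeric sanity check of the fold, using exact small-integer arithmetic (with real floating point the two forms may round differently, which is exactly why the 'contract' fast-math flag is required):

```python
# Check that fadd (fma A, B, (fmul C, D)), N1 == fma A, B, (fma C, D, N1)
# under exact arithmetic.
def fma(a, b, c):
    return a * b + c  # exact for the integers used here

def before(a, b, c, d, n1):
    # fadd (fma A, B, (fmul C, D)), N1
    return fma(a, b, c * d) + n1

def after(a, b, c, d, n1):
    # fma A, B, (fma C, D, N1)
    return fma(a, b, fma(c, d, n1))
```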
-
Cullen Rhodes authored
Summary: This patch adds initial support for the following intrinsics:
* llvm.aarch64.sve.ld2
* llvm.aarch64.sve.ld3
* llvm.aarch64.sve.ld4
for loading two, three and four vectors worth of data. Basic codegen is implemented, with reg+reg and reg+imm addressing modes being addressed in a later patch.
The types returned by these intrinsics have a number of elements that is a multiple of the elements in a 128-bit vector for a given type and N, where N is the number of vectors being loaded, i.e. 2, 3 or 4. Thus, for 32-bit elements the types are:
  LD2 : <vscale x 8 x i32>
  LD3 : <vscale x 12 x i32>
  LD4 : <vscale x 16 x i32>
This is implemented with target-specific intrinsics for each variant that take the same operands as the IR intrinsic but return N values, where the type of each value is a full vector, i.e. <vscale x 4 x i32> in the above example. These values are then concatenated using the standard concat_vector intrinsic to maintain type legality with the IR. These intrinsics are intended for use in the Arm C Language Extension (ACLE).
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D75751
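The type relationship above reduces to simple arithmetic, sketched here for clarity (element counts are per vscale=1 granule of the scalable vector):

```python
# Element count of the ldN result type: N times the number of elements
# that fit in one 128-bit vector of the given element width.
def ldN_result_elts(elt_bits: int, n: int) -> int:
    per_128bit_vector = 128 // elt_bits
    return n * per_128bit_vector
```

For 32-bit elements this reproduces the counts listed in the commit: LD2 -> 8, LD3 -> 12, LD4 -> 16 (i.e. <vscale x 8 x i32>, etc.).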
-