- Jun 19, 2020
-
-
Kristof Beyls authored
A "BTI c" instruction only allows jumping/calling to using a BLR* instruction. However, the SLSBLR mitigation changes a BLR to a BR to implement the function call. Therefore, a "BTI c" check that passed before could trigger after the BLR->BL change done by the SLSBLR mitigation. However, if the register used in BR is X16 or X17, this trigger will not fire (see ArmARM for further details). Therefore, this patch simply changes the function stubs for the SLSBLR mitigation from __llvm_slsblr_thunk_x<N>: br x<N> SpeculationBarrier to __llvm_slsblr_thunk_x<N>: mov x16, x<N> br x16 SpeculationBarrier Differential Revision: https://reviews.llvm.org/D81405
-
Francesco Petrogalli authored
Reviewers: efriedma, sdesmalen Reviewed By: efriedma Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80741
-
- Jun 18, 2020
-
-
David Sherwood authored
There are now quite a few SVE tests in LLVM and Clang that do not emit warnings related to invalid use of EVT::getVectorNumElements() and VectorType::getNumElements(). For these tests I have added additional checks that there are no warnings in order to prevent any future regressions. Differential Revision: https://reviews.llvm.org/D80712
-
David Sherwood authored
This reverts commit fb495c31, which was causing test failures and broke a buildbot.
-
David Sherwood authored
There are now quite a few SVE tests in LLVM and Clang that do not emit warnings related to invalid use of EVT::getVectorNumElements() and VectorType::getNumElements(). For these tests I have added additional checks that there are no warnings in order to prevent any future regressions. Differential Revision: https://reviews.llvm.org/D80712
-
Kristof Beyls authored
This also enables running the AArch64 SLSHardening pass with GlobalISel, so add a test for that. Differential Revision: https://reviews.llvm.org/D81403
-
Kristof Beyls authored
The enum values for AArch64 registers are not all consecutive. Therefore, the computation "__llvm_slsblr_thunk_x" + utostr(Reg - AArch64::X0) is not always correct: utostr(Reg - AArch64::X0) will not generate the expected string for registers that do not have consecutive values in the enum. This happened to work for most registers, but does not for AArch64::FP (i.e. register X29). The bug can get triggered when X29 is not used as a frame pointer. Differential Revision: https://reviews.llvm.org/D81997
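The bug can be modeled in a few lines of Python (the enum values below are invented purely for illustration; the real AArch64 enum values differ): subtracting X0's enum value only yields the register number when the enum happens to be consecutive, so the fix is an explicit lookup.

```python
# Hypothetical sketch of the bug: register enum values need not be
# consecutive, so arithmetic on them produces wrong thunk names.
AARCH64_REGS = {
    # invented enum values: X0..X28 happen to be consecutive...
    **{f"X{i}": 100 + i for i in range(29)},
    # ...but FP (X29) sits elsewhere in the enum
    "FP": 50,
}

def thunk_name_buggy(reg: str) -> str:
    # mirrors "__llvm_slsblr_thunk_x" + utostr(Reg - AArch64::X0)
    return f"__llvm_slsblr_thunk_x{AARCH64_REGS[reg] - AARCH64_REGS['X0']}"

def thunk_name_fixed(reg: str) -> str:
    # look the register number up explicitly instead of assuming
    # consecutive enum values
    number = {**{f"X{i}": i for i in range(29)}, "FP": 29}[reg]
    return f"__llvm_slsblr_thunk_x{number}"
```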
-
- Jun 17, 2020
-
-
Ian Levesque authored
Summary: Add a flag to omit the xray_fn_idx to cut size overhead and relocations roughly in half, at the cost of reduced performance for single-function patching. Minor additions to compiler-rt to support per-function patching without the index.
Reviewers: dberris, MaskRay, johnislarry
Subscribers: hiraditya, arphaman, cfe-commits, #sanitizers, llvm-commits
Tags: #clang, #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D81995
-
Daniel Sanders authored
Summary: Adds two features to the generated rule disable option:
- '*' - Disable all rules
- '!<foo>' - Re-enable rule(s):
  - '!foo' - Enable the rule named 'foo'
  - '!5' - Enable rule five
  - '!4-9' - Enable rules four to nine
  - '!foo-bar' - Enable rules from 'foo' to (and including) 'bar'
(The '!' is available to the generated disable option but is not part of the underlying API; it determines whether to call setRuleDisabled() or setRuleEnabled().)
This is intended to support unit testing of combine rules so that you can do:
  GeneratedCfg.setRuleDisabled("*")
  GeneratedCfg.setRuleEnabled("foo")
to ensure only a specific rule is in effect. The rule is still required to be included in a combiner though.
Also added --...-only-enable-rule=X,Y, which is effectively an alias for --...-disable-rule=*,!X,!Y and as such interacts properly with disable-rule.
Reviewers: aditya_nandakumar, bogner, volkan, aemerson, paquette, arsenm
Subscribers: wdng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81889
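A minimal sketch of how the specifiers above could be interpreted (this is hypothetical illustration code, not the generated combiner code): specs apply left to right, `*` disables everything, a leading `!` re-enables, and names, 1-based indices, and ranges of either are all resolved against the rule list.

```python
# Sketch of the disable-rule specifier semantics described in the commit
# (hypothetical parser; the real code is generated by TableGen).
def apply_rule_specs(rules, specs):
    """rules: ordered list of rule names; specs: e.g. ["*", "!foo", "!4-9"].
    Returns the set of enabled rule names after applying specs in order."""
    enabled = set(rules)

    def resolve(tok):
        # a token is a rule name, a 1-based index, or a range of either
        if "-" in tok and tok not in rules:
            lo, hi = tok.split("-", 1)
            i = int(lo) - 1 if lo.isdigit() else rules.index(lo)
            j = int(hi) - 1 if hi.isdigit() else rules.index(hi)
            return rules[i : j + 1]
        if tok.isdigit():
            return [rules[int(tok) - 1]]
        return [tok]

    for spec in specs:
        if spec == "*":
            enabled.clear()                        # disable all rules
        elif spec.startswith("!"):
            enabled.update(resolve(spec[1:]))      # re-enable
        else:
            enabled.difference_update(resolve(spec))
    return enabled
```

Under this model, `--...-only-enable-rule=X,Y` behaving like `--...-disable-rule=*,!X,!Y` falls out naturally: clear everything, then re-enable X and Y.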
-
- Jun 16, 2020
-
-
Jessica Paquette authored
When selecting 32-bit -> 64-bit G_ZEXTs, we don't always have to emit the extend. If the instruction feeding into the G_ZEXT implicitly zero-extends into the high half of the register, we can just emit a SUBREG_TO_REG instead. Differential Revision: https://reviews.llvm.org/D81897
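The selection decision can be sketched as follows (the opcode names and the set membership below are illustrative assumptions, not an authoritative list from the backend):

```python
# Rough model of the G_ZEXT selection choice described above.
# Hypothetical set of 32-bit defs known to leave bits 63:32 zeroed.
IMPLICITLY_ZEROS_HIGH_HALF = {"LDRWui", "ADDWrr", "SUBWrr", "ANDWrr"}

def lower_zext_32_to_64(def_opcode: str) -> str:
    """Pick how to select a 32-bit -> 64-bit G_ZEXT given the opcode that
    defines its source: if the 32-bit def already zeroes the high half of
    the 64-bit register, a cheap SUBREG_TO_REG suffices; otherwise emit an
    explicit zero-extend."""
    if def_opcode in IMPLICITLY_ZEROS_HIGH_HALF:
        return "SUBREG_TO_REG"
    return "UBFMXri"  # explicit zero-extend fallback
```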
-
Luke Geeson authored
This patch upstreams support for BFloat Matrix Multiplication Intrinsics and Code Generation from __bf16 to AArch64. This includes IR intrinsics. Unittests are provided as needed. AArch32 Intrinsics + CodeGen will come after this patch.
This patch is part of a series implementing the Bfloat16 extension of the Armv8.6-a architecture, as detailed here:
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
The bfloat type, and its properties are specified in the Arm Architecture Reference Manual:
https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
The following people contributed to this patch:
- Luke Geeson
- Momchil Velikov
- Mikhail Maltsev
- Luke Cheeseman
Reviewers: SjoerdMeijer, t.p.northover, sdesmalen, labrinea, miyuki, stuij
Reviewed By: miyuki, stuij
Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits, llvm-commits, miyuki, chill, pbarrio, stuij
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D80752
Change-Id: I174f0fd0f600d04e3799b06a7da88973c6c0703f
-
Luke Geeson authored
This patch upstreams support for ld / st variants of BFloat intrinsics from __bf16 to AArch64. This includes IR intrinsics. Unittests are provided as needed.
This patch is part of a series implementing the Bfloat16 extension of the Armv8.6-a architecture, as detailed here:
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
The bfloat type, and its properties are specified in the Arm Architecture Reference Manual:
https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
The following people contributed to this patch:
- Luke Geeson
- Momchil Velikov
- Luke Cheeseman
Reviewers: fpetrogalli, SjoerdMeijer, sdesmalen, t.p.northover, stuij
Reviewed By: stuij
Subscribers: arsenm, pratlucas, simon_tatham, labrinea, kristof.beyls, hiraditya, danielkiss, cfe-commits, llvm-commits, pbarrio, stuij
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D80716
Change-Id: I22e1dca2a8a9ec25d1e4f4b200cb50ea493d2575
-
Fangrui Song authored
Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D81814
-
Amara Emerson authored
Note: don't do this for integer 64 bit materialization to match SDAG. Differential Revision: https://reviews.llvm.org/D81893
-
Jessica Paquette authored
It's possible to end up with a zext or something in the way of a G_CONSTANT, even pre-legalization. This can happen with memsets. e.g. https://godbolt.org/z/Bjc8cw To make sure we can catch these cases, use `getConstantVRegValWithLookThrough` instead of `mi_match`. Differential Revision: https://reviews.llvm.org/D81875
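The "look through" idea can be illustrated in miniature (a simplified model, not the real `getConstantVRegValWithLookThrough` implementation): instead of requiring the operand to be defined directly by a G_CONSTANT, walk through value-wrapping instructions first.

```python
# Sketch of constant look-through over a toy instruction representation:
# an instruction is ("G_CONSTANT", value) or (opcode, inner_instruction).
def get_constant_with_lookthrough(inst):
    """Return the constant reached by looking through extends/copies,
    or None if the chain doesn't end in a G_CONSTANT."""
    LOOK_THROUGH = {"G_ZEXT", "G_SEXT", "G_TRUNC", "COPY"}
    while inst[0] in LOOK_THROUGH:
        inst = inst[1]
    return inst[1] if inst[0] == "G_CONSTANT" else None
```

A direct `mi_match`-style check would miss the zext-wrapped case that, per the commit, memsets can produce.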
-
- Jun 15, 2020
-
-
Amara Emerson authored
-
Jessica Paquette authored
Apparently an x86 bot doesn't like the disabled rule in this test. http://lab.llvm.org:8011/builders/fuchsia-x86_64-linux/builds/6569 Remove the disabled rule and update the test to try to pacify the bot.
-
Jessica Paquette authored
Add selection support for ext via a new opcode, G_EXT and a post-legalizer combine which matches it. Add an `applyEXT` function, because the AArch64ext patterns require a register for the immediate. So, we have to create a G_CONSTANT to get these without writing new patterns or modifying the existing ones. Tests are the same as arm64-ext.ll. Also prevent ext from firing on the zip test. It has higher priority, so we don't want it potentially getting in the way of mask tests. Also fix up the shuffle-splat test, because ext is now selected there. The test was incorrectly regbank selected before, which could cause a verifier failure when you emit copies. Differential Revision: https://reviews.llvm.org/D81436
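The mask-matching step can be sketched as follows (a simplified model of the idea; the real `isEXTMask`-style check in AArch64ISelLowering also handles undef lanes and operand swapping): an EXT selects a contiguous, possibly wrapping window of elements out of the two concatenated inputs, and the window's start is the immediate.

```python
# Sketch of EXT-style shuffle-mask matching (simplified).
def match_ext_mask(mask, num_elts):
    """Return the EXT immediate if `mask` selects num_elts consecutive
    elements (with wraparound) from the 2*num_elts-element concatenation
    of the two shuffle inputs; otherwise return None."""
    imm = mask[0]
    for i, m in enumerate(mask):
        if m != (imm + i) % (2 * num_elts):
            return None
    return imm
```

This also shows why the combine needs a G_CONSTANT: the matched start offset becomes the instruction's immediate operand.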
-
Jessica Paquette authored
This implements the following combines: ((0-A) + B) -> B-A (A + (0-B)) -> A-B Porting over the basic algebraic combines from the DAGCombiner. There are several combines which fold adds away into subtracts. This is just the simplest one. I noticed that add combines are some of the most commonly hit across CTMark (via print statements when they fire), so I'm porting over some of the obvious ones. This gives some minor code size improvements on CTMark at -O3 on AArch64. Differential Revision: https://reviews.llvm.org/D77453
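The two folds can be sketched over a toy expression representation, purely to illustrate the algebra (this is not the combiner's actual data structures):

```python
# The algebraic folds from the commit, over ("sub", x, y) tuples / ints:
#   ((0 - A) + B) -> B - A
#   (A + (0 - B)) -> A - B
def combine_add(lhs, rhs):
    """Return the folded sub node if one of the patterns applies to
    lhs + rhs, else None."""
    if isinstance(lhs, tuple) and lhs[0] == "sub" and lhs[1] == 0:
        return ("sub", rhs, lhs[2])   # ((0 - A) + B) -> B - A
    if isinstance(rhs, tuple) and rhs[0] == "sub" and rhs[1] == 0:
        return ("sub", lhs, rhs[2])   # (A + (0 - B)) -> A - B
    return None
```

Each fold replaces a negate-then-add pair with a single subtract, which is where the code-size win comes from.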
-
Francesco Petrogalli authored
Summary: Adding intrinsics and codegen patterns for:
* trn1 <Zd>.q, <Zm>.q, <Zn>.q
* trn2 <Zd>.q, <Zm>.q, <Zn>.q
* zip1 <Zd>.q, <Zm>.q, <Zn>.q
* zip2 <Zd>.q, <Zm>.q, <Zn>.q
* uzp1 <Zd>.q, <Zm>.q, <Zn>.q
* uzp2 <Zd>.q, <Zm>.q, <Zn>.q
These instructions are defined in Armv8.6-A.
Reviewers: sdesmalen, efriedma, kmclaughlin
Reviewed By: sdesmalen
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80850
-
Daniel Kiss authored
Summary: SCTLR_EL1.BT[01] controls the PACI[AB]SP compatibility with PBYTE 11 (see [1]). This bit will be set to zero so PACI[AB]SP are equal to the BTI C instruction only.
[1] https://developer.arm.com/docs/ddi0595/b/aarch64-system-registers/sctlr_el1
Reviewers: chill, tamas.petz, pbarrio, ostannard
Reviewed By: tamas.petz, ostannard
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81746
-
Dominik Montada authored
[MachineVerifier][GlobalISel] Check that branches have a MBB operand or are declared indirect. Add missing properties to G_BRJT, G_BRINDIRECT.
Summary: Teach MachineVerifier to check branches for MBB operands if they are not declared indirect. Add `isBarrier`, `isIndirectBranch` to `G_BRINDIRECT` and `G_BRJT`. Without these, `MachineInstr.isConditionalBranch()` was giving a false positive for those instructions.
Reviewers: aemerson, qcolombet, dsanders, arsenm
Reviewed By: dsanders
Subscribers: hiraditya, wdng, simoncook, s.egerton, arsenm, rovka, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81587
-
- Jun 13, 2020
-
-
Amanieu d'Antras authored
Summary: Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=46060
I've also added the Extra_IsConvergent flag, which was missing from FastISel.
Reviewers: echristo
Reviewed By: echristo
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80759
-
- Jun 12, 2020
-
-
Amara Emerson authored
Differential Revision: https://reviews.llvm.org/D81419
-
Jessica Paquette authored
We select all of these via patterns now, so there's no reason to disallow this. Update select-dup.mir to show that we correctly select the smaller types. Differential Revision: https://reviews.llvm.org/D81322
-
Jessica Paquette authored
This was making it so that the instructions weren't eliminated in select-rev.mir and select-trn.mir despite not being used. Update the tests accordingly. Differential Revision: https://reviews.llvm.org/D81492
-
Kristof Beyls authored
To make sure that no barrier gets placed on the architectural execution path, each BLR x<N> instruction gets transformed to a BL __llvm_slsblr_thunk_x<N> instruction, with __llvm_slsblr_thunk_x<N> a thunk that contains:

  __llvm_slsblr_thunk_x<N>:
      BR x<N>
      <speculation barrier>

Therefore, the BLR instruction gets split into two: one BL and one BR. This transformation results in not inserting a speculation barrier on the architectural execution path. The mitigation is off by default and can be enabled by the harden-sls-blr subtarget feature. As a linker is allowed to clobber X16 and X17 on function calls, the above code transformation would not be correct in case a linker does so when N=16 or N=17. Therefore, when the mitigation is enabled, generation of BLR x16 or BLR x17 is avoided. As BLRA* indirect calls are not currently produced by LLVM, this patch does not aim to implement support for those. Differential Revision: https://reviews.llvm.org/D81402
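The rewrite can be sketched in Python (a toy model over instruction strings, not the pass's real MachineInstr manipulation): each `blr xN` becomes a `bl` to a per-register thunk whose body holds the original `br` followed by the barrier.

```python
# Sketch of the BLR -> BL __llvm_slsblr_thunk_x<N> rewrite described above.
def harden_blr(instr: str):
    """Rewrite 'blr xN' into a call to the per-register thunk plus the
    thunk body. X16/X17 may be clobbered by linker-inserted code on calls,
    so the mitigation never emits BLR x16/x17 in the first place."""
    op, reg = instr.split()
    assert op == "blr" and reg not in ("x16", "x17")
    thunk = f"__llvm_slsblr_thunk_{reg}"
    call = f"bl {thunk}"
    body = [f"{thunk}:", f"  br {reg}", "  <speculation barrier>"]
    return call, body
```

The barrier lives only inside the thunk, after the `br`, so it is never on the architectural path of the call.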
-
- Jun 11, 2020
-
-
Fangrui Song authored
-
Eli Friedman authored
-
Dominik Montada authored
[GlobalISel] Fix crash in IRTranslator, MachineIRBuilder when translating @llvm.dbg.value intrinsic and using -debug.
Summary: Fix a crash when using -debug, caused by the GlobalISel observer trying to print an incomplete DBG_VALUE instruction. This was caused by the MachineIRBuilder using buildInstr, which immediately inserts the instruction (triggering the print), instead of using BuildMI to first build up the instruction and using insertInstr when finished.
Add a RUN line to the existing debug-insts.ll test with the -debug flag set to make sure no crash happens. Also fixed a missing %s in the 2nd RUN line of the same test.
Reviewers: t.p.northover, aditya_nandakumar, aemerson, dsanders, arsenm
Reviewed By: arsenm
Subscribers: wdng, arsenm, rovka, hiraditya, volkan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76934
-
David Sherwood authored
Until we have a real need for computing known bits for scalable vectors I have simply changed the code to bail out for now and pretend we know nothing. I've also fixed up some simple callers of computeKnownBits too. Differential Revision: https://reviews.llvm.org/D80437
-
Kristof Beyls authored
Some processors may speculatively execute the instructions immediately following RET (returns) and BR (indirect jumps), even though control flow should change unconditionally at these instructions. To avoid a potentially mis-speculatively executed gadget after these instructions leaking secrets through side channels, this pass places a speculation barrier immediately after every RET and BR instruction. Since these barriers are never on the correct, architectural execution path, the performance overhead of this is expected to be low. On targets that implement the Armv8.0-SB Speculation Barrier extension, a single SB instruction is emitted that acts as a speculation barrier. On other targets, a DSB SYS followed by an ISB is emitted to act as a speculation barrier. These speculation barriers are implemented as pseudo instructions to prevent later passes from analyzing them and potentially removing them. Even though LLVM does not currently produce BRAA/BRAB/BRAAZ/BRABZ instructions, these are also mitigated by the pass and tested through a MIR test. The mitigation is off by default and can be enabled by the harden-sls-retbr subtarget feature. Differential Revision: https://reviews.llvm.org/D81400
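The barrier-selection choice described above can be summarized in a tiny sketch (a hypothetical helper, not the pass's actual code):

```python
# Which instruction sequence acts as the speculation barrier inserted
# after every RET / BR, per the commit message.
def speculation_barrier_seq(has_sb_extension: bool):
    """With the Armv8.0-SB extension a single SB instruction suffices;
    otherwise fall back to DSB SYS followed by ISB."""
    return ["sb"] if has_sb_extension else ["dsb sy", "isb"]
```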
-
- Jun 10, 2020
-
-
Sander de Smalen authored
Instead of loading from e.g. `<vscale x 16 x i8>*`, load from element pointer `i8*`. This is more in line with the other load/store intrinsics for SVE. Reviewers: fpetrogalli, c-rhodes, rengolin, efriedma Reviewed By: efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D81458
-
Shawn Landden authored
Halves the number of CNT instructions generated.
-
Amara Emerson authored
This ensures that we match SelectionDAG behaviour by waiting until the expand pseudos pass to generate ADRP + ADD pairs. Doing this at selection time for the G_ADD_LOW is fine because by the time we get to selecting the G_ADD_LOW, previous attempts to fold it into loads/stores must have failed. Differential Revision: https://reviews.llvm.org/D81512
-
- Jun 09, 2020
-
-
Mehdi Amini authored
Having the input dumped on failure seems like a better default: I debugged FileCheck tests for a while without knowing about this option, which really helps to understand failures. Remove `-dump-input-on-failure` and the environment variable FILECHECK_DUMP_INPUT_ON_FAILURE which are now obsolete. Differential Revision: https://reviews.llvm.org/D81422
-
David Green authored
If a resource can be held for multiple cycles in the schedule model then an instruction can be placed into the available queue, another instruction can be scheduled, but the first will not be taken back out if the two instructions hazard. To fix this make sure that we update the available queue even on the first MOp of a cycle, pushing available instructions back into the pending queue if they now conflict. This happens with some downstream schedules we have around MVE instruction scheduling where we use ResourceCycles=[2] to show the instruction executing over two beats. Apparently the test changes here are OK too. Differential Revision: https://reviews.llvm.org/D76909
-
Jessica Paquette authored
Same idea as for zip, uzp, etc. Teach the post-legalizer combiner to recognize G_SHUFFLE_VECTORs that are trn1/trn2 instructions.
- Add G_TRN1 and G_TRN2
- Port mask matching code from AArch64ISelLowering
- Produce G_TRN1 and G_TRN2 in the post-legalizer combiner
- Select via importer
Add select-trn.mir to test selection. Add postlegalizer-combiner-trn.mir to test the combine. This is similar to the existing arm64-trn test. Note that both of these tests contain things we currently don't legalize. I figured it would be easier to test these now rather than later, since once we legalize the G_SHUFFLE_VECTORs, it's not guaranteed that someone will update the tests. Differential Revision: https://reviews.llvm.org/D81182
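The mask-matching idea can be sketched as follows (a simplified model of the ported AArch64ISelLowering check; the real code also handles undef lanes): trn1 interleaves the even lanes of the two inputs, trn2 the odd lanes, which gives each a fixed shuffle-mask shape.

```python
# Sketch of trn1/trn2 shuffle-mask recognition (simplified).
def match_trn_mask(mask, num_elts):
    """Return 1 if `mask` is a trn1 mask, 2 for trn2, None otherwise.

    trn1: res[2i] = a[2i],   res[2i+1] = b[2i]
    trn2: res[2i] = a[2i+1], res[2i+1] = b[2i+1]
    so the expected mask element is (i & ~1) + which + (i % 2) * num_elts.
    """
    for which in (0, 1):  # 0 -> trn1, 1 -> trn2
        if all(
            m == (i & ~1) + which + (i % 2) * num_elts
            for i, m in enumerate(mask)
        ):
            return which + 1
    return None
```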
-
Sanjay Patel authored
If fmul and fadd are separated by an fma, we can fold them together to save an instruction:

  fadd (fma A, B, (fmul C, D)), N1 --> fma(A, B, fma(C, D, N1))

The fold implemented here is actually a specialization - we should be able to peek through >1 fma to find this pattern. That's another patch if we want to try that enhancement though. This transform was guarded by the TLI hook enableAggressiveFMAFusion(), so it was done for some in-tree targets like PowerPC, but not AArch64 or x86. The hook is protecting against forming a potentially more expensive computation when fma takes longer to execute than a single fadd. That hook may be needed for other transforms, but in this case, we are replacing fmul+fadd with fma, and the fma should never take longer than the 2 individual instructions. 'contract' FMF is all we need to allow this transform. That flag corresponds to -ffp-contract=fast in Clang, so we are allowed to form fma ops freely across expressions. Differential Revision: https://reviews.llvm.org/D80801
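A quick numeric sanity check of the fold, using exact small-integer arithmetic (with real floating point the two forms may round differently, which is exactly why the 'contract' fast-math flag is required):

```python
# Check that fadd (fma A, B, (fmul C, D)), N1 == fma A, B, (fma C, D, N1)
# under exact arithmetic.
def fma(a, b, c):
    return a * b + c  # exact for the integers used here

def before(a, b, c, d, n1):
    # fadd (fma A, B, (fmul C, D)), N1
    return fma(a, b, c * d) + n1

def after(a, b, c, d, n1):
    # fma A, B, (fma C, D, N1)
    return fma(a, b, fma(c, d, n1))
```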
-
Cullen Rhodes authored
Summary: This patch adds initial support for the following intrinsics:
* llvm.aarch64.sve.ld2
* llvm.aarch64.sve.ld3
* llvm.aarch64.sve.ld4
for loading two, three and four vectors worth of data. Basic codegen is implemented, with reg+reg and reg+imm addressing modes being addressed in a later patch.
The types returned by these intrinsics have a number of elements that is a multiple of the elements in a 128-bit vector for a given type and N, where N is the number of vectors being loaded, i.e. 2, 3 or 4. Thus, for 32-bit elements the types are:
  LD2 : <vscale x 8 x i32>
  LD3 : <vscale x 12 x i32>
  LD4 : <vscale x 16 x i32>
This is implemented with target-specific intrinsics for each variant that take the same operands as the IR intrinsic but return N values, where the type of each value is a full vector, i.e. <vscale x 4 x i32> in the above example. These values are then concatenated using the standard concat_vector intrinsic to maintain type legality with the IR. These intrinsics are intended for use in the Arm C Language Extension (ACLE).
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D75751
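The type relationship above reduces to simple arithmetic, sketched here for clarity (element counts are per vscale=1 granule of the scalable vector):

```python
# Element count of the ldN result type: N times the number of elements
# that fit in one 128-bit vector of the given element width.
def ldN_result_elts(elt_bits: int, n: int) -> int:
    per_128bit_vector = 128 // elt_bits
    return n * per_128bit_vector
```

For 32-bit elements this reproduces the counts listed in the commit: LD2 -> 8, LD3 -> 12, LD4 -> 16 (i.e. <vscale x 8 x i32>, etc.).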
-