Commits · 83a5fe146e0728a7e61c6f17b06a3a963c5b7e00 · Lorenzo Albano / LLVM bpEVL

Jul 09, 2018

[Power9] Add __float128 builtins for Round To Odd · 83a5fe14

Stefan Pintilie authored Jul 09, 2018

GCC has builtins for these round to odd instructions:

__float128 __builtin_sqrtf128_round_to_odd (__float128)
__float128 __builtin_{add,sub,mul,div}f128_round_to_odd (__float128, __float128)
__float128 __builtin_fmaf128_round_to_odd (__float128, __float128, __float128)
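Round-to-odd truncates the exact result. If any discarded bits were nonzero, it then forces the least significant retained bit to 1. This "sticky" odd bit is what makes a later rounding to a narrower format safe from double-rounding error, provided the intermediate format carries at least two extra bits. The following is a minimal integer sketch of the rule. It assumes a plain right shift stands in for the loss of precision. It does not model `__float128`, and the helper name is hypothetical.

```python
def shift_round_to_odd(value: int, shift: int) -> int:
    """Drop the low `shift` bits of a non-negative integer using round-to-odd.

    Hypothetical helper: an integer model of the rounding rule only.
    """
    if value < 0 or shift < 0:
        raise ValueError("sketch only handles non-negative inputs")
    truncated = value >> shift
    discarded = value & ((1 << shift) - 1)
    # Inexact result: force the low bit to 1 so the sticky information survives.
    if discarded:
        truncated |= 1
    return truncated


print(shift_round_to_odd(0b1000, 2))  # 2 (exact)
print(shift_round_to_odd(0b1001, 2))  # 3 (inexact, made odd)
print(shift_round_to_odd(0b1010, 2))  # 3 (inexact, made odd)
```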

Differential Revision: https://reviews.llvm.org/D47550

llvm-svn: 336578

[X86] In combineFMA, make sure we bitcast the result of isFNEG back the... · 47170b31

Craig Topper authored Jul 09, 2018

[X86] In combineFMA, make sure we bitcast the result of isFNEG back to the expected type before creating the new FMA node.

Previously, we were creating malformed SDNodes, but nothing noticed because the type constraints prevented isel from noticing.

llvm-svn: 336566

[X86] Remove some patterns that include a bitcast of a floating point load to an integer type. · e9cff7d4

Craig Topper authored Jul 09, 2018

DAG combine should have converted the type of the load.

llvm-svn: 336557

[X86] Remove some patterns that seem to be unreachable. · 16ee4b49

Craig Topper authored Jul 09, 2018

These patterns mapped (v2f64 (X86vzmovl (v2f64 (scalar_to_vector FR64:$src)))) to a MOVSD and a zeroing XOR. But the complexity of the pattern for (v2f64 (X86vzmovl (v2f64))) that selects MOVQ is artificially high and hides this MOVSD pattern.

Weirder still, the SSE version of the pattern was explicitly blocked on SSE41, yet we had copied it to AVX and AVX512.

llvm-svn: 336556

[X86] Remove some seemingly unnecessary AddedComplexity lines. · 22330c70

Craig Topper authored Jul 09, 2018

Looking at the generated tables this didn't seem to make an obvious difference in pattern priority.

llvm-svn: 336555

[AArch64][SVE] Asm: Support for CNT(B|H|W|D) and CNTP instructions. · d3efb59f

Sander de Smalen authored Jul 09, 2018

This patch adds support for the following instructions:

  CNTB CNTH - Determine the number of active elements implied by
  CNTW CNTD   the named predicate constant, multiplied by an
              immediate, e.g.

                cnth x0, vl8, #16

  CNTP      - Count active predicate elements, e.g.
                cntp  x0, p0, p1.b

              counts the number of active elements in p1, predicated
              by p0, and stores the result in x0.
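The counting rules above can be sketched as follows. This is a minimal model that assumes only the `all` and `vlN` patterns: a `vlN` pattern yields exactly N elements when the vector is long enough and 0 otherwise. Predicates are modeled as one boolean per element. Real SVE predicates hold one bit per byte, and the other named patterns (`pow2`, `mul3`, `mul4`, ...) are not modeled. The function names are hypothetical.

```python
def cnt_elements(vl_bits: int, elem_bits: int, pattern, imm: int = 1) -> int:
    """Hypothetical model of CNTB/CNTH/CNTW/CNTD for 'all' and 'vlN' only."""
    total = vl_bits // elem_bits
    if pattern == "all":
        count = total
    else:
        # vlN: exactly N elements if the vector is long enough, otherwise 0.
        count = pattern if pattern <= total else 0
    return count * imm


def cntp(pg, pn) -> int:
    """Count elements active in both the governing predicate pg and pn."""
    return sum(1 for g, n in zip(pg, pn) if g and n)


# cnth x0, vl8, #16 with a 128-bit vector: 8 halfwords * 16 = 128.
print(cnt_elements(128, 16, 8, 16))      # 128
print(cntp([1, 1, 0, 1], [1, 0, 1, 1]))  # 2
```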

llvm-svn: 336552

[Power9] Add __float128 support for compare operations · 3d76326d

Stefan Pintilie authored Jul 09, 2018

Added handling for select on f128.

Differential Revision: https://reviews.llvm.org/D48294

llvm-svn: 336548

[AArch64][SVE] Asm: Support for remaining shift instructions. · 813b21e3

Sander de Smalen authored Jul 09, 2018

This patch completes support for shifts, which include:
- LSL   - Logical Shift Left
- LSLR  - Logical Shift Left, Reversed form
- LSR   - Logical Shift Right
- LSRR  - Logical Shift Right, Reversed form
- ASR   - Arithmetic Shift Right
- ASRR  - Arithmetic Shift Right, Reversed form
- ASRD  - Arithmetic Shift Right for Divide

In the following variants:

- Predicated shift by immediate - ASR, LSL, LSR, ASRD
  e.g.
    asr z0.h, p0/m, z0.h, #1

  (active lanes of z0 shifted by #1)

- Unpredicated shift by immediate - ASR, LSL*, LSR*
  e.g.
    asr z0.h, z1.h, #1

  (all lanes of z1 shifted by #1, stored in z0)

- Predicated shift by vector - ASR, LSL*, LSR*
  e.g.
    asr z0.h, p0/m, z0.h, z1.h

  (active lanes of z0 shifted by z1, stored in z0)

- Predicated shift by vector, reversed form - ASRR, LSLR, LSRR
  e.g.
    lslr z0.h, p0/m, z0.h, z1.h

  (active lanes of z1 shifted by z0, stored in z0)

- Predicated shift left/right by wide vector - ASR, LSL, LSR
  e.g.
    lsl z0.h, p0/m, z0.h, z1.d

  (active lanes of z0 shifted by wide elements of vector z1)

- Unpredicated shift left/right by wide vector - ASR, LSL, LSR
  e.g.
    lsl z0.h, z1.h, z2.d

  (all lanes of z1 shifted by wide elements of z2, stored in z0)

*Variants added in previous patches.
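Two of the less obvious variants are the reversed form and ASRD. A lane-wise sketch of both follows. It assumes unsigned 16-bit lanes for LSL/LSLR, with shift amounts at or above the lane width producing 0. For ASRD it assumes already-signed lane values and models signed division by a power of two rounding toward zero. Inactive lanes of the merging form keep their old value. All names are hypothetical.

```python
def lsl_lane(value: int, amount: int, bits: int = 16) -> int:
    """Logical shift left of one unsigned lane; shifts >= lane width give 0."""
    if amount >= bits:
        return 0
    return (value << amount) & ((1 << bits) - 1)


def lslr_merging(pg, zdn, zm, bits: int = 16):
    """Reversed form: active lanes become zm << zdn; inactive lanes keep zdn."""
    return [lsl_lane(m, d, bits) if p else d for p, d, m in zip(pg, zdn, zm)]


def asrd_lane(value: int, shift: int) -> int:
    """Signed divide by 2**shift, rounding toward zero."""
    if value < 0:
        value += (1 << shift) - 1  # bias negative values so >> truncates
    return value >> shift


print(lslr_merging([1, 0], [2, 5], [3, 7]))  # [12, 5]
print(asrd_lane(-7, 1), -7 >> 1)             # -3 -4
```

The second line shows why ASRD exists: a plain arithmetic shift rounds toward negative infinity, while division rounds toward zero.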

llvm-svn: 336547

[mips] Addition of the [d]rem and [d]remu instructions · 0a23998f

Stefan Maksimovic authored Jul 09, 2018

Related to http://reviews.llvm.org/D15772
Depends on http://reviews.llvm.org/D16889
Adds [D]REM[U] instructions.

Patch By: Srdjan Obucina
Contributions from: Simon Dardis

Differential Revision: https://reviews.llvm.org/D17036

llvm-svn: 336545

[AArch64][SVE] Asm: Support for TBL instruction. · 54077dcf

Sander de Smalen authored Jul 09, 2018

Support for SVE's TBL instruction for programmable table
lookup/permute using vector of element indices, e.g.

  tbl  z0.d, { z1.d }, z2.d

stores elements from z1, indexed by elements from z2, into z0.
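A minimal model of the single-register lookup above. It assumes that, as I understand the architecture, an index past the end of the table selects zero. Vectors are plain Python lists and the function name is hypothetical.

```python
def tbl(table, indices):
    """Hypothetical single-register TBL: out-of-range indices produce 0."""
    return [table[i] if 0 <= i < len(table) else 0 for i in indices]


print(tbl([10, 20, 30, 40], [3, 0, 9, 1]))  # [40, 10, 0, 20]
```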

llvm-svn: 336544

[AArch64][SVE] Asm: Support for ADR instruction. · c69944c6

Sander de Smalen authored Jul 09, 2018

Supporting various addressing modes:
- adr z0.s, [z0.s, z0.s]
- adr z0.s, [z0.s, z0.s, lsl #<shift>]
- adr z0.d, [z0.d, z0.d]
- adr z0.d, [z0.d, z0.d, lsl #<shift>]
- adr z0.d, [z0.d, z0.d, uxtw #<shift>]
- adr z0.d, [z0.d, z0.d, sxtw #<shift>]
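A lane-wise sketch of the `.d` addressing modes listed above. Each lane computes `base + (offset << shift)` modulo 2**64. `uxtw`/`sxtw` first zero- or sign-extend the low 32 bits of the offset. It assumes the shift is limited to 0..3 and does not model the `.s` forms. The helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def extend_offset(offset: int, kind: str) -> int:
    """Apply the hypothetical 'lsl' / 'uxtw' / 'sxtw' offset modifier."""
    low32 = offset & 0xFFFFFFFF
    if kind == "uxtw":
        return low32
    if kind == "sxtw":
        return low32 - (1 << 32) if low32 & 0x80000000 else low32
    return offset & MASK64  # plain / lsl form: full 64-bit offset


def adr_d(bases, offsets, kind: str = "lsl", shift: int = 0):
    """Per-lane address computation for 64-bit lanes, wrapping modulo 2**64."""
    if not 0 <= shift <= 3:
        raise ValueError("shift must be in 0..3")
    return [(b + (extend_offset(o, kind) << shift)) & MASK64
            for b, o in zip(bases, offsets)]


print(hex(adr_d([0x1000], [0xFFFFFFFF], "sxtw", 3)[0]))  # 0xff8
print(hex(adr_d([0x1000], [0xFFFFFFFF], "uxtw", 3)[0]))  # 0x800000ff8
```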

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D48870

llvm-svn: 336533

[AArch64][SVE] Asm: Support for UZP and TRN instructions. · bd513b42

Sander de Smalen authored Jul 09, 2018

This patch adds support for:
  UZP1  Concatenate even elements from two vectors
  UZP2  Concatenate  odd elements from two vectors
  TRN1  Interleave  even elements from two vectors
  TRN2  Interleave   odd elements from two vectors

With variants for both data and predicate vectors, e.g.
  uzp1    z0.b, z1.b, z2.b
  trn2    p0.s, p1.s, p2.s
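The four permutes can be modeled on plain lists. This is a sketch assuming an even element count. It shows only the element ordering, not register widths or predicate encodings, and the function names are hypothetical.

```python
def uzp1(zn, zm):
    """Even-indexed elements of zn, then even-indexed elements of zm."""
    return zn[0::2] + zm[0::2]


def uzp2(zn, zm):
    """Odd-indexed elements of zn, then odd-indexed elements of zm."""
    return zn[1::2] + zm[1::2]


def trn1(zn, zm):
    """Interleave the even-indexed elements: zn[0], zm[0], zn[2], zm[2], ..."""
    out = []
    for i in range(0, len(zn), 2):
        out += [zn[i], zm[i]]
    return out


def trn2(zn, zm):
    """Interleave the odd-indexed elements: zn[1], zm[1], zn[3], zm[3], ..."""
    out = []
    for i in range(0, len(zn), 2):
        out += [zn[i + 1], zm[i + 1]]
    return out


a, b = [0, 1, 2, 3], [4, 5, 6, 7]
print(uzp1(a, b), uzp2(a, b))  # [0, 2, 4, 6] [1, 3, 5, 7]
print(trn1(a, b), trn2(a, b))  # [0, 4, 2, 6] [1, 5, 3, 7]
```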

llvm-svn: 336531

[X86] Improve the message for some asserts. Remove an if that is guaranteed true by said asserts. · b8145ec6

Craig Topper authored Jul 09, 2018

This replaces some asserts in lowerV2F64VectorShuffle with the similar, more readable asserts from lowerV2I64VectorShuffle. The original asserts mentioned a blend, but there's no guarantee that it is a blend.

Also remove an if that the asserts prove is always true. Mask[0] is always less than 2 and Mask[1] is always at least 2. Therefore (Mask[0] >= 2) + (Mask[1] >= 2) == 1 must always be true.
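The argument can be checked exhaustively. This sketch assumes the v2f64 shuffle mask indices lie in 0..3, with 0..1 selecting from the first input and 2..3 from the second. Undef (-1) lanes are ignored, and the helper name is hypothetical.

```python
def exactly_one_from_second_input(mask) -> bool:
    """The condition the removed `if` tested (bools add as 0/1 in Python)."""
    return (mask[0] >= 2) + (mask[1] >= 2) == 1


# Every mask the asserts allow: Mask[0] in {0, 1}, Mask[1] in {2, 3}.
allowed = [(m0, m1) for m0 in range(2) for m1 in range(2, 4)]
print(all(exactly_one_from_second_input(m) for m in allowed))  # True
```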

llvm-svn: 336517

[X86] Remove an AddedComplexity line that seems unnecessary. · c98c675f

Craig Topper authored Jul 08, 2018

It only existed on the SSE and AVX versions; the AVX512 version didn't have it.

I checked the generated table and this didn't seem necessary to create a match preference.

llvm-svn: 336516

Jul 08, 2018

[X86][Nearly NFC] Split SHLD/SHRD into their own WriteShiftDouble class · 75ce4537

Roman Lebedev authored Jul 08, 2018

Summary:
{F6603964}
While there are still some discrepancies within that new group,
it is clearly separate from the other shifts.
And Agner's tables agree: these double shifts are clearly
different from the normal shifts/rotates.

I'm guessing `FeatureSlowSHLD` is related.

Indeed, a basic sched pair is *not* the /best/ match.
But keeping it in the WriteShift is /clearly/ not ideal either.
This can and likely will be fine-tuned later.

This is a purely mechanical change; it does not change any numbers,
as the unchanged mca tests show.

Reviewers: craig.topper, RKSimon, andreadb

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49015

llvm-svn: 336515

[X86] Enhance combineFMA to look for FNEG behind an EXTRACT_VECTOR_ELT. · 9e17073c

Craig Topper authored Jul 08, 2018

llvm-svn: 336514

[X86][SSE] Combine v16i8 SHL by constants to multiplies · 2eced71e

Simon Pilgrim authored Jul 08, 2018

Pre-AVX512 (AVX512 can perform a quick extend/shift/truncate), extending to two v8i16 vectors for PMULLW and then truncating is more performant than relying on the generic PBLENDVB vXi8 shift path, and it uses a similar amount of mask constant pool data.
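The identity behind this combine is that `x << c` on an 8-bit lane equals the low 8 bits of `x * 2**c` computed in 16 bits. The following scalar sketch assumes shift amounts in 0..7. It models one lane at a time and leaves out the unpack/pack shuffles the real lowering needs. The function name is hypothetical.

```python
def shl_bytes_via_pmullw(lanes, amounts):
    """Widen i8 lanes to i16, multiply by 2**amount keeping the low 16 bits
    (as PMULLW does), then truncate back to i8."""
    out = []
    for x, s in zip(lanes, amounts):
        wide = x & 0xFF                        # extend i8 -> i16
        product = (wide * (1 << s)) & 0xFFFF   # low 16 bits of the product
        out.append(product & 0xFF)             # truncate back to i8
    return out


# Exhaustive comparison with a direct 8-bit shift.
print(all(shl_bytes_via_pmullw([x], [s]) == [(x << s) & 0xFF]
          for x in range(256) for s in range(8)))  # True
```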

Differential Revision: https://reviews.llvm.org/D48963

llvm-svn: 336513

[X86] Set scheduler classes to unsupported. NFCI. · 1795870b

Simon Pilgrim authored Jul 08, 2018

While looking at PR36895 I noticed how much of the atom model was still setting schedules for unsupported SSE4+ instructions.

llvm-svn: 336512

[X86][Basically NFC] Sched: split WriteBitScan into WriteBSF/WriteBSR. · fa988853

Roman Lebedev authored Jul 08, 2018

Summary:
Motivation: {F6597954}

This only does the mechanical splitting; it does not actually change
any numbers, as the tests added in the previous revision show.

Reviewers: craig.topper, RKSimon, courbet

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48998

llvm-svn: 336511

[X86] Add back some intrinsic table entries lost in r336506. · f1a981c7

Craig Topper authored Jul 08, 2018

llvm-svn: 336508

[X86] Add new scalar fma intrinsics with rounding mode that use f32/f64 types. · fdf3f1ff

Craig Topper authored Jul 08, 2018

This allows us to handle masking in a very similar way to the default rounding version that uses llvm.fma.

I had to add new rounding mode CodeGenOnly instructions to support isel when we can't find a movss to grab the upper bits from to use the b_Int instruction.

Fast-isel tests have been updated to match new clang codegen.

We are currently having trouble folding fneg into the new intrinsic. I'm going to correct that in a follow up patch to keep the size of this one down.

A future patch will also remove the old intrinsics.

llvm-svn: 336506

Jul 07, 2018

[SelectionDAG] Split float and integer isKnownNeverZero tests · 23f9edda

Simon Pilgrim authored Jul 07, 2018

Splits off isKnownNeverZeroFloat to handle +/- 0 float cases.

This will make it easier to be more aggressive with the integer isKnownNeverZero tests (similar to ValueTracking), use computeKnownBits etc.

Differential Revision: https://reviews.llvm.org/D48969

llvm-svn: 336492

[CostModel][X86] Add SREM/UREM general and constant costs (PR38056) · dc113dc7

Simon Pilgrim authored Jul 07, 2018

We penalize general SDIV/UDIV costs but don't do the same for SREM/UREM.

This patch makes general vector SREM/UREM x20 as costly as scalar, the same approach as we do for SDIV/UDIV. The patch also extends the existing SDIV/UDIV constant costs for SREM/UREM - at the moment this means the additional cost of a MUL+SUB (see D48975).
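The extra MUL+SUB comes from deriving the remainder from the quotient: `rem = x - (x / c) * c`. The following sketch assumes the quotient is already available. Here Python integer division stands in for whatever the constant-division lowering produces. Signed division truncates toward zero, as LLVM `sdiv` does. The names are hypothetical.

```python
def sdiv_trunc(x: int, c: int) -> int:
    """Signed division rounding toward zero, like LLVM sdiv."""
    q = abs(x) // abs(c)
    return q if (x >= 0) == (c > 0) else -q


def srem_by_constant(x: int, c: int) -> int:
    """Remainder from the quotient: one MUL and one SUB on top of the SDIV."""
    return x - sdiv_trunc(x, c) * c


def urem_by_constant(x: int, c: int) -> int:
    """Unsigned variant: one MUL and one SUB on top of the UDIV."""
    return x - (x // c) * c


print(srem_by_constant(-7, 3), urem_by_constant(17, 5))  # -1 2
```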

Differential Revision: https://reviews.llvm.org/D48980

llvm-svn: 336486

[MachineOutliner] Assert that Liveness tracking is accurate (NFC) · a96a0455

Yvan Roux authored Jul 07, 2018

The checking is done deeper inside MachineBasicBlock, but this will
hopefully help to find issues when porting the machine outliner to a
target where Liveness tracking is broken (like ARM).

Differential Revision: https://reviews.llvm.org/D49023

llvm-svn: 336481

[X86] Merge INTR_TYPE_3OP_RM with INTR_TYPE_3OP. Remove unused INTR_TYPE_1OP_RM. · 2c27e33a

Craig Topper authored Jul 07, 2018

llvm-svn: 336476

Jul 06, 2018

Use Type::isIntOrPtrTy where possible, NFC · b3091da3

Vedant Kumar authored Jul 06, 2018

It's a bit neater to write T.isIntOrPtrTy() over `T.isIntegerTy() ||
T.isPointerTy()`.

I used Python's re.sub with this regex to update users:

  r'([\w.\->()]+)isIntegerTy\(\)\s*\|\|\s*\1isPointerTy\(\)'
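A minimal sketch of applying that regex with `re.sub`. The commit does not show the replacement string, so `r'\1isIntOrPtrTy()'` is an assumption. The backreference `\1` is what requires both calls to share the same receiver expression, so mixed receivers are left alone.

```python
import re

PATTERN = r'([\w.\->()]+)isIntegerTy\(\)\s*\|\|\s*\1isPointerTy\(\)'
REPLACEMENT = r'\1isIntOrPtrTy()'  # assumed; not given in the commit


def rewrite(source: str) -> str:
    """Collapse `X.isIntegerTy() || X.isPointerTy()` into `X.isIntOrPtrTy()`."""
    return re.sub(PATTERN, REPLACEMENT, source)


print(rewrite("if (Ty->isIntegerTy() || Ty->isPointerTy())"))
# if (Ty->isIntOrPtrTy())
print(rewrite("A.isIntegerTy() || B.isPointerTy()"))
# A.isIntegerTy() || B.isPointerTy()
```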

llvm-svn: 336462

[X86] Remove patterns for MOVLPD/MOVLPS nodes with integer types. · f61c631b

Craig Topper authored Jul 06, 2018

Lowering shouldn't generate these. If we need to use them for integer types, it should use a bitcast.

llvm-svn: 336458

[X86] Add more FMA3 memory folding patterns. Remove patterns that are no longer needed. · 77edbffa

Craig Topper authored Jul 06, 2018

We've removed the legacy FMA3 intrinsics and are now using llvm.fma and extractelement/insertelement. So we don't need patterns for the nodes that could only be created by the old intrinsics. Those ISD opcodes still exist because we haven't dropped the AVX512 intrinsics yet, but those should go to EVEX instructions.

llvm-svn: 336457

AMDGPU: Fix UBSan error caused by r335942 · ec4feae1

Tom Stellard authored Jul 06, 2018

Summary: Fixes PR38071.

Reviewers: arsenm, dstenb

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48979

llvm-svn: 336448

[ARM] ParallelDSP: added statistics, NFC. · b3e06faa

Sjoerd Meijer authored Jul 06, 2018

Added statistics for the number of SMLAD instructions created, and
also renamed the pass to -arm-parallel-dsp.

Differential Revision: https://reviews.llvm.org/D48971

llvm-svn: 336441

[AArch64] Armv8.4-A: TLB support · 35bd8f5d

Sjoerd Meijer authored Jul 06, 2018

This adds:
- outer shareable TLB Maintenance instructions, and
- TLB range maintenance instructions.

llvm-svn: 336434

Recommit: [AArch64] Armv8.4-A: Flag manipulation instructions · a3dad801

Sjoerd Meijer authored Jul 06, 2018

Now with the asm operand definition included.

llvm-svn: 336432

Revert [AArch64] Armv8.4-A: Flag manipulation instructions · 8203177e

Sjoerd Meijer authored Jul 06, 2018

It's causing build errors.

llvm-svn: 336422

[AArch64] Armv8.4-A: Flag manipulation instructions · 6f5f6d5b

Sjoerd Meijer authored Jul 06, 2018

These instructions are added to AArch64 only.

Differential Revision: https://reviews.llvm.org/D48926

llvm-svn: 336421

[AArch64][ARM] Armv8.4-A: Trace synchronization barrier instruction · 2a57b357

Sjoerd Meijer authored Jul 06, 2018

This adds the Armv8.4-A Trace synchronization barrier (TSB) instruction.

Differential Revision: https://reviews.llvm.org/D48918

llvm-svn: 336418

[X86] Remove FMA4 scalar intrinsics. Use llvm.fma intrinsic instead. · c60e1807

Craig Topper authored Jul 06, 2018

The intrinsics can be implemented with a f32/f64 llvm.fma intrinsic and an insert into a zero vector.
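A sketch of that replacement on plain lists. Lane 0 gets a single-rounding `a*b + c` and every other lane is zero. The fused multiply-add is emulated with exact `Fraction` arithmetic followed by one conversion back to float. This assumes finite inputs and does not model signed zeros, NaNs or infinities. The names are hypothetical.

```python
from fractions import Fraction


def fma_exact(a: float, b: float, c: float) -> float:
    """a*b + c with a single rounding (finite inputs only)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def vfmadd_ss(a, b, c):
    """Scalar FMA on lane 0, inserted into an otherwise all-zero vector."""
    return [fma_exact(a[0], b[0], c[0])] + [0.0] * (len(a) - 1)


print(vfmadd_ss([2.0, 9.0, 9.0, 9.0], [3.0, 9.0, 9.0, 9.0], [1.0, 9.0, 9.0, 9.0]))
# [7.0, 0.0, 0.0, 0.0]
x = 1.0 + 2.0 ** -30
print(fma_exact(x, x, -(x * x)) == 2.0 ** -60)  # True: the low product bits survive
```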

There are a couple regressions here due to SelectionDAG not being able to pull an fneg through an extract_vector_elt. I'm not super worried about this though as InstCombine should be able to do it before we get to SelectionDAG.

llvm-svn: 336416

[X86] Remove all of the avx512 masked packed fma intrinsics. Use llvm.fma or... · 7b35585f

Craig Topper authored Jul 06, 2018

[X86] Remove all of the avx512 masked packed fma intrinsics. Use llvm.fma or unmasked 512-bit intrinsics with rounding mode.

This upgrades all of the intrinsics to use fneg instructions to convert fma into fmsub/fnmsub/fnmadd/fmsubadd, and uses a select instruction for masking.

This matches how clang uses the intrinsics these days.
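A lane-wise sketch of the fneg-plus-select scheme described above. Each variant is an ordinary FMA with negated operands, and masking picks between the result and a passthrough vector. Which operand serves as passthrough (a, c, or zero) depends on the intrinsic variant, so here it is simply a parameter. fmsubadd is not modeled. FMA is emulated with exact `Fraction` arithmetic for finite inputs, and all names are hypothetical.

```python
from fractions import Fraction


def fma(a: float, b: float, c: float) -> float:
    """a*b + c with a single rounding (finite inputs only)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def fmsub(a, b, c):
    return fma(a, b, -c)    # a*b - c


def fnmadd(a, b, c):
    return fma(-a, b, c)    # -(a*b) + c


def fnmsub(a, b, c):
    return fma(-a, b, -c)   # -(a*b) - c


def masked(op, mask, a, b, c, passthru):
    """Apply op per lane, then select the result or the passthrough lane."""
    return [op(x, y, z) if m else p
            for m, x, y, z, p in zip(mask, a, b, c, passthru)]


print(masked(fmsub, [1, 0], [2.0, 4.0], [3.0, 5.0], [1.0, 1.0], [2.0, 4.0]))
# [5.0, 4.0]
```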

llvm-svn: 336409

[Power9] Add __float128 library call for frem · b351f09c

Stefan Pintilie authored Jul 06, 2018

Power9 does not have a hardware instruction for frem, but we can call fmodf128.
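LLVM's `frem` has the same result as C `fmod`: the remainder takes the sign of the dividend. That is why a libcall to the fmod family is a valid lowering. The following sketch uses binary64 `math.fmod` as a stand-in for the binary128 `fmodf128`. It also shows that Python's `%` operator, which floors, is not the same operation.

```python
import math


def frem(x: float, y: float) -> float:
    """frem semantics (binary64 stand-in): result has the sign of x."""
    return math.fmod(x, y)


print(frem(-7.5, 2.0))  # -1.5
print(-7.5 % 2.0)       # 0.5 (floored remainder, not frem)
```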

Differential Revision: https://reviews.llvm.org/D48552

llvm-svn: 336406

[X86][Disassembler] Fix LOCK prefix disassembler support · 89e4abe7

Maksim Panchenko authored Jul 05, 2018

Summary:
If the LOCK prefix is not the first prefix in an instruction, the LLVM
disassembler silently drops the prefix.

The fix is to select a proper instruction with a builtin LOCK prefix if
one exists.

Reviewers: craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49001

llvm-svn: 336400

Jul 05, 2018

[WebAssembly] Add missing _S opcodes of atomic stores to InstPrinter · 80d9f170

Heejin Ahn authored Jul 05, 2018

Summary: This was missing in D48839 (rL336145).

Reviewers: aardappel

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D48992

llvm-svn: 336390