- Jul 11, 2018
-
-
Stefan Pintilie authored
Implement this as it is done on GCC:

  __float128 a, b, c, d;
  a = __builtin_fmaf128_round_to_odd (b, c, d);    // generates xsmaddqpo
  a = __builtin_fmaf128_round_to_odd (b, c, -d);   // generates xsmsubqpo
  a = - __builtin_fmaf128_round_to_odd (b, c, d);  // generates xsnmaddqpo
  a = - __builtin_fmaf128_round_to_odd (b, c, -d); // generates xsnmsubqpo

Differential Revision: https://reviews.llvm.org/D48218

llvm-svn: 336754
-
Eli Friedman authored
The original code attempted to do this, but the std::abs() call didn't actually do anything due to implicit type conversions. Fix the type conversions, and perform the correct check for negative immediates. This probably has very little practical impact, but it's worth fixing just to avoid confusion in the future, I think. Differential Revision: https://reviews.llvm.org/D48907 llvm-svn: 336742
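A minimal C sketch of this class of bug (illustrative values and names, not the actual backend code): plain abs() takes an int, so anything wider is truncated before the absolute value is computed.

  #include <stdio.h>
  #include <stdlib.h>
  #include <stdint.h>

  int main(void) {
    int64_t Imm = INT64_MIN + 1;     /* a negative 64-bit immediate */
    /* abs() takes int; the cast makes explicit the truncation an
       implicit conversion would perform silently. */
    printf("%d\n", abs((int)Imm));            /* prints 1, not the 64-bit abs */
    printf("%lld\n", (long long)llabs(Imm));  /* the correctly-typed check */
    return 0;
  }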
-
Craig Topper authored
Some added 20 and some added 15. It's unclear when to use which value and whether they are required at all. This patch removes them all. If we start finding real-world issues we may need to add them back with proper tests. llvm-svn: 336735
-
Richard Trieu authored
class -> struct in forward declaration. llvm-svn: 336733
-
Craig Topper authored
[X86] Teach X86InstrInfo::commuteInstructionImpl to use MOVSD/MOVSS for BLEND under optsize when the immediate allows it. Isel currently emits movss/movsd a lot of the time and an accidental double commute turns it into a blend. Ideally we'd select blend directly in isel under optspeed and not rely on the double commute to create blend. llvm-svn: 336731
-
- Jul 10, 2018
-
-
Craig Topper authored
These ISD nodes try to select the MOVLPS and MOVLPD instructions, which are special load-only instructions: they load data and merge it into the lower 64 bits of an XMM register. They are logically equivalent to our MOVSD node plus a load.

There was only one place in X86ISelLowering that used MOVLPD, and no places that selected MOVLPS. The one place that selected MOVLPD had to choose between it and MOVSD based on whether there was a load. But lowering is too early to tell if the load can really be folded. So in isel we have patterns that use MOVSD for MOVLPD if we can't find a load.

We also had patterns that select the MOVLPD instruction for a MOVSD if we can find a load, but didn't choose the MOVLPD ISD opcode for some reason. So it seems better to just standardize on the MOVSD ISD opcode and manage MOVSD vs MOVLPD instruction selection with isel patterns.

llvm-svn: 336728
-
Scott Linder authored
llvm-svn: 336722
-
Craig Topper authored
It points to an opcode that doesn't exist. llvm-svn: 336720
-
Craig Topper authored
I believe isProfitableToFold will stop the load folding that this was intended to overcome. Given an (xor load, -1), isProfitableToFold will see that the immediate can be folded with the xor using a one byte immediate since it can be sign extended. It doesn't know about NOT, but the one byte immediate check is enough to stop the fold. llvm-svn: 336712
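A hedged C illustration of the pattern in question (a sketch, not the committed code): bitwise NOT is (xor x, -1) in the DAG, and -1 always fits a sign-extended one-byte immediate.

  #include <stdint.h>

  /* (xor (load p), -1): since -1 encodes as a one-byte sign-extended
     immediate, isProfitableToFold declines to fold the load into the
     xor, so a NOT of the loaded value can still be formed. */
  uint32_t invert(const uint32_t *p) {
    return ~*p;
  }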
-
Craig Topper authored
There were only 3 patterns with this node as a root and they all had the same AddedComplexity. So this doesn't really do anything. llvm-svn: 336711
-
Scott Linder authored
Move all metadata construction into AMDGPUHSAMetadataStreamer. Differential Revision: https://reviews.llvm.org/D48176 llvm-svn: 336707
-
Alexander Ivchenko authored
The instruction selection is automatically handled by tablegen. llvm-svn: 336703
-
Konstantin Zhuravlyov authored
amdgpu-implicitarg-num-bytes attribute Differential Revision: https://reviews.llvm.org/D49096 llvm-svn: 336697
-
Sander de Smalen authored
This patch adds support for the following instructions:

  CLS  (Count Leading Sign bits)
  CLZ  (Count Leading Zeros)
  CNT  (Count non-zero bits)
  CNOT (Logically invert boolean condition in vector)
  NOT  (Bitwise invert vector)
  FABS (Floating-point absolute value)
  FNEG (Floating-point negate)

All operations are predicated and unary, e.g.

  clz z0.s, p0/m, z1.s

- CLS, CLZ, CNT, CNOT and NOT have variants for 8, 16, 32
  and 64 bit elements.
- FABS and FNEG have variants for 16, 32 and 64 bit elements.

llvm-svn: 336677
-
Matt Arsenault authored
This reverts commit r336623 llvm-svn: 336675
-
Krzysztof Parzyszek authored
An explicit untied use is not sufficient to maintain liveness of a register redefined in a predicated instruction. For example

  %1 = COPY %0
  ...
  %1 = A2_paddif %2, %1, 1

could become

  $r1 = COPY $r0
  ...
  $r1 = A2_paddif $p0, $r1, 1

and later

  $r1 = COPY $r0   ;; this is not really dead!
  ...
  $r1 = A2_paddif $p0, $r0, 1

llvm-svn: 336662
-
Simon Pilgrim authored
Now that rL336250 has landed, we should prefer 2 immediate shifts + a shuffle blend over performing a multiply. Despite the increase in instructions, this is quicker (especially for slow v4i32 multiplies), and avoids loads and constant pool usage. It does mean, however, that we increase register pressure. The code size will go up a little, but by less than what we save on the constant pool data.

This patch also adds support for v16i16 to the BLEND(SHIFT(v,c1),SHIFT(v,c2)) combine, and also prevents blending on pre-SSE41 shifts if it would introduce extra blend masks/constant pool usage.

Differential Revision: https://reviews.llvm.org/D48936

llvm-svn: 336642
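As a sketch of the shape this combine targets (GCC/Clang vector extensions; illustrative, not the committed tests):

  typedef unsigned int v4si __attribute__((vector_size(16)));

  /* Lanes 0-1 shift left by 1, lanes 2-3 by 2: now lowered as
     BLEND(SHIFT(v,1), SHIFT(v,2)) instead of a multiply by a
     {2,2,4,4} constant loaded from the constant pool. */
  v4si shl_mixed(v4si v) {
    return v << (v4si){1, 1, 2, 2};
  }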
-
Craig Topper authored
[X86] Use IsProfitableToFold to block vinsertf128rm in favor of insert_subreg instead of artifically increasing pattern complexity to give priority. This is a much more direct way to solve the issue than just giving extra priority. llvm-svn: 336639
-
Craig Topper authored
We're missing the EVEX equivalents of these patterns and seem to get along fine. I think we end up with X86vzload for the obvious IR cases that would produce this DAG. llvm-svn: 336638
-
Craig Topper authored
[X86] Correct vfixupimm load patterns to look for an integer load, not a floating point load bitcasted to integer. DAG combine wouldn't let a floating point load bitcasted to integer exist. It would just be an integer load. llvm-svn: 336626
-
Craig Topper authored
The only places it was used were places where VT was the same as FloatVT. So switch those uses to VT and drop it. llvm-svn: 336624
-
Vlad Tsyrklevich authored
This reverts commit r336587, it was causing test failures on the sanitizer bots. llvm-svn: 336623
-
Heejin Ahn authored
Summary:
This adds support for binary atomic read-modify-write instructions:
add, sub, and, or, xor, and xchg.

This does not yet support translations of some LLVM IR atomicrmw
instructions (nand, max, min, umax, and umin) that do not have a direct
counterpart in wasm instructions.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D49088

llvm-svn: 336615
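For illustration, a C11 sketch (assuming a wasm target with atomics enabled; instruction names follow the threads proposal, and this is not part of the commit):

  #include <stdatomic.h>

  /* atomic_fetch_add on a 32-bit integer can now select
     i32.atomic.rmw.add; sub/and/or/xor/exchange map analogously. */
  int bump(_Atomic int *counter) {
    return atomic_fetch_add(counter, 1);
  }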
-
- Jul 09, 2018
-
-
Stefan Pintilie authored
Added __float128 support for a number of rounding operations:

  trunc
  rint
  nearbyint
  round
  floor
  ceil

Differential Revision: https://reviews.llvm.org/D48415

llvm-svn: 336601
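For instance, code like the following sketch (assuming glibc >= 2.26 for the *f128 math functions, a target where _Float128 is available, and a frontend that turns these libcalls into the corresponding intrinsics) can now lower to native Power9 instructions instead of library calls:

  #define _GNU_SOURCE
  #include <math.h>

  _Float128 round_down(_Float128 x) { return floorf128(x); } /* floor */
  _Float128 chop(_Float128 x)       { return truncf128(x); } /* trunc */
  _Float128 nearest(_Float128 x)    { return rintf128(x);  } /* rint  */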
-
Heejin Ahn authored
Summary:
- Changed variable/function names to be more consistent
- Improved comments in test files
- Added more tests
- Fixed a few typos
- Misc. cosmetic changes

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D49087

llvm-svn: 336598
-
Stefan Pintilie authored
Add support for this builtin:

  double __builtin_truncf128_round_to_odd(__float128)

Differential Revision: https://reviews.llvm.org/D48483

llvm-svn: 336595
-
Mark Searles authored
Build error on Android; reported and fixed by Mauro Rossi
<issor.oruam@gmail.com> (thanks).

Fixes the following building error:

  external/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1903:61:
  error: comparison of integers of different signs:
  'typename iterator_traits<__wrap_iter<MachineBasicBlock **> >::difference_type'
  (aka 'int') and 'unsigned int' [-Werror,-Wsign-compare]
        BlockWaitcntProcessedSet.end(), &MBB) < Count)) {
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~
  1 error generated.

Differential Revision: https://reviews.llvm.org/D49089

llvm-svn: 336588
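The underlying pattern, as a minimal C sketch with hypothetical names: a signed iterator/pointer difference compared against an unsigned count trips -Wsign-compare; comparing in a single signedness domain fixes it.

  #include <stddef.h>

  int processed(char **begin, char **end, unsigned Count) {
    ptrdiff_t Dist = end - begin;       /* signed difference_type */
    /* 'Dist < Count' would compare signed with unsigned; cast so
       both sides share one domain. */
    return Dist < (ptrdiff_t)Count;
  }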
-
Matt Arsenault authored
These won't work for the foreseeable future. These aren't allowed from OpenCL, but IPO optimizations can make them appear. Also directly set the attributes on functions, regardless of the linkage, rather than cloning functions like before. llvm-svn: 336587
-
Roman Lebedev authored
Summary:
This adds a reverse transform for the instcombine canonicalizations
that were added in D47980, D47981.

As discussed later, that was worse at least for the code size,
and potentially for the performance, too.

https://rise4fun.com/Alive/Zmpl

Reviewers: craig.topper, RKSimon, spatel

Reviewed By: spatel

Subscribers: reames, llvm-commits

Differential Revision: https://reviews.llvm.org/D48768

llvm-svn: 336585
-
Stefan Pintilie authored
GCC has builtins for these round to odd instructions:

  __float128 __builtin_sqrtf128_round_to_odd (__float128)
  __float128 __builtin_{add,sub,mul,div}f128_round_to_odd (__float128, __float128)
  __float128 __builtin_fmaf128_round_to_odd (__float128, __float128, __float128)

Differential Revision: https://reviews.llvm.org/D47550

llvm-svn: 336578
-
Craig Topper authored
[X86] In combineFMA, make sure we bitcast the result of isFNEG back to the expected type before creating the new FMA node. Previously, we were creating malformed SDNodes, but nothing noticed because the type constraints prevented isel from noticing. llvm-svn: 336566
-
Craig Topper authored
DAG combine should have converted the type of the load. llvm-svn: 336557
-
Craig Topper authored
These patterns mapped (v2f64 (X86vzmovl (v2f64 (scalar_to_vector FR64:$src)))) to a MOVSD and a zeroing XOR. But the complexity of the pattern for (v2f64 (X86vzmovl (v2f64))) that selects MOVQ is artificially high and hides this MOVSD pattern. Weirder still, the SSE version of the pattern was explicitly blocked on SSE41, yet we had copied it to AVX and AVX512. llvm-svn: 336556
-
Craig Topper authored
Looking at the generated tables this didn't seem to make an obvious difference in pattern priority. llvm-svn: 336555
-
Sander de Smalen authored
This patch adds support for the following instructions:

  CNTB CNTH - Determine the number of active elements implied by
  CNTW CNTD   the named predicate constant, multiplied by an
              immediate, e.g.

                cnth x0, vl8, #16

  CNTP      - Count active predicate elements, e.g.

                cntp x0, p0, p1.b

              counts the number of active elements in p1, predicated
              by p0, and stores the result in x0.

llvm-svn: 336552
-
Stefan Pintilie authored
Added handling for select of f128. Differential Revision: https://reviews.llvm.org/D48294 llvm-svn: 336548
-
Sander de Smalen authored
This patch completes support for shifts, which include:
- LSL  - Logical Shift Left
- LSLR - Logical Shift Left, Reversed form
- LSR  - Logical Shift Right
- LSRR - Logical Shift Right, Reversed form
- ASR  - Arithmetic Shift Right
- ASRR - Arithmetic Shift Right, Reversed form
- ASRD - Arithmetic Shift Right for Divide

In the following variants:

- Predicated shift by immediate - ASR, LSL, LSR, ASRD
  e.g. asr z0.h, p0/m, z0.h, #1
  (active lanes of z0 shifted by #1)

- Unpredicated shift by immediate - ASR, LSL*, LSR*
  e.g. asr z0.h, z1.h, #1
  (all lanes of z1 shifted by #1, stored in z0)

- Predicated shift by vector - ASR, LSL*, LSR*
  e.g. asr z0.h, p0/m, z0.h, z1.h
  (active lanes of z0 shifted by z1, stored in z0)

- Predicated shift by vector, reversed form - ASRR, LSLR, LSRR
  e.g. lslr z0.h, p0/m, z0.h, z1.h
  (active lanes of z1 shifted by z0, stored in z0)

- Predicated shift left/right by wide vector - ASR, LSL, LSR
  e.g. lsl z0.h, p0/m, z0.h, z1.d
  (active lanes of z0 shifted by wide elements of vector z1)

- Unpredicated shift left/right by wide vector - ASR, LSL, LSR
  e.g. lsl z0.h, z1.h, z2.d
  (all lanes of z1 shifted by wide elements of z2, stored in z0)

*Variants added in previous patches.

llvm-svn: 336547
-
Stefan Maksimovic authored
Related to http://reviews.llvm.org/D15772
Depends on http://reviews.llvm.org/D16889

Adds [D]REM[U] instructions.

Patch By: Srdjan Obucina
Contributions from: Simon Dardis

Differential Revision: https://reviews.llvm.org/D17036

llvm-svn: 336545
-
Sander de Smalen authored
Support for SVE's TBL instruction for programmable table lookup/permute
using a vector of element indices, e.g.

  tbl z0.d, { z1.d }, z2.d

stores elements from z1, indexed by elements from z2, into z0.

llvm-svn: 336544
-
Sander de Smalen authored
Supporting various addressing modes:
- adr z0.s, [z0.s, z0.s]
- adr z0.s, [z0.s, z0.s, lsl #<shift>]
- adr z0.d, [z0.d, z0.d]
- adr z0.d, [z0.d, z0.d, lsl #<shift>]
- adr z0.d, [z0.d, z0.d, uxtw #<shift>]
- adr z0.d, [z0.d, z0.d, sxtw #<shift>]

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D48870

llvm-svn: 336533
-