Commits · a435e07de85dfe6ef675aa584c6b074ea6cd98d7 · Lorenzo Albano / LLVM bpEVL

Nov 14, 2016

[PPC] Add intrinsic mapping to the xscvhpsp instruction · a435e07d

Sean Fertile authored Nov 14, 2016

Add an intrinsic to expose the 'VSX Scalar Convert Half-Precision to
Single-Precision' instruction.

Differential review: https://reviews.llvm.org/D26536

llvm-svn: 286862

a435e07d
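
As a rough illustration of the numeric conversion that a half-precision to single-precision convert performs, the sketch below decodes a 16-bit pattern and re-encodes it as a 32-bit pattern. It models only the value conversion, not the register layout or the intrinsic itself; the helper names are hypothetical.

```python
import struct

def half_bits_to_float(bits: int) -> float:
    """Decode a 16-bit IEEE 754 half-precision pattern into a Python float."""
    # 'H' packs the raw 16-bit pattern; 'e' reinterprets it as binary16.
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def float_to_single_bits(value: float) -> int:
    """Encode a Python float as a 32-bit IEEE 754 single-precision pattern."""
    # 'f' rounds to binary32; 'I' reads back the raw 32-bit pattern.
    return struct.unpack("<I", struct.pack("<f", value))[0]

def convert_half_to_single(bits: int) -> int:
    """Half-precision bit pattern in, single-precision bit pattern out."""
    return float_to_single_bits(half_bits_to_float(bits))
```

Every half-precision value is exactly representable in single precision, so this direction of the conversion never rounds.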

[PPC] add intrinsics for vec extract exp/significand and vec test data class. · adda5b2d

Sean Fertile authored Nov 14, 2016

Differential Revision: https://reviews.llvm.org/D26272

llvm-svn: 286829

adda5b2d

Nov 11, 2016

[PowerPC] Add remaining vector permute builtins in altivec.h - LLVM portion · ec4b0c36

Nemanja Ivanovic authored Nov 11, 2016

This patch corresponds to review:
https://reviews.llvm.org/D26480

Adds all the intrinsics used for various permute builtins that will
be added to altivec.h.

llvm-svn: 286638

ec4b0c36

[PowerPC] Add vector conversion builtins to altivec.h - LLVM portion · 2efc3cb9

Nemanja Ivanovic authored Nov 11, 2016

This patch corresponds to review:
https://reviews.llvm.org/D26307

Adds all the intrinsics used for various conversion builtins that will
be added to altivec.h. These are type conversions between various types of
vectors.

llvm-svn: 286596

2efc3cb9

Oct 26, 2016

[PowerPC] Implement vec_insert_exp builtins - llvm portion · 0f45998b

Nemanja Ivanovic authored Oct 26, 2016

This revision corresponds to review: https://reviews.llvm.org/D25957.
Committing on behalf of Zaara Syeda.

llvm-svn: 285225

0f45998b

Oct 24, 2016

[PPC] Generate positive FP zero using xor insn instead of loading from constant area · c90b02cf

Ehsan Amiri authored Oct 24, 2016

https://reviews.llvm.org/D23614

Currently we load +0.0 from the constant area. It can instead be generated
using an XOR instruction.

llvm-svn: 284995

c90b02cf
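
The reason an XOR can stand in for the constant-area load is that XORing any bit pattern with itself clears every bit, and the all-zero pattern is exactly +0.0 in IEEE 754. A minimal sketch of that fact, using 64-bit doubles and hypothetical helper names:

```python
import math
import struct

def xor_self_as_double(bits: int) -> float:
    """XOR a 64-bit pattern with itself and decode the result as a double."""
    # XOR of any pattern with itself clears every bit.
    zero_bits = (bits ^ bits) & 0xFFFFFFFFFFFFFFFF
    # The all-zero pattern decodes to +0.0 in IEEE 754 binary64.
    return struct.unpack("<d", struct.pack("<Q", zero_bits))[0]

def is_positive_zero(value: float) -> bool:
    """True only for +0.0; -0.0 compares equal to 0.0 but has the sign bit set."""
    return value == 0.0 and math.copysign(1.0, value) == 1.0
```

Note that the same trick cannot produce -0.0, whose sign bit is set, which is consistent with the commit title covering only positive zero.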

Oct 04, 2016

[Power9] Exploit D-Form VSX Scalar memory ops that target full VSX register set · 6354d235

Nemanja Ivanovic authored Oct 04, 2016

This patch corresponds to review:

The newly added VSX D-Form (register + offset) memory ops target the upper half
of the VSX register set. The existing ones target the lower half. In order to
unify these and have the ability to target all the VSX registers using D-Form
operations, this patch defines Pseudo-ops for the loads/stores which are
expanded post-RA. The expansion then chooses the correct opcode based on the
register that was allocated for the operation.

llvm-svn: 283212

6354d235
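
The post-RA choice described above can be pictured as a lookup on the allocated register number. This is only a sketch: the function name is hypothetical, and the use of `lfd` for the lower half and `lxsd` for the upper half is an assumption made for illustration, not something taken from the patch.

```python
def select_dform_load(vsx_reg: int) -> str:
    """Pick a D-Form scalar load mnemonic for an allocated VSX register (0-63)."""
    if not 0 <= vsx_reg <= 63:
        raise ValueError("VSX register number must be in 0..63")
    # Lower half (VSR0-31): assumed to be served by the pre-existing op.
    # Upper half (VSR32-63): assumed to be served by the newly added op.
    return "lfd" if vsx_reg < 32 else "lxsd"
```

Deferring the choice until after register allocation lets the allocator use all 64 registers without knowing in advance which opcode will be needed.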

[Power9] Part-word VSX integer scalar loads/stores and sign extend instructions · 11049f8f

Nemanja Ivanovic authored Oct 04, 2016

This patch corresponds to review:
https://reviews.llvm.org/D23155

This patch removes the VSHRC register class (based on D20310) and adds
exploitation of the Power9 sub-word integer loads into VSX registers as well
as vector sign extensions.
The new instructions are useful for a few purposes:

    Int to Fp conversions of 1 or 2-byte values loaded from memory
    Building vectors of 1 or 2-byte integers with values loaded from memory
    Storing individual 1 or 2-byte elements from integer vectors

This patch implements all of those uses.

llvm-svn: 283190

11049f8f

Sep 27, 2016

[Power9] Builtins for ELF v.2 ABI conformance - back end portion · 6f22b413

Nemanja Ivanovic authored Sep 27, 2016

This patch corresponds to review:
https://reviews.llvm.org/D24396

This patch adds support for the "vector count trailing zeroes",
"vector compare not equal" and "vector compare not equal or zero" instructions
as well as "scalar count trailing zeroes" instructions. It also changes the
vector negation to use XXLNOR (when VSX is enabled) so as not to increase
register pressure (previously this was done with a splat immediate of all
ones followed by an XXLXOR). This was done because the altivec.h
builtins (patch to follow) use vector negation and the use of an additional
register for the splat immediate is not optimal.

llvm-svn: 282478

6f22b413
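
Reading "vector negation" here as bitwise complement, the two sequences compute the same result; the NOR form just does not need a second register holding all ones. A small sketch on 128-bit integers (function names are hypothetical):

```python
MASK128 = (1 << 128) - 1  # models a 128-bit vector register

def not_via_xor(x: int) -> int:
    """Previous sequence: splat all ones into a register, then XOR."""
    all_ones = MASK128  # this value occupies an extra register
    return (x ^ all_ones) & MASK128

def not_via_nor(x: int) -> int:
    """New sequence: NOR the input with itself, no extra register."""
    return ~(x | x) & MASK128
```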

Sep 23, 2016

[Power9] Exploit move and splat instructions for build_vector improvement · d2c3c51a

Nemanja Ivanovic authored Sep 23, 2016

This patch corresponds to review:
https://reviews.llvm.org/D21135

This patch exploits the following instructions in order to improve some
build_vector and extractelement patterns:

    mtvsrws
    lxvwsx
    mtvsrdd
    mfvsrld

llvm-svn: 282246

d2c3c51a

Sep 22, 2016

[PowerPC] Remove LE patterns matching generic stores/loads to VSX permuting ops · e78ffede

Nemanja Ivanovic authored Sep 22, 2016

This patch corresponds to:
https://reviews.llvm.org/D21409

The LXVD2X, LXVW4X, STXVD2X and STXVW4X instructions permute the two doublewords
in the vector register when in little-endian mode. Custom code ensures that the
necessary swaps are inserted for these. This patch simply removes the possibility
that a load/store node will match one of these instructions in the SDAG as that
would not insert the necessary swaps.

llvm-svn: 282144

e78ffede

[Power9] Add exploitation of non-permuting memory ops · 6e7879c5

Nemanja Ivanovic authored Sep 22, 2016

This patch corresponds to review:
https://reviews.llvm.org/D19825

The new lxvx/stxvx instructions do not require the swaps to line the elements
up correctly. In order to select them over the lxvd2x/lxvw4x instructions which
require swaps, the patterns for the old instruction have a predicate that
ensures they won't be selected on Power9 and newer CPUs.

llvm-svn: 282143

6e7879c5

Aug 18, 2016

[SelectionDAG] Rename fextend -> fpextend, fround -> fpround, frnd -> fround · 2bc3d4d4

Michael Kuperstein authored Aug 18, 2016

The names of the tablegen defs now match the names of the ISD nodes.
This makes the world a slightly saner place, as previously "fround" matched
ISD::FP_ROUND and not ISD::FROUND.

Differential Revision: https://reviews.llvm.org/D23597

llvm-svn: 279129

2bc3d4d4

Jul 18, 2016

[PowerPC] Remove redundant direct moves when extracting integers and converting to FP · d3c284f6

Nemanja Ivanovic authored Jul 18, 2016

This patch corresponds to review:
https://reviews.llvm.org/D21354

We use direct moves for extracting integer elements from vectors. We also use
direct moves when converting integers to FP. When these operations are chained,
we get a direct move out of a VSR followed by a direct move back into a VSR.
These are redundant - all we need to do is line up the element and convert.

llvm-svn: 275796

d3c284f6

Jul 12, 2016

[Power9] Add codegen for VSX word insert/extract instructions · b43bb614

Nemanja Ivanovic authored Jul 12, 2016

This patch corresponds to review:
http://reviews.llvm.org/D20239

It adds exploitation of XXINSERTW and XXEXTRACTUW instructions that
are useful in some cases for inserting and extracting vector elements of
v4[if]32 vectors.

llvm-svn: 275215

b43bb614

[PowerPC] Canonicalize applicable vector shift immediates as swaps · eebbcb6d

Nemanja Ivanovic authored Jul 12, 2016

This patch corresponds to review:
http://reviews.llvm.org/D21358

Vector shifts that have the same semantics as a vector swap are canonicalized
as such to provide additional opportunities for swap removal optimization to
remove unnecessary swaps.

llvm-svn: 275168

eebbcb6d
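
One shape such a shift can take, assuming a "shift left double by octets" style operation applied to a vector paired with itself: a shift of 8 bytes yields the two doublewords in swapped order. The sketch below models that on 16-byte values; both function names are hypothetical.

```python
def shift_left_double_by_octets(a: bytes, b: bytes, n: int) -> bytes:
    """Concatenate a and b, then take 16 bytes starting at byte offset n."""
    if len(a) != 16 or len(b) != 16 or not 0 <= n <= 15:
        raise ValueError("expected two 16-byte vectors and a shift of 0..15")
    return (a + b)[n:n + 16]

def swap_doublewords(v: bytes) -> bytes:
    """Exchange the two 8-byte halves of a 16-byte vector."""
    return v[8:] + v[:8]
```

Because the two forms are equivalent, rewriting the shift as a swap lets a pass that already reasons about swaps treat it like any other swap.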

Jul 05, 2016

[PowerPC] - Legalize vector types by widening instead of integer promotion · 44513e54

Nemanja Ivanovic authored Jul 05, 2016

This patch corresponds to review:
http://reviews.llvm.org/D20443

It changes the legalization strategy for illegal vector types from integer
promotion to widening. This only applies for vectors with elements of width
that is a multiple of a byte since we have hardware support for vectors with
1, 2, 4, 8 and 16 byte elements.
Integer promotion for vectors is quite expensive on PPC due to the sequence
of breaking apart the vector, extending the elements and reconstituting the
vector. Two of these operations are expensive.
This patch causes between minor and major improvements in performance on most
benchmarks. There are very few benchmarks whose performance regresses. These
regressions can be handled in a subsequent patch with a DAG combine (similar
to how this patch handles int -> fp conversions of illegal vector types).

llvm-svn: 274535

44513e54
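
The difference between the two strategies can be shown with a simplified model, assuming a 128-bit vector register and an element count that divides it evenly. The real legalizer is considerably more involved; the function names here are hypothetical.

```python
VECTOR_BITS = 128  # assumed register width for this sketch

def promote_elements(num_elts: int, elt_bits: int) -> tuple:
    """Integer promotion: keep the element count, grow each element."""
    return num_elts, VECTOR_BITS // num_elts

def widen_vector(num_elts: int, elt_bits: int) -> tuple:
    """Widening: keep the element width, add padding elements."""
    return VECTOR_BITS // elt_bits, elt_bits
```

For an illegal 4 x i16 vector, promotion gives 4 x i32 (every element must be extended), while widening gives 8 x i16 (the original elements stay as they are and the extra lanes are padding).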

May 04, 2016

[PowerPC] Generate VSX version of splat word · 1a2b2f03

Nemanja Ivanovic authored May 04, 2016

This patch corresponds to review:
http://reviews.llvm.org/D18592

It allows the PPC back end to generate the xxspltw instruction where we
previously only emitted vspltw.

llvm-svn: 268516

1a2b2f03

Mar 31, 2016

[PPC] basic support for Power 9 direct move instructions · 99b017ae

Ehsan Amiri authored Mar 31, 2016

http://reviews.llvm.org/D18097

Initial support does not include any patterns to generate these instructions.

llvm-svn: 265031

99b017ae

Mar 28, 2016

[Power9] Implement new vsx instructions: insert, extract, test data class,... · 80722719

Chuang-Yu Cheng authored Mar 28, 2016

[Power9] Implement new vsx instructions: insert, extract, test data class, min/max, reverse, permute, splat

This change implements the following vsx instructions:

- Scalar Insert/Extract
    xsiexpdp xsiexpqp xsxexpdp xsxsigdp xsxexpqp xsxsigqp

- Vector Insert/Extract
    xviexpdp xviexpsp xvxexpdp xvxexpsp xvxsigdp xvxsigsp
    xxextractuw xxinsertw

- Scalar/Vector Test Data Class
    xststdcdp xststdcsp xststdcqp
    xvtstdcdp xvtstdcsp

- Maximum/Minimum
    xsmaxcdp xsmaxjdp
    xsmincdp xsminjdp

- Vector Byte-Reverse/Permute/Splat
    xxbrd xxbrh xxbrq xxbrw
    xxperm xxpermr
    xxspltib

30 instructions

Thanks to Nemanja for invaluable discussion! Thanks to Kit for great help!
Reviewers: hal, nemanja, kbarton, tjablin, amehsan

http://reviews.llvm.org/D16842

llvm-svn: 264567

80722719

[Power9] Implement new vsx instructions: quad-precision move, fp-arithmetic · 56638489

Chuang-Yu Cheng authored Mar 28, 2016

This change implements the following vsx instructions:

- quad-precision move
    xscpsgnqp, xsabsqp, xsnegqp, xsnabsqp

- quad-precision fp-arithmetic
    xsaddqp(o) xsdivqp(o) xsmulqp(o) xssqrtqp(o) xssubqp(o)
    xsmaddqp(o) xsmsubqp(o) xsnmaddqp(o) xsnmsubqp(o)

22 instructions

Thanks Nemanja and Kit for careful review and invaluable discussion!
Reviewers: hal, nemanja, kbarton, tjablin, amehsan

http://reviews.llvm.org/D16110

llvm-svn: 264565

56638489

Mar 08, 2016

[Power9] Implement new vsx instructions: load, store instructions for vector and scalar · ba532dc8

Kit Barton authored Mar 08, 2016

We follow the comments mentioned in http://reviews.llvm.org/D16842#344378 to
implement this new patch.

This patch implements the following vsx instructions:

Vector load/store:
lxv lxvx lxvb16x lxvl lxvll lxvh8x lxvwsx
stxv stxvb16x stxvh8x stxvl stxvll stxvx
Scalar load/store:
lxsd lxssp lxsibzx lxsihzx
stxsd stxssp stxsibx stxsihx
21 instructions

Phabricator: http://reviews.llvm.org/D16919
llvm-svn: 262906

ba532dc8

Feb 26, 2016

[Power9] Implement new vsx instructions: compare and conversion · 93612ec5

Kit Barton authored Feb 26, 2016

This change implements the following vsx instructions:

Quad/Double-Precision Compare:
xscmpoqp xscmpuqp
xscmpexpdp xscmpexpqp
xscmpeqdp xscmpgedp xscmpgtdp xscmpnedp
xvcmpnedp(.) xvcmpnesp(.)
Quad-Precision Floating-Point Conversion
xscvqpdp(o) xscvdpqp
xscvqpsdz xscvqpswz xscvqpudz xscvqpuwz xscvsdqp xscvudqp
xscvdphp xscvhpdp xvcvhpsp xvcvsphp
xsrqpi xsrqpix xsrqpxp
28 instructions

Phabricator: http://reviews.llvm.org/D16709
llvm-svn: 262068

93612ec5

Dec 15, 2015

Bitcasts between FP and INT values using direct moves · 8922476b

Nemanja Ivanovic authored Dec 15, 2015

This patch corresponds to review:
http://reviews.llvm.org/D15286

This patch was meant to land in revision 255246, but I accidentally uploaded
the patch that corresponds to http://reviews.llvm.org/D15372 in that revision.

Therefore, this patch is the actual Bitcasts using direct moves patch, whereas
http://reviews.llvm.org/rL255246 actually corresponds to
http://reviews.llvm.org/D15372.

llvm-svn: 255649

8922476b

Dec 11, 2015

Start replacing vector_extract/vector_insert with extractelt/insertelt · fbd9bbfd

Matt Arsenault authored Dec 11, 2015

These are redundant pairs of nodes defined for
INSERT_VECTOR_ELEMENT/EXTRACT_VECTOR_ELEMENT.
insertelement/extractelement are slightly closer to the corresponding
C++ node names and have stricter type checking, so prefer them.

Update targets to only use these nodes where it is trivial to do so.
AArch64, ARM, and Mips all have various type errors on simple replacement,
so they will need work to fix.

Example from AArch64:

def : Pat<(sext_inreg (vector_extract (v16i8 V128:$Rn), VectorIndexB:$idx), i8),
          (i32 (SMOVvi8to32 V128:$Rn, VectorIndexB:$idx))>;

Which is trying to do sext_inreg i8, i8.

llvm-svn: 255359

fbd9bbfd

Dec 10, 2015

Bitcasts between FP and INT values using direct moves · ac8d01ad

Nemanja Ivanovic authored Dec 10, 2015

This patch corresponds to review:
http://reviews.llvm.org/D15286

LLVM IR frequently contains bitcast operations between floating point and
integer values of the same width. Doing this through memory operations is
quite expensive on PPC. This patch allows the use of direct register moves
between FPRs and GPRs for lowering bitcasts.

llvm-svn: 255246

ac8d01ad
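
For readers less familiar with the operation, a bitcast reinterprets the same bits as a different type without changing them, unlike a numeric conversion. A minimal sketch for 32-bit values (helper names are hypothetical):

```python
import struct

def bitcast_f32_to_i32(value: float) -> int:
    """Reinterpret the 32 bits of a single-precision float as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

def bitcast_i32_to_f32(bits: int) -> float:
    """Reinterpret a 32-bit unsigned pattern as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

Here `bitcast_f32_to_i32(1.0)` yields the pattern `0x3F800000`, whereas a numeric conversion of 1.0 to an integer would yield 1. Python's `struct` goes through a byte buffer, which is loosely analogous to the memory round trip the patch avoids.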

Oct 09, 2015

Vector element extraction without stack operations on Power 8 · d3896573

Nemanja Ivanovic authored Oct 09, 2015

This patch corresponds to review:
http://reviews.llvm.org/D12032

This patch builds onto the patch that provided scalar to vector conversions
without stack operations (D11471).
Included in this patch:

    - Vector element extraction for all vector types with constant element number
    - Vector element extraction for v16i8 and v8i16 with variable element number
    - Removal of some unnecessary COPY_TO_REGCLASS operations that ended up
      unnecessarily moving things around between registers

Not included in this patch (will be in upcoming patch):

    - Vector element extraction for v4i32, v4f32, v2i64 and v2f64 with
      variable element number
    - Vector element insertion for variable/constant element number

Testing is provided for all extractions. The extractions that are not
implemented yet are just placeholders.

llvm-svn: 249822

d3896573

Sep 29, 2015

Addition of interfaces to the BE to conform to Table A-2 of ELF V2 ABI V1.1 · 2c84b294

Nemanja Ivanovic authored Sep 29, 2015

This patch corresponds to review:
http://reviews.llvm.org/D13191

Back end portion of the fifth round of additions to altivec.h.

llvm-svn: 248809

2c84b294

Aug 31, 2015

[PowerPC] Fixup SELECT_CC (and SETCC) patterns with i1 comparison operands · a2cdbce6

Hal Finkel authored Aug 30, 2015

There were really two problems here. The first was that we had the truth tables
for signed i1 comparisons backward. I imagine these are not very common, but if
you have:
  setcc i1 x, y, LT
this has the '0 1' and the '1 0' results flipped compared to:
  setcc i1 x, y, ULT
because, in the signed case, '1 0' is really '-1 0', and the answer is not the
same as in the unsigned case.

The second problem was that we did not have patterns (at all) for the
unsigned-comparison select_cc nodes with i1 comparison operands. This was the specific
cause of PR24552. These had to be added (and a missing Altivec promotion added
as well) to make sure these function for all types. I've added a bunch more
test cases for these patterns, and there are a few FIXMEs in the test case
regarding code-quality.

Fixes PR24552.

llvm-svn: 246400

a2cdbce6
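
The flipped truth tables in the first problem follow directly from how a 1-bit value is interpreted: as unsigned it is 0 or 1, as signed it is 0 or -1. A small sketch, with hypothetical function names, that enumerates both cases:

```python
def as_signed_i1(bit: int) -> int:
    """Interpret a 1-bit value as signed: the pattern 1 means -1."""
    return -bit

def setcc_lt(x: int, y: int) -> bool:
    """Signed less-than on i1 operands."""
    return as_signed_i1(x) < as_signed_i1(y)

def setcc_ult(x: int, y: int) -> bool:
    """Unsigned less-than on i1 operands."""
    return x < y

def truth_table(compare) -> dict:
    """Map every (x, y) pair of i1 inputs to the comparison result."""
    return {(x, y): compare(x, y) for x in (0, 1) for y in (0, 1)}
```

In the unsigned table only the '0 1' input is true; in the signed table only the '1 0' input is true, because it is really '-1 0'.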

Aug 13, 2015

Scalar to vector conversions using direct moves · 1c39ca65

Nemanja Ivanovic authored Aug 13, 2015

This patch corresponds to review:
http://reviews.llvm.org/D11471

It improves the code generated for converting a scalar to a vector value. With
direct moves from GPRs to VSRs, we no longer require expensive stack operations
for this. Subsequent patches will handle the reverse case and more general
operations between vectors and their scalar elements.

llvm-svn: 244921

1c39ca65

Jul 14, 2015

Add missing builtins to the PPC back end for ABI compliance (vol. 4) · 984a3613

Nemanja Ivanovic authored Jul 14, 2015

This patch corresponds to review:
http://reviews.llvm.org/D11183

Back end portion of the fourth round of additions to altivec.h.

llvm-svn: 242167

984a3613

Jul 10, 2015

NFC. Added a blank line for consistency. · d9e4b4ff

Nemanja Ivanovic authored Jul 10, 2015

llvm-svn: 241913

d9e4b4ff

Add missing builtins to the PPC back end for ABI compliance (vol. 3) · 5655fb32

Nemanja Ivanovic authored Jul 10, 2015

This patch corresponds to review:
http://reviews.llvm.org/D10973

Back end portion of the third round of additions to altivec.h.

llvm-svn: 241900

5655fb32

Jul 05, 2015

Add missing builtins to the PPC back end for ABI compliance (vol. 2) · d358b8f8

Nemanja Ivanovic authored Jul 05, 2015

This patch corresponds to review:
http://reviews.llvm.org/D10874

Back end portion of the second round of additions to altivec.h.

llvm-svn: 241398

d358b8f8

Jun 26, 2015

Add missing builtins to the PPC back end for ABI compliance (vol. 1) · f502a428

Nemanja Ivanovic authored Jun 26, 2015

This patch corresponds to review:
http://reviews.llvm.org/D10638

This is the back end portion of patch
http://reviews.llvm.org/D10637
It just adds the code gen and intrinsic functions necessary to support that patch to the back end.

llvm-svn: 240820

f502a428

May 29, 2015

Add support for VSX FMA single-precision instructions to the PPC back end · 376e1736

Nemanja Ivanovic authored May 29, 2015

This patch corresponds to review:
http://reviews.llvm.org/D9941

It adds the various FMA instructions introduced in the version 2.07 of
the ISA along with the testing for them. These are operations on single
precision scalar values in VSX registers.

llvm-svn: 238578

376e1736

May 21, 2015

Add support for VSX scalar single-precision arithmetic in the PPC target · f02def6c

Nemanja Ivanovic authored May 21, 2015

http://reviews.llvm.org/D9891
Following up on the VSX single precision loads and stores added earlier, this
adds support for elementary arithmetic operations on single precision values
in VSX registers. These instructions utilize the new VSSRC register class.
Instructions added:
xsaddsp
xsdivsp
xsmulsp
xsresp
xsrsqrtesp
xssqrtsp
xssubsp

llvm-svn: 237937

f02def6c

May 07, 2015

Add VSX Scalar loads and stores to the PPC back end · f3c94b1e

Nemanja Ivanovic authored May 07, 2015

This patch corresponds to review:
http://reviews.llvm.org/D9440

It adds a new register class to the PPC back end to contain single precision
values in VSX registers. Additionally, it adds scalar loads and stores for
VSX registers.

llvm-svn: 236755

f3c94b1e

May 05, 2015

This patch adds ABI support for v1i128 data type. · d4eb73c0

Kit Barton authored May 05, 2015

It adds v1i128 to the appropriate register classes and checks parameter passing
and return values.

This is related to http://reviews.llvm.org/D9081, which will add instructions
that exploit the v1i128 datatype.

Phabricator review: http://reviews.llvm.org/D9475

llvm-svn: 236503

d4eb73c0

Apr 27, 2015

[PPC64LE] Remove unnecessary swaps from lane-insensitive vector computations · fe723b9a

Bill Schmidt authored Apr 27, 2015

This patch adds a new SSA MI pass that runs on little-endian PPC64
code with VSX enabled. Loads and stores of 4x32 and 2x64 vectors
without alignment constraints are accomplished for little-endian using
lxvd2x/xxswapd and xxswapd/stxvd2x. The existence of the additional
xxswapd instructions hurts performance in comparison with big-endian
code, but they are necessary in the general case to support correct
semantics.

However, the general case does not apply to most vector code. Many
vector instructions are lane-insensitive; they do not "care" which
lanes the parallel computations are performed within, provided that
the resulting data is stored into the correct locations. Thus this
pass looks for computations that perform only lane-insensitive
operations, and removes the unnecessary swaps from loads and stores in
such computations.

Future improvements will allow computations using certain
lane-sensitive operations to also be optimized in this manner, by
modifying the lane-sensitive operations to account for the permuted
order of the lanes. However, this patch only adds the infrastructure
to permit this; no lane-sensitive operations are optimized at this
time.

This code is heavily exercised by the various vectorizing applications
in the projects/test-suite tree. For the time being, I have only added
one simple test case to demonstrate what the pass is doing. Although
it is quite simple, it provides coverage for much of the code,
including the special case handling of copies and subreg-to-reg
operations feeding the swaps. I plan to add additional tests in the
future as I fill in more of the "special handling" code.

Two existing tests were affected, because they expected the swaps to
be present, but they are now removed.

llvm-svn: 235910

fe723b9a
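
Why the swaps can be dropped for lane-insensitive code can be seen in a two-lane model, where each vector is a pair of doublewords. This is a sketch only: `permuting_load` and `permuting_store` are hypothetical stand-ins for the little-endian behavior of lxvd2x and stxvd2x, and element-wise addition stands in for any lane-insensitive operation.

```python
def permuting_load(mem: list) -> list:
    """Load two doublewords; the lanes arrive in swapped order."""
    return [mem[1], mem[0]]

def permuting_store(reg: list) -> list:
    """Store two doublewords; the lanes are swapped again on the way out."""
    return [reg[1], reg[0]]

def xxswapd(reg: list) -> list:
    """Swap the two doubleword lanes of a register."""
    return [reg[1], reg[0]]

def add_with_swaps(a: list, b: list) -> list:
    """General-case sequence: every load and store is paired with a swap."""
    ra = xxswapd(permuting_load(a))
    rb = xxswapd(permuting_load(b))
    total = [x + y for x, y in zip(ra, rb)]
    return permuting_store(xxswapd(total))

def add_without_swaps(a: list, b: list) -> list:
    """Optimized sequence: the computation runs on permuted lanes throughout."""
    ra = permuting_load(a)
    rb = permuting_load(b)
    total = [x + y for x, y in zip(ra, rb)]
    return permuting_store(total)
```

Both functions write the same values to memory, since the addition does not depend on which lane a given element occupies; the second simply skips three swaps.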