- Apr 29, 2019
-
Don Hinton authored
llvm-svn: 359484
-
Bjorn Pettersson authored
Summary: Extract the logic for doing reassociations from DAGCombiner::reassociateOps into a helper function DAGCombiner::reassociateOpsCommutative, and use that helper to trigger reassociation on the original operand order, or the commuted operand order.

Codegen is not identical since the operand order will be different when doing the reassociations for the commuted case. That causes some unfortunate churn in some test cases. Apart from that this should be NFC.

Reviewers: spatel, craig.topper, tstellar
Reviewed By: spatel
Subscribers: dmgreen, dschuff, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61199
llvm-svn: 359476
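Below is a minimal, hypothetical sketch (a toy expression type, not the SelectionDAG code) of the reassociation shape being factored out: fold (op (op X, C1), C2) into (op X, (op C1, C2)), trying the original operand order first and then the commuted order.

```cpp
#include <memory>

// Toy stand-in for a DAG node; names and structure are illustrative only.
struct Expr {
  bool IsConst = false;
  long Value = 0;                 // valid when IsConst
  std::shared_ptr<Expr> L, R;     // valid for a binary "add" node
};

static std::shared_ptr<Expr> makeConst(long V) {
  auto E = std::make_shared<Expr>();
  E->IsConst = true;
  E->Value = V;
  return E;
}

static std::shared_ptr<Expr> makeAdd(std::shared_ptr<Expr> L,
                                     std::shared_ptr<Expr> R) {
  auto E = std::make_shared<Expr>();
  E->L = L;
  E->R = R;
  return E;
}

// Analogue of the new helper: only handles N0 = (add X, C1) with N1 = C2,
// rebuilding as (add X, (C1 + C2)) so the two constants fold together.
static std::shared_ptr<Expr> reassociateCommutative(std::shared_ptr<Expr> N0,
                                                    std::shared_ptr<Expr> N1) {
  if (!N1->IsConst || N0->IsConst || !N0->R || !N0->R->IsConst)
    return nullptr;
  return makeAdd(N0->L, makeConst(N0->R->Value + N1->Value));
}

// Analogue of the caller: try the original operand order, then the commuted one.
static std::shared_ptr<Expr> reassociate(std::shared_ptr<Expr> N0,
                                         std::shared_ptr<Expr> N1) {
  if (auto R = reassociateCommutative(N0, N1))
    return R;
  return reassociateCommutative(N1, N0);
}
```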
-
Kevin P. Neal authored
Requested by Craig Topper and Andrew Kaylor as part of D55897. llvm-svn: 359461
-
Simon Pilgrim authored
llvm-svn: 359454
-
Simon Pilgrim authored
Also merged duplicate PR39921 + PR39936 tests llvm-svn: 359437
-
Diogo N. Sampaio authored
Summary: This patch adds some basic operations for fp16 vectors, such as bitcast from fp16 to i16, required to perform extract_subvector (also added here) and extract_element.
Reviewers: SjoerdMeijer, DavidSpickett, t.p.northover, ostannard
Reviewed By: ostannard
Subscribers: javed.absar, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60618
llvm-svn: 359433
-
Diogo N. Sampaio authored
Summary: The Procedure Call Standard for the Arm Architecture states that float16x4_t and float16x8_t behave just as uint16x4_t and uint16x8_t for argument passing. This patch adds the fp16 vectors to the ARMCallingConv.td file.
Reviewers: miyuki, ostannard
Reviewed By: ostannard
Subscribers: ostannard, javed.absar, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60720
llvm-svn: 359431
-
- Apr 28, 2019
-
Simon Pilgrim authored
llvm-svn: 359409
-
Simon Pilgrim authored
llvm-svn: 359408
-
Simon Pilgrim authored
llvm-svn: 359407
-
Simon Pilgrim authored
llvm-svn: 359401
-
Simon Pilgrim authored
Some of the combines might be further improved if we lower more shuffles with X86ISD::VPERMV3 directly, instead of waiting to combine the results. llvm-svn: 359400
-
Sanjay Patel authored
The x86 test diffs don't look great because of extra move ops, but FP min/max should clearly be included in the list. llvm-svn: 359399
-
Sanjay Patel authored
This was originally part of D61028, but it's an independent diff.

If we try the repeated divisor reciprocal transform before producing an estimate sequence, then we have an opportunity to use scalar fdiv. On x86, the trade-off is 1 divss vs. 5 vector FP ops in the default estimate sequence. On recent chips (Skylake, Ryzen), the full-precision division is only 3 cycle throughput, so that's probably the better perf default option and avoids problems from x86's inaccurate estimates.

The last 2 tests show that users still have the option to override the defaults by using the function attributes for reciprocal estimates, but those patterns are potentially made faster by converting the vector ops (including ymm ops) to scalar math.

Differential Revision: https://reviews.llvm.org/D61149
llvm-svn: 359398
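For reference, a source-level analogue (illustrative function, not from the patch, and only legal for the compiler under the appropriate fast-math flags) of the repeated-divisor idea: one full-precision scalar divide computes the reciprocal, and the remaining divisions become multiplies.

```cpp
// Illustrative only: many divisions by the same runtime divisor. Rewriting
// them as one scalar fdiv (a single divss on x86) plus multiplies avoids the
// longer reciprocal-estimate + refinement sequence on the vector path.
// The backend only performs this rewrite when fast-math flags permit it,
// since x / d and x * (1.0f / d) are not bit-identical in general.
void scale_by_divisor(float *vals, int n, float divisor) {
  float recip = 1.0f / divisor;   // one full-precision scalar divide
  for (int i = 0; i < n; ++i)
    vals[i] *= recip;             // repeated divides become multiplies
}
```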
-
Simon Pilgrim authored
An xor reduction of a bool vector can be optimized to a parity check of the MOVMSK/BITCAST'd integer - if the population count is odd return 1, else return 0.
Differential Revision: https://reviews.llvm.org/D61230
llvm-svn: 359396
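A hand-written SSE2 sketch (assumed helper name, not code from the patch) of the parity idea: collect the bool-vector lanes into a scalar mask with MOVMSK and test whether the population count is odd.

```cpp
#include <emmintrin.h>   // SSE2 intrinsics

// Illustrative only: an XOR reduction of the <16 x i1> produced by a byte
// compare is just the parity of the compare mask. PMOVMSKB gathers one bit
// per lane; the reduction result is popcount(mask) & 1.
int xor_reduce_eq_bytes(__m128i a, __m128i b) {
  __m128i eq = _mm_cmpeq_epi8(a, b);     // 0xFF in each equal lane, else 0x00
  int mask = _mm_movemask_epi8(eq);      // one bit per byte lane
  return __builtin_popcount(mask) & 1;   // GCC/Clang builtin: 1 if odd count
}
```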
-
Simon Pilgrim authored
llvm-svn: 359395
-
Craig Topper authored
[X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead
Summary: The register forms of these instructions are CodeGenOnly instructions that cover GR32->FR32 and GR64->FR64 bitcasts. There is a similar set of instructions for the opposite bitcast. Due to the patterns using bitcasts these instructions get marked as "bitcast" machine instructions as well. The peephole pass is able to look through these as well as other copies to try to avoid register bank copies.

Because FR32/FR64/VR128 are all coalescable to each other we can end up in a situation where a GR32->FR32->VR128->FR64->GR64 sequence can be reduced to GR32->GR64, which the copyPhysReg code can't handle. To prevent this, this patch removes one set of the 'bitcast' instructions. So now we can only go GR32->VR128->FR32 or GR64->VR128->FR64. The instruction that converts from GR32/GR64->VR128 has no special significance to the peephole pass and won't be looked through.

I guess the other option would be to add support to copyPhysReg to just promote the GR32->GR64 to a GR64->GR64 copy. The upper bits were basically undefined anyway. But removing the CodeGenOnly instruction in favor of one that won't be optimized seemed safer.

I deleted the peephole test because it couldn't be made to work with the bitcast instructions removed. The load versions of the instructions were unnecessary, as the pattern that selects them contains a bitcasted load which should never happen.

Fixes PR41619.

Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61223
llvm-svn: 359392
-
- Apr 27, 2019
-
Simon Pilgrim authored
Minor generalization of the existing <32 x i1> pre-AVX2 split code.
........
Causing irregular buildbot failures.
llvm-svn: 359391
-
Simon Pilgrim authored
llvm-svn: 359390
-
Simon Pilgrim authored
Minor generalization of the existing <32 x i1> pre-AVX2 split code. llvm-svn: 359389
-
Simon Pilgrim authored
Sort order by types and add vXi32/vXi16/vXi8 test coverage. llvm-svn: 359388
-
Simon Pilgrim authored
As predicate masks are legal on AVX512 targets, we avoid MOVMSK in these cases, but we can just bitcast the bool vector to the integer equivalent directly - avoiding expansion of the reduction to a shuffle pattern. llvm-svn: 359386
-
Simon Pilgrim authored
AND/OR/XOR tests for the @llvm.experimental.vector.reduce intrinsics.
AND/OR are pretty good (pre-AVX512); XOR (not so common but used for parity reduction) is still pretty bad.
llvm-svn: 359385
-
Simon Pilgrim authored
llvm-svn: 359382
-
Simon Pilgrim authored
Suggested by @nikic on D59188 llvm-svn: 359379
-
Simon Pilgrim authored
Fixes PR40332 in the limited case where we're selecting between a target shuffle and a zero vector. We can extend this in the future to handle more opcodes and non-zero selections. llvm-svn: 359378
-
Craig Topper authored
Summary: If we have SSE2 we can use a MOVQ to store 64 bits and avoid falling back to a cmpxchg8b loop. If it's a seq_cst store we need to insert an mfence after the store.
Reviewers: spatel, RKSimon, reames, jfb, efriedma
Reviewed By: RKSimon
Subscribers: hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60546
llvm-svn: 359368
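A small standard-C++ example (not from the patch) of the kind of store this affects when compiling for 32-bit x86 with SSE2 available.

```cpp
#include <atomic>
#include <cstdint>

// When targeting 32-bit x86 with SSE2, a seq_cst store of a 64-bit value can
// now be lowered to a single MOVQ store followed by MFENCE instead of a
// CMPXCHG8B loop; weaker orderings need no trailing fence.
void publish(std::atomic<uint64_t> &slot, uint64_t value) {
  slot.store(value, std::memory_order_seq_cst);
}
```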
-
Mark Searles authored
This reverts commit 7a6ef3004655dd86d722199c471ae78c28e31bb4. We discovered some internal test failures, so reverting for now. Differential Revision: https://reviews.llvm.org/D61213 llvm-svn: 359363
-
- Apr 26, 2019
-
Jessica Paquette authored
getConstantVRegValWithLookThrough does the same thing as the getConstantValueForReg function, and has more visibility across GISel. Plus, it supports looking through G_TRUNC, G_SEXT, and G_ZEXT. So, we get better code reuse and more functionality for free by using it. Add some test cases to select-extract-vector-elt.mir to show that we can now look through those instructions. llvm-svn: 359351
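A rough usage sketch of the shared helper; the exact signature and field names in this revision are assumptions here, so treat the snippet as an outline rather than verbatim API.

```cpp
// Hypothetical usage, not code from the commit. The helper lives in
// GlobalISel's Utils and (roughly) returns the constant feeding a virtual
// register, looking through G_TRUNC/G_SEXT/G_ZEXT and copies on the way.
#include "llvm/CodeGen/GlobalISel/Utils.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"

using namespace llvm;

static bool getConstantLaneIndex(unsigned IdxReg,
                                 const MachineRegisterInfo &MRI,
                                 int64_t &LaneOut) {
  // Assumed signature: Optional<ValueAndVReg>
  //   getConstantVRegValWithLookThrough(unsigned VReg,
  //                                     const MachineRegisterInfo &MRI);
  auto ValAndVReg = getConstantVRegValWithLookThrough(IdxReg, MRI);
  if (!ValAndVReg)
    return false;               // index is not a (look-through) constant
  LaneOut = ValAndVReg->Value;  // assumed field name
  return true;
}
```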
-
Nick Desaulniers authored
Summary: Targets like ARM, MSP430, PPC, and SystemZ have complex behavior when printing the address of a MachineOperand::MO_GlobalAddress. Move that handling into a new overridden method in each target's AsmPrinter subclass. A virtual method was added to the base class for handling the generic case. Refactors a few subclasses to support the target-independent %a, %c, and %n.

The patch also contains small cleanups for AVRAsmPrinter and SystemZAsmPrinter.

It seems that NVPTXTargetLowering is possibly missing some logic to transform GlobalAddressSDNodes for TargetLowering::LowerAsmOperandForConstraint to handle with "i" extended inline assembly asm constraints.

Fixes:
- https://bugs.llvm.org/show_bug.cgi?id=41402
- https://github.com/ClangBuiltLinux/linux/issues/449

Reviewers: echristo, void
Reviewed By: void
Subscribers: void, craig.topper, jholewinski, dschuff, jyknight, dylanmckay, sdardis, nemanjai, javed.absar, sbc100, jgravelle-google, eraman, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, jrtc27, atanasyan, jsji, llvm-commits, kees, tpimh, nathanchance, peter.smith, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60887
llvm-svn: 359337
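For context, a hypothetical example (assumed symbol and section names) of the target-independent operand modifiers mentioned: with an "i" constraint, %c prints the constant or symbol address without immediate punctuation, %a prints it as an address, and %n prints an integer constant negated.

```cpp
// Hypothetical illustration, not from the patch: GNU inline asm that relies
// on the generic %c modifier to embed a symbol address in a data directive.
extern int marker;   // assumed external symbol, used only for this example

void record_reference() {
  // "%c0" prints operand 0 without the '$'/'#' immediate syntax, so the
  // address can appear inside ".long"; a plain "%0" would emit "$marker"
  // on x86 and break the directive.
  asm volatile(".pushsection .note.example, \"a\"\n\t"
               ".long %c0\n\t"
               ".popsection"
               :
               : "i"(&marker));
}
```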
-
Simon Pilgrim authored
llvm-svn: 359332
-
Jessica Paquette authored
There are instructions for these, so mark them as legal. Select the correct instruction in AArch64InstructionSelector.cpp. Update select-bswap.mir and arm64-rev.ll to reflect the changes. llvm-svn: 359331
-
Stanislav Mekhanoshin authored
Differential Revision: https://reviews.llvm.org/D61202 llvm-svn: 359328
-
Stanislav Mekhanoshin authored
Differential Revision: https://reviews.llvm.org/D61156 llvm-svn: 359316
-
Sanjay Patel authored
'maximum' and 'minimum' still crash, so they are commented out. llvm-svn: 359306
-
Simon Pilgrim authored
As detailed on PR40758, Bobcat/Jaguar can perform vector immediate shifts on the same pipes as vector ANDs with the same latency - so it doesn't make sense to replace a shl+lshr with a shift+and pair as it requires an additional mask (with the extra constant pool, loading and register pressure costs). Differential Revision: https://reviews.llvm.org/D61068 llvm-svn: 359293
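A scalar analogue (illustrative only) of the two equivalent patterns being weighed; for vectors, the AND form additionally needs its mask loaded from the constant pool, which is the extra cost being avoided on Bobcat/Jaguar.

```cpp
#include <cstdint>

// Both functions clear the top 8 bits. On most targets the single AND is
// preferred, but on Bobcat/Jaguar the vector shift pair runs on the same
// pipes with the same latency and avoids materializing a constant mask.
uint32_t clear_top8_shifts(uint32_t x) { return (x << 8) >> 8; }    // shl + lshr
uint32_t clear_top8_mask(uint32_t x)   { return x & 0x00FFFFFFu; }  // single AND
```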
-
Simon Pilgrim authored
A small step towards combining shuffles across vector sizes - this recognizes when a shuffle's operands are all extracted from the same larger source and tries to combine to an unary shuffle of that source instead. Fixes one of the test cases from PR34380. Differential Revision: https://reviews.llvm.org/D60512 llvm-svn: 359292
-
Hans Wennborg authored
The code was using the alignment of a pointer to the value, not the alignment of the constant itself. Maybe we got away with it so far because the pointer alignment is fairly high, but we did end up under-aligning <16 x i8> vectors, which was caught in the Chromium build after lld stopped over-aligning the .rodata.cst16 section in r356428. (See crbug.com/953815) Differential revision: https://reviews.llvm.org/D61124 llvm-svn: 359287
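A tiny standalone illustration (not the fixed code) of the distinction behind the bug: the alignment required by a 16-byte vector constant versus the alignment of a mere pointer to it.

```cpp
#include <cstdint>

// Stand-in for a <16 x i8> constant-pool entry. Querying the alignment of a
// *pointer* to the value yields ordinary pointer alignment (8 on a 64-bit
// target), while the value itself requires 16-byte alignment - the gap that
// left the .rodata.cst16 entry under-aligned.
struct alignas(16) Vec16i8 { uint8_t Bytes[16]; };

static_assert(alignof(Vec16i8) == 16, "the constant needs 16-byte alignment");
static_assert(alignof(Vec16i8 *) == alignof(void *),
              "a pointer to it only needs pointer alignment");
```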
-
Artem Belevich authored
All of the new instructions are still handled mostly by tablegen. I've slightly refactored the code to drive intrinsic/instruction generation from a master list of supported variants, so all irregularities have to be implemented in one place only. The test generation script wmma.py has been refactored in a similar way. Differential Revision: https://reviews.llvm.org/D60015 llvm-svn: 359247
-
Artem Belevich authored
PTX 6.3 requires using ".aligned" in the MMA instruction names. In order to generate correct name, now we pass current PTX version to each instruction as an extra constant operand and InstPrinter adjusts its output accordingly. Differential Revision: https://reviews.llvm.org/D59393 llvm-svn: 359246
-