Commits · e41e3d323769a1de9980a78b507c6b9f5d4b6946 · Lorenzo Albano / LLVM bpEVL

May 08, 2018

[Power9]Legalize and emit code for truncate and convert QP to HW and Byte · e41e3d32

Lei Huang authored May 08, 2018

Legalize and emit code for truncate and convert float128 to (un)signed short
and (un)signed char.

Differential Revision: https://reviews.llvm.org/D46194

llvm-svn: 331797

e41e3d32

AMDGPU: Fix broken dynamic vector indexing for packed types · 869cbedc
Matt Arsenault authored May 08, 2018
```
The intention of this was to multiply by 16, not shift by 16.

llvm-svn: 331793
```
869cbedc
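
To illustrate the distinction the message draws, here is a minimal sketch, assuming 16-bit elements packed into a 32-bit value. The function names are hypothetical and are not taken from the AMDGPU backend; they only show why scaling an index by 16 and shifting it left by 16 are different operations.

```cpp
#include <cassert>
#include <cstdint>

// Bit offset of a 16-bit element inside a packed value: index * 16.
// (Equivalent to index << 4, since 16 == 2^4.)
constexpr uint32_t elementBitOffset(uint32_t Index) { return Index * 16; }

// The kind of slip described in the message: shifting by 16 scales the
// index by 65536 instead of by 16.
constexpr uint32_t shiftedBy16(uint32_t Index) { return Index << 16; }

// Extract 16-bit element `Index` from a 32-bit value holding two elements.
// Assumption for this sketch: element 0 lives in the low half.
inline uint16_t extractPacked16(uint32_t Packed, uint32_t Index) {
  assert(Index < 2 && "a 32-bit value holds two 16-bit elements");
  return static_cast<uint16_t>(Packed >> elementBitOffset(Index));
}
```

With these definitions, `elementBitOffset(1)` is 16 while `shiftedBy16(1)` is 65536, which is far outside the width of the packed value.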

[Power9]Legalize and emit code for truncate and convert Quad-Precision to Word · 6364288d

Lei Huang authored May 08, 2018

Legalize and emit code for:

  * xscvqpswz : VSX Scalar truncate & Convert Quad-Precision to Signed Word
  * xscvqpuwz : VSX Scalar truncate & Convert Quad-Precision to Unsigned Word

Differential Revision: https://reviews.llvm.org/D45635

llvm-svn: 331790

6364288d

AMDGPU: Use eraseFromParent to delete an instruction when it is no longer needed. · d049da37
Changpeng Fang authored May 08, 2018
```
Reviewer: Nicolai

Differential Revision:
  https://reviews.llvm.org/D46438

llvm-svn: 331788
```
d049da37

[Power9]Legalize and emit code for truncate and convert QP to DW · c517e95b

Lei Huang authored May 08, 2018

Legalize and emit code for:

  * xscvqpsdz : VSX Scalar truncate & Convert Quad-Precision to Signed Dword
  * xscvqpudz : VSX Scalar truncate & Convert Quad-Precision to Unsigned Dword

Differential Revision: https://reviews.llvm.org/D45553

llvm-svn: 331787

c517e95b

[PowerPC] Unify handling for conversion of FP_TO_INT feeding a store · c29229a6

Lei Huang authored May 08, 2018

Existing DAG combine only handles conversions for FP_TO_SINT:
"{f32, f64} x { i32, i16 }"

This patch simplifies the code to handle:
"{ FP_TO_SINT, FP_TO_UINT } x { f64, f32 } x { i64, i32, i16, i8 }"

Differential Revision: https://reviews.llvm.org/D46102

llvm-svn: 331778

c29229a6

[AMDGPU] Added checks for dpp_ctrl value · 43293616

Stanislav Mekhanoshin authored May 08, 2018

- Report error for invalid dpp_ctrl values.
- Changed the way it is reported, now the error will be emitted into
  asm and will work with release build as well.
- Added dpp_ctrl value verifier for codegen.
- Added symbolic constants for dpp_ctrl.

Differential Revision: https://reviews.llvm.org/D46565

llvm-svn: 331775

43293616

[X86] Tag PCONFIG instruction with WriteSystem scheduler class · f5f28aa7
Simon Pilgrim authored May 08, 2018
```
llvm-svn: 331773
```
f5f28aa7

[mips][msa] Pattern match the splat.d instruction · c7113cc9

Stefan Maksimovic authored May 08, 2018

Introduced a new pattern for matching splat.d explicitly.

Both splat.d and splati.d can now be generated from the @llvm.mips.splat.d
intrinsic depending on whether an immediate value has been passed.

Differential Revision: https://reviews.llvm.org/D45683

llvm-svn: 331771

c7113cc9

[X86] Split off WriteIMul64 from WriteIMul schedule class (PR36931) · 2864b464

Simon Pilgrim authored May 08, 2018

This fixes a couple of BtVer2 missing instructions that weren't being handled in the override.

NOTE: There are still a lot of overrides that still need cleaning up!
llvm-svn: 331770

2864b464

[X86] Split WriteIDiv into div/idiv 8/16/32/64 implementations (PR36930) · 25805543

Simon Pilgrim authored May 08, 2018

I've created the necessary classes but there are still a lot of overrides that need cleaning up.

NOTE: The Znver1 model was missing some div/idiv variants in the instregex patterns and wasn't setting the resource cycles at all in the overrides.
llvm-svn: 331767

25805543

[X86] Add vector masked load/store scheduler classes (PR32857) · b0a3be04
Simon Pilgrim authored May 08, 2018
```
Split off from existing vector load/store classes to remove InstRW overrides.

llvm-svn: 331760
```
b0a3be04

[AArch64][SVE] Asm: Support for LD1R load-and-replicate scalar instructions. · d8e76494

Sander de Smalen authored May 08, 2018

Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D46251

llvm-svn: 331758

d8e76494

[X86] Add SchedWriteFTest/SchedWriteVecTest TEST scheduler classes · 210286ed
Simon Pilgrim authored May 08, 2018
```
Split off from SchedWriteVecLogic to remove InstRW overrides.

llvm-svn: 331757
```
210286ed

[mips] Mark various memory instructions as being in microMIPS (NFC) · e0982cca

Simon Dardis authored May 08, 2018

Reviewers: atanasyan, abeserminji, smaksimovic

Differential Revision: https://reviews.llvm.org/D46388

llvm-svn: 331756

e0982cca

[AArch64] Disallow vector operand if FPR128 Q register is required. · 20eede70

Sander de Smalen authored May 08, 2018

Patch https://reviews.llvm.org/D41445 changed the behaviour of 'isReg()'
to also return 'true' if the parsed register operand is a vector
register. Code in the AsmMatcher checks if a register is a subclass of the
expected register class. However, even though both parsed registers map
to the same physical register, the 'v' register is of kind 'NeonVector',
whereas 'q' is of kind 'Scalar', and isSubclass() does not distinguish
between the two cases.

The solution is to use an AsmOperand instead of the register directly,
and use the PredicateMethod to distinguish the two operands.

This fixes for example:
  ldr v0, [x0]    // 'v0' is an invalid operand for this instruction
  ldr q0, [x0]    // valid

Reviewers: aemerson, Gerolf, SjoerdMeijer, javed.absar

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D46310

llvm-svn: 331755

20eede70

[mips] Correct clo/clz predicates · 7563624f

Simon Dardis authored May 08, 2018

Reviewers: smaksimovic, abeserminji, atanasyan

Differential Revision: https://reviews.llvm.org/D46125

llvm-svn: 331754

7563624f

[X86] Mark all byval parameters as aliased · 4f799c02

Jeremy Morse authored May 08, 2018

This is a fix for PR30290: by marking all byval stack slots as being aliased,
the instruction scheduler is more conservative about rescheduling memory
accesses to such stack slots as an LLVM Value* might alias it. This fixes
errors such as in the patched test case, where reads and writes to a data
structure are illegally mixed.

This could be fixed better in the future with better analysis for the
instruction scheduler to know what Values alias what stack slots.

Differential Revision: https://reviews.llvm.org/D45022

llvm-svn: 331749

4f799c02

[X86][CET] Shadow stack fix for setjmp/longjmp · c47f7992

Alexander Ivchenko authored May 08, 2018

This patch adds a shadow stack fix when compiling
setjmp/longjmp with the shadow stack enabled. This
allows setjmp/longjmp to work correctly with CET.

Patch by mike.dvoretsky

Differential Revision: https://reviews.llvm.org/D46181

llvm-svn: 331748

c47f7992

[x86] Introduce the enclv instruction · 4a02bf94

Gabor Buella authored May 08, 2018

Summary:
Introduce the enclv instruction and use the -msgx flag as a requirement
for the SGX instructions.

Reviewers: craig.topper, zvi

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D46436

llvm-svn: 331742

4a02bf94

[x86] Introduce the pconfig instruction · 2b5e9600

Gabor Buella authored May 08, 2018

Reviewers: craig.topper, zvi

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D46430

llvm-svn: 331739

2b5e9600

AMDGPU/GlobalISel: Don't try to lower hull shaders · 37444285

Tom Stellard authored May 07, 2018

Summary: The AMDGPU_HS calling convention is not supported yet.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46149

llvm-svn: 331691

37444285

May 07, 2018

[DagCombiner] Not all 'andn''s work with immediates. · cc42d08b

Roman Lebedev authored May 07, 2018

Summary:
Split off from D46031.

In the masked merge case, this degrades IPC by decreasing instruction count.
{F6108777}
The next patch should be able to recover and improve this.

This also affects the transform @spatel has added in D27489 / rL289738,
and the test coverage for X86 was missing.
But after I added it and looked at the changes in MCA, I'm somewhat confused.
{F6093591} {F6093592} {F6093593}
I'd say this regression is an improvement, since `IPC` increased in that case?

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: andreadb, llvm-commits, spatel

Differential Revision: https://reviews.llvm.org/D46493

llvm-svn: 331684

cc42d08b

[X86] Split WriteFAdd/WriteFCmp/WriteFMul schedule classes · 1233e123

Simon Pilgrim authored May 07, 2018

Split to support single/double for scalar, XMM and YMM/ZMM instructions - removing InstrRW overrides for these instructions.

Fixes Atom ADDSUBPD instruction and reclassifies VFPCLASS as WriteFCmp which is closer in behaviour.

llvm-svn: 331672

1233e123

[X86][AVX2] Tag VPMOVSX/VPMOVZX ymm instructions as WriteShuffle256 · e480ed0b

Simon Pilgrim authored May 07, 2018

These are more like cross-lane shuffles than regular shuffles - we already do this for AVX512 equivalents.

Differential Revision: https://reviews.llvm.org/D46229

llvm-svn: 331659

e480ed0b

[Hexagon] Move clamping of extended operands directly to MC code emitter · 786fc3d0
Krzysztof Parzyszek authored May 07, 2018
```
llvm-svn: 331653
```
786fc3d0

[X86][Znver1] Remove WriteFMul/WriteFRcp InstRW overrides/aliases. · 763bf120

Simon Pilgrim authored May 07, 2018

Fixes x87 schedules to more closely match Agner - AMD doesn't tend to "special case" x87 instructions as much as Intel.

llvm-svn: 331645

763bf120

[X86] Split WriteFDiv schedule classes to support single/double scalar, XMM... · ac5d0a31

Simon Pilgrim authored May 07, 2018

[X86] Split WriteFDiv schedule classes to support single/double scalar, XMM and YMM/ZMM instructions.

This removes all InstrRW overrides for these instructions - some x87 overrides remain but most use default (and realistic) values.

llvm-svn: 331643

ac5d0a31

[AMDGPU][Waitcnt] Remove the old waitcnt pass · 4a0f2c50

Mark Searles authored May 07, 2018

Remove the old waitcnt pass (si-insert-waits), which is no longer maintained
and is getting crufty.

Differential Revision: https://reviews.llvm.org/D46448

llvm-svn: 331641

4a0f2c50

[AMDGPU] Don't force WQM for DS op · 18a1e9d0

Tim Renouf authored May 07, 2018

Summary:
Previously, all DS ops forced WQM in a pixel shader. That was a hack to
allow for graphics frontends using ds_swizzle to implement explicit
derivatives, on SI/CI at least where DPP is not available. But it forced
WQM for _any_ DS op.

With this commit, DS ops no longer force WQM. Both graphics frontends
(Mesa and LLPC) need to change to issue an explicit llvm.amdgcn.wqm
intrinsic call when calculating explicit derivatives.

The required Mesa change is: "amd/common: use llvm.amdgcn.wqm for
explicit derivatives".

Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46051

Change-Id: I9b745b626fa91bbd66456e6cf41ee07eeea42f81
llvm-svn: 331633

18a1e9d0

[X86] Split WriteFRcp/WriteFRsqrt/WriteFSqrt schedule classes · f3ae50fc

Simon Pilgrim authored May 07, 2018

WriteFRcp/WriteFRsqrt are split to support scalar, XMM and YMM/ZMM instructions.

WriteFSqrt is split into single/double/long-double sizes and scalar, XMM, YMM and ZMM instructions.

This removes all InstrRW overrides for these instructions.

NOTE: There were a couple of typos in the Znver1 model - notably a 1cy throughput for SQRT that is highly unlikely and doesn't tally with Agner.

NOTE: I had to add Agner's numbers for several targets for WriteFSqrt80.
llvm-svn: 331629

f3ae50fc

[SystemZ] Bugfix for MVCLoop CC clobbering. · ebb1605b

Jonas Paulsson authored May 07, 2018

MVCLoop clobbers CC (since it emits a compare/branch), but this was not
modelled.

Review: Ulrich Weigand
llvm-svn: 331627

ebb1605b

[ARM] Select result 1 from ConvertBooleanCarryToCarryFlag's result automatically. NFC · f91b6a8c
Amaury Sechet authored May 07, 2018
```
The old behavior returned the value 0, which is error prone.

llvm-svn: 331614
```
f91b6a8c

[X86] Fix copy/paste mistake in comment. NFC · c882014f
Craig Topper authored May 07, 2018
```
llvm-svn: 331611
```
c882014f

May 06, 2018

[X86] Enable reciprocal estimates for v16f32 vectors by using VRCP14PS/VRSQRT14PS · cb2abc79

Craig Topper authored May 06, 2018

Summary:
The legacy VRCPPS/VRSQRTPS instructions aren't available in 512-bit versions. The new increased precision versions are. So we can use those to implement v16f32 reciprocal estimates.

For KNL CPUs we can probably use VRCP28PS/VRSQRT28PS and avoid the NR step altogether, but I leave that for a future patch.
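
The "NR step" mentioned above is a Newton-Raphson refinement of the hardware estimate. The following is a scalar sketch of that refinement only, not the code the patch emits. `roughReciprocal` is a hypothetical stand-in that simulates an estimate with a relative error of about 2^-14; it is not the VRCP14PS instruction.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a low-precision hardware estimate of 1/a.
// It deliberately perturbs the exact result by a relative error of 2^-14.
inline float roughReciprocal(float A) {
  assert(A != 0.0f && "reciprocal of zero is not meaningful here");
  return (1.0f / A) * (1.0f + 1.0f / 16384.0f);
}

// One Newton-Raphson step for f(x) = 1/x - a:  x1 = x0 * (2 - a * x0).
// Each step roughly squares the relative error of the estimate.
inline float refineReciprocal(float A, float X0) {
  return X0 * (2.0f - A * X0);
}

// Relative error of an approximation X to 1/a.
inline float relativeError(float A, float X) {
  return std::fabs(X * A - 1.0f);
}
```

Because each step roughly squares the relative error, an estimate that is already accurate to about 2^-28 leaves nothing for a single-precision refinement step to improve, which is why the message suggests the step could be dropped on CPUs with the higher-precision instructions.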

Reviewers: spatel

Reviewed By: spatel

Subscribers: RKSimon, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D46498

llvm-svn: 331606

cb2abc79

May 05, 2018

[globalisel] Update GlobalISel emitter to match new representation of extending loads · f84bc379

Daniel Sanders authored May 05, 2018

Summary:
Previously, an extending load was represented as (G_*EXT (G_LOAD x)).
This had a few drawbacks:
* G_LOAD had to be legal for all sizes you could extend from, even if
  registers didn't naturally hold those sizes.
* All sizes you could extend from had to be allocatable just in case the
  extend went missing (e.g. by optimization).
* At minimum, G_*EXT and G_TRUNC had to be legal for these sizes. As we
  improve optimization of extends and truncates, this legality requirement
  would spread without considerable care w.r.t when certain combines were
  permitted.
* The SelectionDAG importer required some ugly and fragile pattern
  rewriting to translate patterns into this style.

This patch changes the representation to:
* (G_[SZ]EXTLOAD x)
* (G_LOAD x) any-extends when MMO.getSize() * 8 < ResultTy.getSizeInBits()
which resolves these issues by allowing targets to work entirely in their
native register sizes, and by having a more direct translation from
SelectionDAG patterns.
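
The any-extend condition quoted above can be sketched as a small predicate. The two structs are hypothetical stand-ins that expose only the accessors named in the message (`getSize()` in bytes, `getSizeInBits()` in bits); they are not the real LLVM memory-operand or type classes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a memory operand: size of the access in bytes.
struct MemOperand {
  uint64_t SizeInBytes;
  uint64_t getSize() const { return SizeInBytes; }
};

// Hypothetical stand-in for the result type: width of the register in bits.
struct ResultType {
  uint64_t SizeInBits;
  uint64_t getSizeInBits() const { return SizeInBits; }
};

// A G_LOAD any-extends when the memory access is narrower than the result.
inline bool isAnyExtendingLoad(const MemOperand &MMO,
                               const ResultType &ResultTy) {
  assert(MMO.getSize() > 0 && "a load must access at least one byte");
  return MMO.getSize() * 8 < ResultTy.getSizeInBits();
}
```

For example, a 1-byte access producing a 32-bit result satisfies the predicate, while a 4-byte access producing a 32-bit result is a plain load.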

Each extending load can be lowered by the legalizer into separate extends
and loads, however a target that supports s1 will need the any-extending
load to extend to at least s8 since LLVM does not represent memory accesses
smaller than 8 bit. The legalizer can widenScalar G_LOAD into an
any-extending load but sign/zero-extending loads need help from something
else like a combiner pass. A follow-up patch that adds combiner helpers
for this will follow.

The new representation requires that the MMO correctly reflect the memory
access so this has been corrected in a couple tests. I've also moved the
extending loads to their own tests since they are (mostly) separate opcodes
now. Additionally, the re-write appears to have invalidated two tests from
select-with-no-legality-check.mir since the matcher table no longer contains
loads that result in s1's and they aren't legal in AArch64 anymore.

Depends on D45540

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar

Reviewed By: rtereshin

Subscribers: javed.absar, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D45541

llvm-svn: 331601

f84bc379

Simplify LLVM_ATTRIBUTE_USED call sites. · 862eebb6
Fangrui Song authored May 05, 2018
```
llvm-svn: 331599
```
862eebb6

Fix a bunch of places where operator-> was used directly on the return from dyn_cast. · 781aa181

Craig Topper authored May 05, 2018

Inspired by r331508, I did a grep and found these.

Mostly just change from dyn_cast to cast. Some cases also showed a dyn_cast result being converted to bool, so those I changed to isa.
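
The pattern being cleaned up can be sketched with simplified stand-ins for `isa`, `cast` and `dyn_cast`. The class hierarchy and the three templates below are hypothetical miniatures written for this illustration; they mimic the documented behaviour (a failed `dyn_cast` yields null, `cast` asserts on a mismatch) but are not LLVM's implementation.

```cpp
#include <cassert>

// Hypothetical miniature hierarchy with an LLVM-style kind tag.
struct Value {
  enum Kind { K_Constant, K_Instruction };
  explicit Value(Kind K) : TheKind(K) {}
  Kind getKind() const { return TheKind; }

private:
  Kind TheKind;
};

struct Constant : Value {
  Constant() : Value(K_Constant) {}
  static bool classof(const Value *V) { return V->getKind() == K_Constant; }
};

struct Instruction : Value {
  Instruction() : Value(K_Instruction) {}
  int getOpcode() const { return 42; }
  static bool classof(const Value *V) { return V->getKind() == K_Instruction; }
};

// Simplified stand-ins for the casting templates.
template <typename To> bool isa(const Value *V) { return To::classof(V); }

template <typename To> const To *cast(const Value *V) {
  assert(isa<To>(V) && "cast<> to an incompatible type");
  return static_cast<const To *>(V);
}

template <typename To> const To *dyn_cast(const Value *V) {
  return isa<To>(V) ? static_cast<const To *>(V) : nullptr;
}

// Risky form: dyn_cast<Instruction>(V)->getOpcode() dereferences a null
// pointer whenever V is not an Instruction.
// When the type is already known, cast<> states that and checks it:
inline int opcodeOf(const Value *V) { return cast<Instruction>(V)->getOpcode(); }

// When only a yes/no answer is needed, isa<> avoids the pointer entirely:
inline bool isInstruction(const Value *V) { return isa<Instruction>(V); }
```

The two helper functions correspond to the two replacements described in the message: `cast` where the result was dereferenced unconditionally, and `isa` where the result was only converted to bool.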

llvm-svn: 331577

781aa181

AMDGPU/NFC: Update D16PreservesUnusedBits description based on Tony Tye's comments · 91a74f53
Konstantin Zhuravlyov authored May 04, 2018
```
llvm-svn: 331564
```
91a74f53

May 04, 2018

AMDGPU/NFC: Fix formatting for 900, 902 ISA Version features · 3fc4067a
Konstantin Zhuravlyov authored May 04, 2018
```
llvm-svn: 331553
```
3fc4067a