Commits · 1a8eec141ac2fa79d1a3ea829afa0d9c30939cd0 · Lorenzo Albano / LLVM bpEVL

Jun 12, 2017

[PowerPC] Match vec_revb builtins to P9 instructions. · 1a8eec14

Tony Jiang authored Jun 12, 2017

Power9 has instructions that will reverse the bytes within an element for all
sizes (half-word, word, double-word and quad-word). These can be used for the
vec_revb builtins in altivec.h. However, we implement these to match vector
shuffle nodes as that will cover both the builtins and vector shuffles that
occur in the SDAG through other means.

Differential Revision: https://reviews.llvm.org/D33690

llvm-svn: 305214

1a8eec14
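
As a rough illustration of why matching shuffle nodes covers the builtins, the sketch below builds the byte-shuffle mask that a per-element byte reversal corresponds to. The helper names `revb_shuffle_mask` and `apply_shuffle` are hypothetical and do not come from the patch; the sketch models only the byte permutation, not the SDAG matching itself.

```python
def revb_shuffle_mask(elem_bytes, vec_bytes=16):
    """Byte-shuffle mask that reverses the bytes inside each element."""
    if vec_bytes % elem_bytes != 0:
        raise ValueError("element size must divide the vector size")
    mask = []
    for base in range(0, vec_bytes, elem_bytes):
        # Indices of one element's bytes, listed last-to-first.
        mask.extend(range(base + elem_bytes - 1, base - 1, -1))
    return mask


def apply_shuffle(vec, mask):
    """Single-input byte shuffle: result[i] = vec[mask[i]]."""
    return bytes(vec[i] for i in mask)


# Word-sized elements (4 bytes) in a 16-byte vector.
print(revb_shuffle_mask(4))
# [3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12]
```

Any shuffle whose mask has this shape, whether it came from `vec_revb` or from elsewhere, is the same permutation, which is the reason given above for matching at the shuffle level.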

[Power9] Added support for the modsw, moduw, modsd, modud hardware instructions. · 30a49d1a

Tony Jiang authored Jun 12, 2017

Note that if we need the result of both the divide and the modulo then we
compute the modulo based on the result of the divide and not using the new
hardware instruction.

Commit on behalf of STEFAN PINTILIE.
Differential Revision: https://reviews.llvm.org/D33940

llvm-svn: 305210

30a49d1a
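
The note about reusing the divide can be sketched as follows. This is only an arithmetic illustration, assuming C-style division that truncates toward zero; the function names are hypothetical and the code says nothing about how the backend actually selects instructions.

```python
def sdiv(a, b):
    """Signed division truncating toward zero (C-style semantics)."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def srem_from_div(a, b, q):
    """Remainder recovered from an already computed quotient."""
    return a - q * b


def div_and_mod(a, b):
    """When both results are needed, one divide plus a multiply-subtract suffices."""
    q = sdiv(a, b)
    return q, srem_from_div(a, b, q)


print(div_and_mod(-7, 2))  # (-3, -1)
```

When only the remainder is needed there is no quotient to reuse, which is the case the new modulo instructions address.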

AMDGPU: Don't add same implicit use multiple times · 05c26472

Matt Arsenault authored Jun 12, 2017

For the last component, the same register use
was added as an implicit use and another implicit kill use.

llvm-svn: 305205

05c26472

AMDGPU: Teach isLegalAddressingMode about flat offsets · d9b77848
Matt Arsenault authored Jun 12, 2017
```
Also fix reporting r+r as a valid addressing mode without
offsets.

llvm-svn: 305203
```
d9b77848

AMDGPU: Start selecting flat instruction offsets · db7c6a87
Matt Arsenault authored Jun 12, 2017
```
llvm-svn: 305201
```
db7c6a87

AMDGPU: Verify that flat offsets aren't used pre-GFX9 · 89ad17ce

Matt Arsenault authored Jun 12, 2017

For convenience the operand is always present in the instruction,
but it isn't valid to use except on GFX9.

llvm-svn: 305200

89ad17ce

[Falkor] Enable SW Prefetch. · ef790ffd

Haicheng Wu authored Jun 12, 2017

SW prefetch is good for Falkor.

Differential Revision: http://reviews.llvm.org/D34084

llvm-svn: 305199

ef790ffd

AMDGPU: Start adding offset fields to flat instructions · fd023141
Matt Arsenault authored Jun 12, 2017
```
llvm-svn: 305194
```
fd023141

[DAG] add helper to bind memop chains; NFCI · d4765a38

Sanjay Patel authored Jun 12, 2017

This step is just intended to reduce code duplication rather than change any functionality.

A follow-up would be to replace PPCTargetLowering::spliceIntoChain() usage with this new helper.

Differential Revision: https://reviews.llvm.org/D33649

llvm-svn: 305192

d4765a38

Const correctness for TTI::getRegisterBitWidth · c0112ae8

Daniel Neilson authored Jun 12, 2017

Summary: The method TargetTransformInfo::getRegisterBitWidth() is declared const, but the type erasing implementation classes (TargetTransformInfo::Concept & TargetTransformInfo::Model) that were introduced by Chandler in https://reviews.llvm.org/D7293 do not have the method declared const. This is an NFC to tidy up the const consistency between TTI and its implementation.

Reviewers: chandlerc, rnk, reames

Reviewed By: reames

Subscribers: reames, jfb, arsenm, dschuff, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, llvm-commits

Differential Revision: https://reviews.llvm.org/D33903

llvm-svn: 305189

c0112ae8

[X86][SSE] Change memop fragment to inherit from vec128load with local alignment controls · b079c8b3

Simon Pilgrim authored Jun 12, 2017

First possible step towards merging SSE/AVX memory folding pattern fragments.

Also allows us to remove the duplicate non-temporal load logic.

Differential Revision: https://reviews.llvm.org/D33902

llvm-svn: 305184

b079c8b3

[AVX-512] Add VPCONFLICT and VPLZCNT to load folding tables. · 69fead95
Craig Topper authored Jun 12, 2017
```
llvm-svn: 305180
```
69fead95

Jun 11, 2017

[x86] use vperm2f128 rather than vinsertf128 when there's a chance to fold a 32-byte load · dcbfbb11

Sanjay Patel authored Jun 11, 2017

I was looking closer at the x86 test diffs in D33866, and the first change seems like it
shouldn't happen in the first place. So this patch will resolve that.

Using Agner's tables and AMD docs, vperm2f128 and vinsertf128 have identical timing for
any given CPU model, so we should be able to interchange those without affecting perf.
But as we can see in some of the diffs here, using vperm2f128 allows load folding, so
we should take that opportunity to reduce code size and register pressure.

A secondary advantage is making AVX1 and AVX2 codegen more similar. Given that vperm2f128
was introduced with AVX1, we should be selecting it in all of the same situations that we
would with AVX2. If there's some reason that an AVX1 CPU would not want to use this
instruction, that should be fixed up in a later pass.

Differential Revision: https://reviews.llvm.org/D33938

llvm-svn: 305171

dcbfbb11
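
A minimal model of the two instructions, treating a 256-bit register as a `(low, high)` pair of 128-bit lanes, shows one case where they produce the same result. The Python functions are illustrative stand-ins named after the instructions, and the lane values are arbitrary placeholders.

```python
def vinsertf128(src, xmm, imm):
    """Copy src and replace the 128-bit lane chosen by imm bit 0 with xmm."""
    lo, hi = src
    return (lo, xmm) if imm & 1 else (xmm, hi)


def vperm2f128(src1, src2, imm):
    """Build each result lane from one of the four input lanes, or zero it."""
    lanes = [src1[0], src1[1], src2[0], src2[1]]

    def pick(ctrl):
        # Bit 3 of each 4-bit control zeroes the lane; bits 1:0 select a lane.
        return 0 if ctrl & 0x8 else lanes[ctrl & 0x3]

    return (pick(imm & 0xF), pick((imm >> 4) & 0xF))


a = (0xA0, 0xA1)   # register operand
b = (0xB0, 0xB1)   # stands in for a full 32-byte value in memory

# Both forms yield (low lane of a, low lane of b).
print(vinsertf128(a, b[0], 1) == vperm2f128(a, b, 0x20))  # True
```

The second source of `vinsertf128` is a 16-byte operand, while `vperm2f128` accepts a full 32-byte memory operand, so only the latter can absorb a 32-byte load directly.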

Jun 10, 2017

AMDGPU : Fix ISA Version Definitions. · 7c3e5115
Wei Ding authored Jun 10, 2017
```
Differential Revision: http://reviews.llvm.org/D28531

llvm-svn: 305137
```
7c3e5115

[AArch64] Add fallback in FastISel fp16 conversions · 21fde385

I-Jui (Ray) Sung authored Jun 09, 2017

Summary:
- Fix assertion failures on F16 to/from int types in FastISel by falling
  back to regular ISel
- Add a testcase of various conversion cases with FastISel (-O0)

Reviewers: kristof.beyls, jmolloy, SjoerdMeijer

Reviewed By: SjoerdMeijer

Subscribers: SjoerdMeijer, llvm-commits, srhines, pirama, aemerson, rengolin, javed.absar, kristof.beyls

Differential Revision: https://reviews.llvm.org/D33734

llvm-svn: 305127

21fde385

Jun 09, 2017

[AMDGPU] Add intrinsics for alignbit and alignbyte instructions · 1a61ab81
Stanislav Mekhanoshin authored Jun 09, 2017
```
Differential Revision: https://reviews.llvm.org/D34046

llvm-svn: 305098
```
1a61ab81

[X86][SSE] Add support for PACKSS nodes to faux shuffle extraction · 3d37b1a2
Simon Pilgrim authored Jun 09, 2017
```
If the inputs won't saturate during packing then we can treat the PACKSS as a truncation shuffle

llvm-svn: 305091
```
3d37b1a2
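
The saturation condition can be checked with a small model of a 128-bit signed pack from 16-bit to 8-bit lanes. This is a sketch with hypothetical helper names, not the DAG combine itself.

```python
def saturate_i8(x):
    """Clamp a signed value to the signed 8-bit range."""
    return max(-128, min(127, x))


def packss_i16_to_i8(a, b):
    """Pack the signed 16-bit lanes of a, then b, into 8-bit lanes with saturation."""
    return [saturate_i8(x) for x in a + b]


def truncate_i16_to_i8(a, b):
    """Keep only the low 8 bits of each lane, reinterpreted as signed."""
    return [((x & 0xFF) ^ 0x80) - 0x80 for x in a + b]


def fits_i8(lanes):
    """True when no lane would saturate during packing."""
    return all(-128 <= x <= 127 for x in lanes)


a = [-128, -1, 0, 1, 127, 5, -5, 42]
b = [7, 6, 5, 4, 3, 2, 1, 0]
if fits_i8(a + b):
    # No lane saturates, so the pack is a pure truncation.
    print(packss_i16_to_i8(a, b) == truncate_i16_to_i8(a, b))  # True
```

With an out-of-range lane such as 300 the two disagree (saturation gives 127, truncation gives 44), which is why the shuffle treatment applies only when the inputs are known not to saturate.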

[Hexagon] Fixes and updates to the selection patterns · 7aca2fd8

Krzysztof Parzyszek authored Jun 09, 2017

- Add some missing patterns.
- Use C4_cmplte in branch patterns.
- Fix signedness of immediate operand in M2_accii.

llvm-svn: 305085

7aca2fd8

Reland "[SelectionDAG] Enable target specific vector scalarization of calls and returns" · 212cccb2

Simon Dardis authored Jun 09, 2017

By target hookifying getRegisterType, getNumRegisters, getVectorBreakdown,
backends can request that LLVM scalarize vector types for calls
and returns.

The MIPS vector ABI requires that vector arguments and returns are passed in
integer registers. With SelectionDAG's new hooks, the MIPS backend can now
handle LLVM-IR with vector types in calls and returns. E.g.
'call @foo(<4 x i32> %4)'.

Previously these cases would be scalarized for the MIPS O32/N32/N64 ABI for
calls and returns if vector types were not legal. If vector types were legal,
a single 128bit vector argument would be assigned to a single 32 bit / 64 bit
integer register.

By teaching the MIPS backend to inspect the original types, it can now
implement the MIPS vector ABI which requires a particular method of
scalarizing vectors.

Previously, the MIPS backend relied on clang to scalarize types such as "call
@foo(<4 x float> %a)" into "call @foo(i32 inreg %1, i32 inreg %2, i32 inreg %3,
i32 inreg %4)".

This patch enables the MIPS backend to take either form for vector types.

The previous version of this patch had a "conditional move or jump depends on
uninitialized value".

Reviewers: zoran.jovanovic, jaydeep, vkalintiris, slthakur

Differential Revision: https://reviews.llvm.org/D27845

llvm-svn: 305083

212cccb2
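
To make the two argument forms concrete, the sketch below splits a 128-bit vector into four 32-bit integer pieces, mirroring the `i32 inreg` form quoted above. It is an illustration only: the function names are hypothetical, and register assignment and element ordering under the real ABI are not modeled.

```python
import struct


def scalarize_v4i32(vec):
    """Split a <4 x i32> value into four 32-bit integer pieces."""
    if len(vec) != 4:
        raise ValueError("expected four lanes")
    return [x & 0xFFFFFFFF for x in vec]


def scalarize_v4f32(vec):
    """Bitcast each float lane of a <4 x float> value to a 32-bit integer piece."""
    if len(vec) != 4:
        raise ValueError("expected four lanes")
    return [struct.unpack("<I", struct.pack("<f", x))[0] for x in vec]


# Each piece would travel in its own integer register.
print([hex(p) for p in scalarize_v4f32([1.0, -2.0, 0.0, 0.5])])
# ['0x3f800000', '0xc0000000', '0x0', '0x3f000000']
```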

[AMDGPU] Fix for issue in alloca to vector promotion pass · 82618baa

David Stuttard authored Jun 09, 2017

Summary:
The alloca promotion pass did not handle non-canonical input.

Added some additional checks so the pass simply backs off from forms it can't deal with (non-canonical).

Also added some test cases in non-canonical form to check that it no longer crashes.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D31710

llvm-svn: 305079

82618baa

[ARM] Custom machine-scheduler. NFCI. · 9e1ff865

Javed Absar authored Jun 09, 2017

This patch creates a customised machine-scheduler for ARM targets,
so that DAG mutations etc. can be added subsequently.
Reviewed by: hahn, rengolin, rovka.
Differential Revision: https://reviews.llvm.org/D34039

llvm-svn: 305078

9e1ff865

[Hexagon] Add LLVM header to HexagonPatterns.td · 78814155
Krzysztof Parzyszek authored Jun 09, 2017
```
llvm-svn: 305074
```
78814155

[ARM] Add scheduling info for VFMS · ad097355

Oliver Stannard authored Jun 09, 2017

The scalar VFMS instructions did not have scheduling information attached (but
VFMA did), which was causing assertion failures with the Cortex-A57 scheduling
model and -fp-contract=fast.

Differential Revision: https://reviews.llvm.org/D34040

llvm-svn: 305064

ad097355

Test commit: remove whitespace · add20f8f
Stefan Maksimovic authored Jun 09, 2017
```
llvm-svn: 305059
```
add20f8f

Fix -Wunused-variable. · 365d4d00
Rui Ueyama authored Jun 09, 2017
```
llvm-svn: 305051
```
365d4d00

Jun 08, 2017

[Hexagon] Re-enable machine verifier after codegen passes · b1ada4e7
Krzysztof Parzyszek authored Jun 08, 2017
```
Remove "false" from the arguments to "addPass" in Hexagon's target pass
config.

llvm-svn: 305015
```
b1ada4e7

[Hexagon] Skip mux generation when predicate register is undefined · 8a7fb0fe
Krzysztof Parzyszek authored Jun 08, 2017
```
llvm-svn: 305014
```
8a7fb0fe

AMDGPU: Work around build special casing .inc files · f1202e65

Matt Arsenault authored Jun 08, 2017

It complains because it assumes these were autogenerated files
in the source directory.

llvm-svn: 305005

f1202e65

AMDGPU: Use correct register names in inline assembly · 3c7581bb
Matt Arsenault authored Jun 08, 2017
```
Fixes using physical registers in inline asm from clang.

llvm-svn: 305004
```
3c7581bb

[Hexagon] Speedup NumNodesBlocking calculation. NFCI. · 6a38cc6d
Nirav Dave authored Jun 08, 2017
```
llvm-svn: 305003
```
6a38cc6d

[PPC] In PPCBoolRetToInt change the bool value to i64 if the target is ppc64 · f31c56df

Guozhi Wei authored Jun 08, 2017

In PPCBoolRetToInt, the bool value is changed to i32 type. On ppc64 this may introduce an extra zero extension for the return value. This patch changes the integer type to i64 to avoid the zero extension on ppc64.

This patch fixes PR32442.

Differential Revision: https://reviews.llvm.org/D31407

llvm-svn: 305001

f31c56df

[AMDGPU] Force qsads instrs to use different dest register than source registers · e5c78323

Mark Searles authored Jun 08, 2017

The V_MQSAD_PK_U16_U8, V_QSAD_PK_U16_U8, and V_MQSAD_U32_U8 instructions take more than one pass in hardware. For these three instructions, the destination registers must be different from all sources, so that the first pass does not overwrite sources for the following passes.

Differential Revision: https://reviews.llvm.org/D33783

llvm-svn: 304998

e5c78323

[Power9] Exploit vector integer extend instructions · 79acbbe5

Zaara Syeda authored Jun 08, 2017

This patch adds build vector patterns to exploit the vector integer
extend instructions:
vextsb2w - Vector Extend Sign Byte To Word
vextsb2d - Vector Extend Sign Byte To Doubleword
vextsh2w - Vector Extend Sign Halfword To Word
vextsh2d - Vector Extend Sign Halfword To Doubleword
vextsw2d - Vector Extend Sign Word To Doubleword

Differential Revision: https://reviews.llvm.org/D33510

llvm-svn: 304992

79acbbe5
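
What each listed instruction computes per element can be sketched as a sign extension of the low-order part of the element. The helper names are hypothetical, and the build-vector patterns added by the patch are not modeled.

```python
def sign_extend(value, from_bits, to_bits):
    """Sign-extend the low from_bits of value into a to_bits-wide bit pattern."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)


def vexts(elements, from_bits, to_bits):
    """Apply the extension to every to_bits-wide element of a vector."""
    return [sign_extend(e, from_bits, to_bits) for e in elements]


# Byte-to-word case (the vextsb2w shape): four 32-bit elements.
words = [0x000000FF, 0x0000007F, 0x12345680, 0x00000001]
print([hex(w) for w in vexts(words, 8, 32)])
# ['0xffffffff', '0x7f', '0xffffff80', '0x1']
```

The other four instructions differ only in the `from_bits` and `to_bits` pair: (8, 64), (16, 32), (16, 64) and (32, 64).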

Add scheduler classes to integer/float horizontal operations. · 8cb1d093
Andrew V. Tischenko authored Jun 08, 2017
```
This patch will close PR32801.
Differential Revision: https://reviews.llvm.org/D33203

llvm-svn: 304986
```
8cb1d093

This patch closes PR28513: an optimization of multiplication by different constants. · e0531025
Andrew V. Tischenko authored Jun 08, 2017
```
The initial patch was rejected: I fixed the issue and re-applied it.

llvm-svn: 304972
```
e0531025

Jun 07, 2017

[Hexagon] Generate 'inbounds' GEPs in HexagonCommonGEP · 5ba13825
Krzysztof Parzyszek authored Jun 07, 2017
```
llvm-svn: 304937
```
5ba13825

[AMDGPU][MC] Corrected error message for s_waitcnt helpers · 5a2f881b

Dmitry Preobrazhensky authored Jun 07, 2017

See Bug 32711: https://bugs.llvm.org//show_bug.cgi?id=32711

Reviewers: artem.tamazov

Differential Revision: https://reviews.llvm.org/D33781

llvm-svn: 304922

5a2f881b

[mips][dsp] Modify repl.ph to accept signed immediate values · 2f5f8e94

Petar Jovanovic authored Jun 07, 2017

Changed immediate type for repl.ph from uimm10 to simm10 as per the specs.
Repl.qb still accepts uimm8. Both instructions now mimic the behaviour of
GNU as.

Patch by Stefan Maksimovic.

Differential Revision: https://reviews.llvm.org/D33594

llvm-svn: 304918

2f5f8e94
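
The difference between the two immediate types can be illustrated with a small model of the instructions on a 32-bit register. It assumes `repl.ph` sign-extends its 10-bit immediate to 16 bits and copies it into both halfwords, and that `repl.qb` copies its 8-bit immediate into all four bytes; the Python function names are stand-ins, and assembler parsing is not modeled.

```python
def repl_ph(imm):
    """Replicate a signed 10-bit immediate (simm10) into both halfwords."""
    if not -512 <= imm <= 511:
        raise ValueError("immediate does not fit in simm10")
    half = imm & 0xFFFF  # 16-bit two's-complement pattern of the immediate
    return (half << 16) | half


def repl_qb(imm):
    """Replicate an unsigned 8-bit immediate (uimm8) into all four bytes."""
    if not 0 <= imm <= 255:
        raise ValueError("immediate does not fit in uimm8")
    return imm * 0x01010101


print(hex(repl_ph(-2)))    # 0xfffefffe
print(hex(repl_qb(0xAB)))  # 0xabababab
```

Under the old uimm10 typing a negative operand such as -2 would have been rejected, while values from 512 to 1023 would have been accepted.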

[SystemZ] Propagate MachineMemOperands · ae8d22ce
Jonas Paulsson authored Jun 07, 2017
```
In emitCondStore() and emitMemMemWrapper().

Review: Ulrich Weigand
llvm-svn: 304913
```
ae8d22ce

AMDGPU/GlobalISel: Mark 32-bit G_SELECT as legal · 2860a428

Tom Stellard authored Jun 07, 2017

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D33949

llvm-svn: 304910

2860a428