- Jun 03, 2019
-
-
Simon Pilgrim authored
Pre-commit for D62807 - which adds DAG [us]itofp(undef) --> 0 constant fold llvm-svn: 362396
-
Diogo N. Sampaio authored
Summary: - pr42062 When compiling for MinSize, ARMTargetLowering::LowerCall decides to indirect multiple calls to the same function. However, it disregards the limitation that thumb1 indirect calls require the callee to be in a register from r0 to r3 (an LLVM limitation). If all of those registers are used by arguments, the compiler dies with "error: run out of registers during register allocation". This patch tells IsEligibleForTailCallOptimization whether we intend to perform indirect calls, so as to avoid tail call optimization. Reviewers: dmgreen, efriedma Reviewed By: efriedma Subscribers: javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62683 llvm-svn: 362366
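As a rough illustration (a hypothetical minsize reduction, not the actual pr42062 reproducer): the problem shape is a Thumb1 function where all four argument registers r0-r3 are occupied at the call sites of a repeatedly called function, leaving no low register free to hold the callee address for an indirect call.

```cpp
// Hypothetical example compiled at -Oz for a Thumb1 target. Both calls go to
// the same function, so the backend may prefer an indirect call through a
// register, but a/b/c/d already occupy r0-r3 at each call site.
extern "C" int callee(int a, int b, int c, int d);

extern "C" int caller(int a, int b, int c, int d) {
  int x = callee(a, b, c, d);
  int y = callee(b, c, d, a);
  return x + y;
}
```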
-
Sam Parker authored
DAGCombiner was hitting a SimpleType assertion when trying to combine a v3f32 before type legalization. bugzilla: https://bugs.llvm.org/show_bug.cgi?id=41916 Differential Revision: https://reviews.llvm.org/D62734 llvm-svn: 362365
-
Roman Lebedev authored
llvm-svn: 362364
-
Jim Lin authored
Summary: LDWRdPtr would be expanded to ld+ldd, but ldd only accepts Y or Z as the pointer register. So the register class of the pointer operand of LDWRdPtr should be PTRDISPREGS instead of PTRREGS. Reviewers: dylanmckay Reviewed By: dylanmckay Subscribers: dylanmckay, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62300 llvm-svn: 362351
-
Florian Hahn authored
If we hit the limit, we do expand the outstanding tokenfactors. Otherwise, we might drop nodes with users in the unexpanded tokenfactors. This fixes the crashes reported by Jordan Rupprecht. Reviewers: niravd, spatel, craig.topper, rupprecht Reviewed By: niravd Differential Revision: https://reviews.llvm.org/D62633 llvm-svn: 362350
-
Craig Topper authored
Similar to what was done for masked load and gather. llvm-svn: 362342
-
Craig Topper authored
[X86] Add test cases for masked store and masked scatter with an all zeroes mask. Fix bug in ScalarizeMaskedMemIntrin: we need to cast only to Constant instead of ConstantVector to allow ConstantAggregateZero. llvm-svn: 362341
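A minimal sketch of the distinction the fix relies on (illustrative only; the helper name is made up and this is not the code from ScalarizeMaskedMemIntrin): an all-zeroes mask is represented as ConstantAggregateZero, which is a Constant but not a ConstantVector, so element access has to go through the more general Constant interface.

```cpp
#include "llvm/IR/Constants.h"
using namespace llvm;

// Hypothetical helper: is mask element Idx known to be set?
// dyn_cast<ConstantVector> would return null for an all-zeroes mask
// (ConstantAggregateZero); dyn_cast<Constant> covers both representations.
static bool maskElementIsSet(Value *Mask, unsigned Idx) {
  if (auto *C = dyn_cast<Constant>(Mask))
    if (Constant *Elt = C->getAggregateElement(Idx))
      return !Elt->isNullValue();
  return false; // Non-constant mask (or out-of-range index): treat as unknown.
}
```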
-
- Jun 02, 2019
-
-
Craig Topper authored
Similar to what was recently done for gathers in r362015. llvm-svn: 362337
-
Roman Lebedev authored
llvm-svn: 362330
-
Simon Pilgrim authored
Lets us match horizontal op patterns on fast-variable-shuffle targets (Haswell etc.) llvm-svn: 362327
-
Simon Pilgrim authored
Haswell etc. will combine shuffles to an extract_subvector(permd(x)) before isHorizontalBinOp can match it. llvm-svn: 362326
-
Roman Lebedev authored
We are also free to interpret this as 'BZHI'/'BEXTR'. https://rise4fun.com/Alive/dD6 llvm-svn: 362325
-
Simon Pilgrim authored
[DAG] isBitwiseNot / isConstOrConstSplat - add support for build vector undefs + truncation (PR41020) Add (opt-in) support for implicit truncation to isConstOrConstSplat, which allows us to match truncated 'all ones' cases in isBitwiseNot. PR41020 compares against using ISD::isBuildVectorAllOnes() instead, but that predicate silently accepts any UNDEF elements in the build vector, which might not be what we want in isBitwiseNot - so I've added an opt-in 'AllowUndefs' flag that is set to false by default but allows us to enable it in individual cases where it's safe. Differential Revision: https://reviews.llvm.org/D62783 llvm-svn: 362323
-
Roman Lebedev authored
If we look past truncations of X too eagerly (D62786), we may end up with a 64-bit 'BEXTR', even though a 32-bit one would suffice. llvm-svn: 362319
-
Craig Topper authored
Forgot to do the widen forms when I was doing the others. llvm-svn: 362310
-
Craig Topper authored
llvm-svn: 362309
-
Craig Topper authored
The AVX512BW and AVX512VL checks were never used. And AVX512 is the same as AVX on all tests that weren't already split for AVX1 and AVX2. llvm-svn: 362308
-
Craig Topper authored
llvm-svn: 362307
-
- Jun 01, 2019
-
-
Simon Pilgrim authored
llvm-svn: 362303
-
Simon Pilgrim authored
llvm-svn: 362300
-
Simon Atanasyan authored
The `cfcmsa` and `ctcmsa` instructions accept an MSA control register index. The MIPS64 SIMD Architecture defines eight MSA control registers, but the register index for the `cfcmsa` and `ctcmsa` instructions may be any number in the 0..31 range. If the index is greater than 7, `cfcmsa` writes zero to the destination register and `ctcmsa` does nothing [1]. [1] MIPS Architecture for Programmers Volume IV-j: The MIPS64 SIMD Architecture Module https://www.mips.com/?do-download=the-mips64-simd-architecture-module Differential Revision: https://reviews.llvm.org/D62597 llvm-svn: 362299
-
Dylan McKay authored
If we allowed register coalescing on the PTRDISPREGS class, then the register allocator could lock the Z register to some virtual register. Larger instructions requiring a memory access would then fail during the register allocation phase, since there is no register available to hold a pointer if the Y register was already taken for a stack frame. This patch prevents that by keeping the Z register spillable, which it does by not allowing the coalescer to lock it. Original discussion on https://github.com/avr-rust/rust/issues/128. llvm-svn: 362298
-
Roman Lebedev authored
I initially added it so the test would show whether the binop w/ constant is sunk or hoisted. But as can be seen from the 'sub (sub C, %x), %y' test, that actually conceals the issues it is supposed to test. At least two more patterns are unhandled: * 'add (sub C, %x), %y' - D62266 * 'sub (sub C, %x), %y' llvm-svn: 362295
-
Craig Topper authored
llvm-svn: 362288
-
Matt Arsenault authored
Fixes missing test from r293000. llvm-svn: 362275
-
- May 31, 2019
-
-
Puyan Lotfi authored
We don't want to create vregs if there is nothing to use them for. That causes verifier errors. Differential Revision: https://reviews.llvm.org/D62740 llvm-svn: 362247
-
Kevin P. Neal authored
[FPEnv] Added a special UnrollVectorOp method to deal with the chain on StrictFP opcodes This change creates UnrollVectorOp_StrictFP. The purpose of this is to address a failure that consistently occurs when calling StrictFP functions on vectors whose number of elements is 3 + 2n on most platforms, such as PowerPC or SystemZ. The old UnrollVectorOp method does not expect the vector it unrolls to have a chain, so it has an assert that prevents it from running if this is the case. This new StrictFP version of the method deals with the chain while unrolling the vector. With this new function in place during vector widening, llc can run vector-constrained-fp-intrinsics.ll for SystemZ successfully. Submitted by: Drew Wock <drew.wock@sas.com> Reviewed by: Cameron McInally, Kevin P. Neal Approved by: Cameron McInally Differential Revision: https://reviews.llvm.org/D62546 llvm-svn: 362241
-
Guozhi Wei authored
In PPCReduceCRLogicals, after splitting the original MBB into 2, the 2 impacted branches still use the original branch probability. This is unreasonable. Suppose we have the following code, and the probability of each successor is 50%. condc = conda || condb br condc, label %target, label %fallthrough It can be transformed to the following: br conda, label %target, label %newbb newbb: br condb, label %target, label %fallthrough Since each branch has a probability of 50% to each successor, the total probability to %fallthrough is 25% now, and the total probability to %target is 75%. This actually changes the original profiling data. A more reasonable probability of 70% can be set for the false side of each branch instruction, so that the total probability to %fallthrough stays close to 50%. This patch assumes branch targets with two incoming edges have the same edge frequency, computes a new probability for each target, and keeps the total probability to the original targets unchanged. Differential Revision: https://reviews.llvm.org/D62430 llvm-svn: 362237
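Spelling out the arithmetic behind the 70% figure (just restating the numbers from the message above): if each of the two new branches gives probability q to its false side, then

```latex
P(\text{fallthrough}) = q \cdot q = 0.5
  \;\Rightarrow\; q = \sqrt{0.5} \approx 0.707 \approx 70\%
```

so assigning roughly 70% to the false side of each branch yields 0.7 * 0.7 = 0.49, i.e. about 50% to %fallthrough as in the original profile, with %target keeping the remaining ~50%.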
-
Simon Pilgrim authored
llvm-svn: 362230
-
Simon Pilgrim authored
llvm-svn: 362229
-
Petar Avramovic authored
Test different callee operand types and their behavior depending on whether the relocation model is PIC or not. Possible operand types are: register (function pointer), external symbol (used for libcalls, e.g. __udivdi3 or memcpy), and global address. Global address has different handling depending on the relocation model and linkage type; register and external symbol do not. Differential Revision: https://reviews.llvm.org/D62590 llvm-svn: 362212
-
Petar Avramovic authored
Handle position independent code for MIPS32. When the callee is a global address, lowerCall will emit the callee as G_GLOBAL_VALUE and add a target flag if needed. Support $gp in getRegBankFromRegClass(). Select G_GLOBAL_VALUE, specially handling the case when there are target flags attached by lowerCall. Differential Revision: https://reviews.llvm.org/D62589 llvm-svn: 362210
-
Roman Lebedev authored
Just for completeness. llvm-svn: 362208
-
Petar Avramovic authored
Lower calls whose callee is a register for MIPS32. The register should contain the callee function's address. Differential Revision: https://reviews.llvm.org/D62585 llvm-svn: 362204
-
Craig Topper authored
These patterns can incorrectly narrow a volatile load from 128 bits to 64 bits. Similar to PR42079. Switch to using (v4i32 (bitcast (v2i64 (scalar_to_vector (loadi64))))) as the load pattern used in the instructions. This probably still has issues in 32-bit mode, where loadi64 isn't legal. Maybe we should use VZMOVL for widened loads even when we don't need the upper bits as zeroes? llvm-svn: 362203
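As an illustration of the hazard described (a hypothetical snippet, not a test from the patch; assumes an SSE2-capable x86 target): the conversion only consumes the low 64 bits of its source, but since the source load is volatile it must still be performed as a full 16-byte access.

```cpp
#include <immintrin.h>

// Hypothetical example: _mm_cvtepi32_pd uses only the low two i32 elements,
// but because *p is volatile the 128-bit load must not be narrowed to 64 bits.
__m128d convert_low(const volatile __m128i *p) {
  __m128i v = *p;            // full 16-byte volatile load; must stay whole
  return _mm_cvtepi32_pd(v); // converts only the low two 32-bit integers
}
```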
-
Craig Topper authored
llvm-svn: 362202
-
Craig Topper authored
Similar to PR42079 llvm-svn: 362201
-
Craig Topper authored
llvm-svn: 362200
-
Craig Topper authored
DAG combine will usually fold fpextend+load to an fp extload anyway, so the 256 and 512 patterns were probably unnecessary. The 128-bit pattern was special in that it looked for a v4f32 load, but then used it in an instruction that only loads 64 bits. This is bad if the load happens to be volatile. We could probably make the patterns volatile aware, but that's more work for something that's probably rare. The peephole pass might kick in and save us anyway. We might also be able to fix this with some additional DAG combines. This also adds patterns for vselect+extload to enable masked vcvtps2pd to be used. Previously we looked for the unlikely vselect+fpextend+load. llvm-svn: 362199
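As a rough sketch of the vselect+extload shape mentioned at the end (hypothetical source, assuming AVX512VL is available; whether a masked vcvtps2pd is actually selected depends on how the DAG combines the load and the extend):

```cpp
#include <immintrin.h>

// Hypothetical example: a float->double conversion of a 128-bit load whose
// result is merged under a mask, i.e. vselect(mask, fpext(load), passthru).
__m256d masked_convert(__mmask8 k, const float *p, __m256d passthru) {
  __m256d conv = _mm256_cvtps_pd(_mm_loadu_ps(p)); // fpext of a loaded v4f32
  return _mm256_mask_mov_pd(passthru, k, conv);    // merge under mask k
}
```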
-