Commits · b9d01aa29e5d0aa433c2fc62ace709fe69c45ceb · Lorenzo Albano / LLVM bpEVL

Jul 11, 2018

[Power9] Add remaining __float128 builtin support for FMA round to odd · b9d01aa2

Stefan Pintilie authored Jul 11, 2018

Implement this as it is done on GCC:

__float128 a, b, c, d;
a = __builtin_fmaf128_round_to_odd (b, c, d);         // generates xsmaddqpo
a = __builtin_fmaf128_round_to_odd (b, c, -d);        // generates xsmsubqpo
a = - __builtin_fmaf128_round_to_odd (b, c, d);       // generates xsnmaddqpo
a = - __builtin_fmaf128_round_to_odd (b, c, -d);      // generates xsnmsubqpo

Differential Revision: https://reviews.llvm.org/D48218

llvm-svn: 336754

b9d01aa2

[ARM] Treat cmn immediates as legal in isLegalICmpImmediate. · d2c73923

Eli Friedman authored Jul 10, 2018

The original code attempted to do this, but the std::abs() call didn't
actually do anything due to implicit type conversions.  Fix the type
conversions, and perform the correct check for negative immediates.

This probably has very little practical impact, but it's worth fixing
just to avoid confusion in the future, I think.

Differential Revision: https://reviews.llvm.org/D48907

llvm-svn: 336742

d2c73923

[X86] Teach X86InstrInfo::commuteInstructionImpl to use MOVSD/MOVSS for BLEND... · 860ab496

Craig Topper authored Jul 10, 2018

[X86] Teach X86InstrInfo::commuteInstructionImpl to use MOVSD/MOVSS for BLEND under optsize when the immediate allows it.

Isel currently emits movss/movsd a lot of the time and an accidental double commute turns it into a blend.

Ideally we'd select blend directly in isel under optspeed and not rely on the double commute to create blend.

llvm-svn: 336731

860ab496

Jul 10, 2018

[GlobalISel][X86_64] Support for G_SITOFP · 48ca0550
Alexander Ivchenko authored Jul 10, 2018
```
The instruction selection is automatically handled by tablegen

llvm-svn: 336703
```
48ca0550
[DAGCombiner] Add special case fast paths for udiv x,1 and udiv x,-1 · 4cb46093
Simon Pilgrim authored Jul 10, 2018
```
udiv x,-1 was going down the (slow) BuildUDIV route resulting in unnecessary shifts.

llvm-svn: 336701
```
4cb46093
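
The commit text does not show the fast paths themselves. As a minimal sketch of the arithmetic identities they can rely on, assuming a 32-bit unsigned model: `udiv x, 1` is just `x`, and `udiv x, -1` (divisor with all bits set) is 1 only when `x` is also all ones, otherwise 0. The helper names below are hypothetical and are not taken from DAGCombiner.

```python
MASK32 = 0xFFFFFFFF  # assumption: model values as unsigned 32-bit integers


def udiv_generic(x, d):
    """Reference unsigned division on 32-bit values."""
    return (x & MASK32) // (d & MASK32)


def udiv_fast(x, d):
    """Hypothetical fast paths for divisors 1 and -1 (all ones)."""
    x &= MASK32
    d &= MASK32
    if d == 1:
        return x  # udiv x, 1 == x
    if d == MASK32:
        # Only x == 0xFFFFFFFF is >= the divisor, so the quotient is 0 or 1.
        return 1 if x == MASK32 else 0
    return x // d


for x in (0, 1, 7, 0x7FFFFFFF, 0xFFFFFFFE, 0xFFFFFFFF):
    assert udiv_fast(x, 1) == udiv_generic(x, 1)
    assert udiv_fast(x, -1) == udiv_generic(x, -1)
```

The second case reduces to a compare and select, which is why no shift sequence is needed.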

AMDGPU: Make hidden argument metadata consistent with · f0badd5a

Konstantin Zhuravlyov authored Jul 10, 2018

amdgpu-implicitarg-num-bytes attribute

Differential Revision: https://reviews.llvm.org/D49096

llvm-svn: 336697

f0badd5a

[X86] Add srem/udiv/urem by constant tests · 9bd9fef4
Simon Pilgrim authored Jul 10, 2018
```
Match the tests in combine-sdiv.ll

llvm-svn: 336694
```
9bd9fef4
[WebAssembly] Add a few missing {{$}}s to a test · 9ef850b8
Heejin Ahn authored Jul 10, 2018
```
llvm-svn: 336691
```
9ef850b8
AMDGPU/NFC: Fix typo in test name · 75024cf4
Konstantin Zhuravlyov authored Jul 10, 2018
```
hsa-metadata-enqueu-kernel.ll ->
hsa-metadata-enqueue-kernel.ll

llvm-svn: 336689
```
75024cf4

[Hexagon] Change .mir testcase to make sure function is not in SSA form · c87ecf25

Krzysztof Parzyszek authored Jul 10, 2018

If a machine function satisfies SSA, the IsSSA property is assumed even
if the pass to be executed runs after exiting SSA form. If the pass
output then does not conform to SSA, a verifier error will be flagged
(with expensive checks enabled).

llvm-svn: 336682

c87ecf25

Reapply "AMDGPU: Force inlining if LDS global address is used" · a680199a
Matt Arsenault authored Jul 10, 2018
```
This reverts commit r336623

llvm-svn: 336675
```
a680199a

[Hexagon] Add implicit uses even when untied explicit uses are present · c052451a

Krzysztof Parzyszek authored Jul 10, 2018

An explicit untied use is not sufficient to maintain liveness of a
register redefined in a predicated instruction. For example
  %1 = COPY %0
  ...
  %1 = A2_paddif %2, %1, 1
could become
  $r1 = COPY $r0
  ...
  $r1 = A2_paddif $p0, $r1, 1
and later
  $r1 = COPY $r0                ;; this is not really dead!
  ...
  $r1 = A2_paddif $p0, $r0, 1

llvm-svn: 336662

c052451a

[X86] Fast-isel tests for lowered truncation intrinsics · 89c919c2

Mikhail Dvoretckii authored Jul 10, 2018

This patch adds fast-isel tests for the IR patterns produced for truncation
intrinsics in rC336643.

Differential Revision: https://reviews.llvm.org/D48822

llvm-svn: 336645

89c919c2

[X86][SSE] Prefer BLEND(SHL(v,c1),SHL(v,c2)) over MUL(v, c3) · d32ca2c0

Simon Pilgrim authored Jul 10, 2018

Now that rL336250 has landed, we should prefer 2 immediate shifts + a shuffle blend over performing a multiply. Despite the increase in instructions, this is quicker (especially for slow v4i32 multiplies) and avoids loads and constant pool usage. It does, however, increase register pressure. The code size will go up a little, but by less than what we save on the constant pool data.

This patch also adds support for v16i16 to the BLEND(SHIFT(v,c1),SHIFT(v,c2)) combine, and also prevents blending on pre-SSE41 shifts if it would introduce extra blend masks/constant pool usage.

Differential Revision: https://reviews.llvm.org/D48936

llvm-svn: 336642

d32ca2c0
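
The equivalence behind this combine can be sketched lane by lane. The example below is an illustration only, assuming v4i32 lanes modelled as unsigned 32-bit integers and multiply constants that are powers of two with at most two distinct shift amounts. All function names are hypothetical and do not correspond to the X86 lowering code.

```python
MASK32 = 0xFFFFFFFF  # assumption: each lane is an unsigned 32-bit integer


def mul_lanes(v, consts):
    """Per-lane multiply, keeping the low 32 bits."""
    return [(x * c) & MASK32 for x, c in zip(v, consts)]


def shl_lanes(v, amt):
    """Shift every lane left by the same immediate."""
    return [(x << amt) & MASK32 for x in v]


def blend(a, b, take_b):
    """Pick each lane from b where the mask is set, otherwise from a."""
    return [y if t else x for x, y, t in zip(a, b, take_b)]


def mul_as_blend_of_shifts(v, consts):
    """Return BLEND(SHL(v, c1), SHL(v, c2)), or None if not applicable."""
    if any(c <= 0 or c & (c - 1) for c in consts):
        return None  # some constant is not a power of two
    amounts = sorted({c.bit_length() - 1 for c in consts})
    if len(amounts) > 2:
        return None  # more than two distinct shift amounts
    lo, hi = amounts[0], amounts[-1]
    take_hi = [c.bit_length() - 1 == hi for c in consts]
    return blend(shl_lanes(v, lo), shl_lanes(v, hi), take_hi)


v = [1, 0x80000001, 3, 0xFFFFFFFF]
consts = [2, 2, 8, 8]
assert mul_as_blend_of_shifts(v, consts) == mul_lanes(v, consts)
```

The two shifted vectors and the blend mask replace the constant-pool load of the multiplier vector, at the cost of one extra live register.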

[X86] Regenerate vector-shuffle-512-v8.ll so the script will merge the 32 and... · 5fd020c0
Craig Topper authored Jul 10, 2018
```
[X86] Regenerate vector-shuffle-512-v8.ll so the script will merge the 32 and 64 bit checks together. NFC

llvm-svn: 336641
```
5fd020c0

[X86] Correct vfixupimm load patterns to look for an integer load, not a... · 866a377e

Craig Topper authored Jul 10, 2018

[X86] Correct vfixupimm load patterns to look for an integer load, not a floating point load bitcasted to integer.

DAG combine wouldn't let a floating point load bitcasted to integer exist. It would just be an integer load.

llvm-svn: 336626

866a377e

[X86] Add test cases that show failure to fold load into vfixupimm... · 59fd2f4c

Craig Topper authored Jul 10, 2018

[X86] Add test cases that show failure to fold load into vfixupimm instructions due to bad isel pattern.

llvm-svn: 336625

59fd2f4c

Revert "AMDGPU: Force inlining if LDS global address is used" · 688e7522
Vlad Tsyrklevich authored Jul 10, 2018
```
This reverts commit r336587, it was causing test failures on the
sanitizer bots.

llvm-svn: 336623
```
688e7522

[WebAssembly] Support for binary atomic RMW instructions · fed7382e

Heejin Ahn authored Jul 09, 2018

Summary:
This adds support for binary atomic read-modify-write instructions:
add, sub, and, or, xor, and xchg.

This does not yet support translations of some of LLVM IR atomicrmw
instructions (nand, max, min, umax, and umin) that do not have a direct
counterpart in wasm instructions.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D49088

llvm-svn: 336615

fed7382e
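
For readers unfamiliar with read-modify-write semantics, the following is a rough sketch of what the six listed operations do: each one atomically replaces the stored value with `op(old, operand)` and returns the old value. The `AtomicCell` class, the lock-based atomicity, and the 32-bit width are assumptions made for illustration and say nothing about how the WebAssembly backend emits these instructions.

```python
import threading

# The six binary RMW operations named in the commit message.
OPS = {
    "add": lambda old, val: old + val,
    "sub": lambda old, val: old - val,
    "and": lambda old, val: old & val,
    "or": lambda old, val: old | val,
    "xor": lambda old, val: old ^ val,
    "xchg": lambda old, val: val,
}


class AtomicCell:
    """Hypothetical memory cell; a lock stands in for hardware atomicity."""

    def __init__(self, value, bits=32):
        self._mask = (1 << bits) - 1
        self._value = value & self._mask
        self._lock = threading.Lock()

    def rmw(self, op, operand):
        """Apply op atomically and return the value held before the update."""
        with self._lock:
            old = self._value
            self._value = OPS[op](old, operand & self._mask) & self._mask
            return old

    def load(self):
        with self._lock:
            return self._value


cell = AtomicCell(5)
assert cell.rmw("add", 3) == 5  # returns the old value
assert cell.load() == 8         # memory now holds the new value
```

Operations such as nand, max, and min would need a compare-exchange loop in this model, which matches the commit's note that they have no direct wasm counterpart.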

Jul 09, 2018

Fix line endings. NFCI. · 017c68c1
Simon Pilgrim authored Jul 09, 2018
```
llvm-svn: 336602
```
017c68c1

[Power9] Add __float128 builtins for Rounding Operations · 133acb22

Stefan Pintilie authored Jul 09, 2018

Added __float128 support for a number of rounding operations:

trunc
rint
nearbyint
round
floor
ceil

Differential Revision: https://reviews.llvm.org/D48415

llvm-svn: 336601

133acb22

[WebAssembly] Improve readability of load/stores and tests. NFC. · d31bc986

Heejin Ahn authored Jul 09, 2018

Summary:
- Changed variable/function names to be more consistent
- Improved comments in test files
- Added more tests
- Fixed a few typos
- Misc. cosmetic changes

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D49087

llvm-svn: 336598

d31bc986

[Power9] [LLVM] Add __float128 support for trunc to double round to odd · 58e3e0a8

Stefan Pintilie authored Jul 09, 2018

Add support for this builtin:
double __builtin_truncf128_round_to_odd(__float128)

Differential Revision: https://reviews.llvm.org/D48483

llvm-svn: 336595

58e3e0a8
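
As background on the rounding mode named here: round to odd truncates the value and then forces the lowest kept bit to 1 whenever any discarded bit was nonzero. The sketch below shows this on a plain integer significand. It is a simplified model under that assumption and does not reproduce the builtin or the Power9 instruction; the function name is hypothetical.

```python
def round_to_odd(significand, drop_bits):
    """Drop the low drop_bits bits; set the kept LSB if anything was lost."""
    kept = significand >> drop_bits
    discarded = significand & ((1 << drop_bits) - 1)
    return (kept | 1) if discarded else kept


# Exact case: nothing is lost, so the result is left unchanged.
assert round_to_odd(0b10100000, 4) == 0b1010
# Inexact case: the kept value is forced to be odd.
assert round_to_odd(0b10100001, 4) == 0b1011
```

Because an inexact result always ends in a 1 bit, a later rounding to a narrower format can still tell that the value was not exact, which is the usual motivation for this mode.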

RenameIndependentSubregs: Fix handling of undef tied operands · 7139dea6

Mark Searles authored Jul 09, 2018

When updating a tied operand pair, ensure that only that pair
is updated.

Differential Revision: https://reviews.llvm.org/D49052

llvm-svn: 336593

7139dea6

[globalisel][irtranslator] Add support for atomicrmw and (strong) cmpxchg · 9481399c

Daniel Sanders authored Jul 09, 2018

Summary:
This patch adds support for the atomicrmw instructions and the strong
cmpxchg instruction to the IRTranslator.

I've left out weak cmpxchg because LangRef.rst isn't entirely clear on what
difference it makes to the backend. As far as I can tell from the code, it
only matters to AtomicExpandPass which is run at the LLVM-IR level.

Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar, volkan, javed.absar

Reviewed By: qcolombet

Subscribers: kristof.beyls, javed.absar, igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D40092

llvm-svn: 336589

9481399c

AMDGPU: Force inlining if LDS global address is used · 40cb6cab

Matt Arsenault authored Jul 09, 2018

These won't work for the foreseeable future. These aren't allowed
from OpenCL, but IPO optimizations can make them appear.

Also directly set the attributes on functions, regardless
of the linkage rather than cloning functions like before.

llvm-svn: 336587

40cb6cab

[X86][TLI] DAGCombine: Unfold variable bit-clearing mask to two shifts. · 5ccae175

Roman Lebedev authored Jul 09, 2018

Summary:
This adds a reverse transform for the instcombine canonicalizations
that were added in D47980, D47981.

As discussed later, that was worse at least for the code size,
and potentially for the performance, too.

https://rise4fun.com/Alive/Zmpl

Reviewers: craig.topper, RKSimon, spatel

Reviewed By: spatel

Subscribers: reames, llvm-commits

Differential Revision: https://reviews.llvm.org/D48768

llvm-svn: 336585

5ccae175
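
The commit message does not spell out the patterns. Assuming the masked forms in question are `x & (-1 << y)` and `x & (-1 >> y)` with logical shifts, the two-shift equivalents can be checked with a small model of 32-bit unsigned arithmetic. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit unsigned values, shift amount 0..31


def clear_low_masked(x, y):
    """x & (-1 << y): clear the y lowest bits with a variable mask."""
    return x & ((MASK32 << y) & MASK32)


def clear_low_shifts(x, y):
    """(x >> y) << y: the same result using two shifts."""
    return ((x >> y) << y) & MASK32


def clear_high_masked(x, y):
    """x & (-1 >> y): clear the y highest bits with a variable mask."""
    return x & (MASK32 >> y)


def clear_high_shifts(x, y):
    """(x << y) >> y with a logical right shift."""
    return ((x << y) & MASK32) >> y


for x in (0, 1, 0x12345678, 0x80000001, 0xFFFFFFFF):
    for y in range(32):
        assert clear_low_masked(x, y) == clear_low_shifts(x, y)
        assert clear_high_masked(x, y) == clear_high_shifts(x, y)
```

The shift form needs no materialized all-ones constant, which is consistent with the code size concern raised in the commit message.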

[Power9] Add __float128 builtins for Round To Odd · 83a5fe14

Stefan Pintilie authored Jul 09, 2018

GCC has builtins for these round to odd instructions:

__float128 __builtin_sqrtf128_round_to_odd (__float128)
__float128 __builtin_{add,sub,mul,div}f128_round_to_odd (__float128, __float128)
__float128 __builtin_fmaf128_round_to_odd (__float128, __float128, __float128)

Differential Revision: https://reviews.llvm.org/D47550

llvm-svn: 336578

83a5fe14

[X86] In combineFMA, make sure we bitcast the result of isFNEG back the... · 47170b31

Craig Topper authored Jul 09, 2018

[X86] In combineFMA, make sure we bitcast the result of isFNEG back to the expected type before creating the new FMA node.

Previously, we were creating malformed SDNodes, but nothing noticed because the type constraints prevented isel from noticing.

llvm-svn: 336566

47170b31

[X86][AVX] Regenerate AVX1 fast-isel tests. · d0706592
Simon Pilgrim authored Jul 09, 2018
```
Let the update script merge 32/64 tests where possible

llvm-svn: 336565
```
d0706592

[Power9] Add __float128 support for compare operations · 3d76326d

Stefan Pintilie authored Jul 09, 2018

Added handling for the select f128.

Differential Revision: https://reviews.llvm.org/D48294

llvm-svn: 336548

3d76326d

Jul 08, 2018

[X86] Enhance combineFMA to look for FNEG behind an EXTRACT_VECTOR_ELT. · 9e17073c
Craig Topper authored Jul 08, 2018
```
llvm-svn: 336514
```
9e17073c

[X86][SSE] Combine v16i8 SHL by constants to multiplies · 2eced71e

Simon Pilgrim authored Jul 08, 2018

Pre-AVX512 (which can perform a quick extend/shift/truncate), extending to 2 v8i16 for the PMULLW and then truncating is more performant than relying on the generic PBLENDVB vXi8 shift path and uses a similar amount of mask constant pool data.

Differential Revision: https://reviews.llvm.org/D48963

llvm-svn: 336513

2eced71e
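
The extend, multiply, truncate idea can be sketched per lane. This is an illustration only, assuming unsigned 8-bit lanes and per-lane shift amounts in the range 0 to 7. The function names are hypothetical, and the 16-bit multiply merely models the low-half behaviour of a PMULLW-style instruction.

```python
def shl_v16i8(v, amts):
    """Reference: per-lane 8-bit shift left by a constant amount."""
    return [(x << a) & 0xFF for x, a in zip(v, amts)]


def shl_v16i8_via_mul(v, amts):
    """Zero-extend to 16 bits, multiply by 1 << amt, truncate to 8 bits."""
    wide = [x & 0xFF for x in v]              # zero-extended lanes
    multipliers = [1 << a for a in amts]      # constant vector of powers of two
    products = [(w * m) & 0xFFFF for w, m in zip(wide, multipliers)]
    return [p & 0xFF for p in products]       # truncate back to i8


v = list(range(0, 256, 17))          # 16 lanes: 0, 17, ..., 255
amts = [i % 8 for i in range(16)]    # a different constant shift per lane
assert shl_v16i8_via_mul(v, amts) == shl_v16i8(v, amts)
```

A real v16i8 input would be split into two v8i16 halves before the multiply; the model above skips that step because the arithmetic per lane is the same.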

[X86] Add new scalar fma intrinsics with rounding mode that use f32/f64 types. · fdf3f1ff

Craig Topper authored Jul 08, 2018

This allows us to handle masking in a very similar way to the default rounding version that uses llvm.fma.

I had to add new rounding mode CodeGenOnly instructions to support isel when we can't find a movss to grab the upper bits from to use the b_Int instruction.

Fast-isel tests have been updated to match new clang codegen.

We are currently having trouble folding fneg into the new intrinsic. I'm going to correct that in a follow up patch to keep the size of this one down.

A future patch will also remove the old intrinsics.

llvm-svn: 336506

fdf3f1ff

[X86] Use a rounding mode other than 4 in the scalar fma intrinsic fast-isel... · d679d01a
Craig Topper authored Jul 08, 2018
```
[X86] Use a rounding mode other than 4 in the scalar fma intrinsic fast-isel tests to match clang test cases.

llvm-svn: 336505
```
d679d01a

Jul 07, 2018
- [X86] Regenerate PR14088 test. NFCI. · 2a434453
  Simon Pilgrim authored Jul 07, 2018
```
llvm-svn: 336496
```
  2a434453
- [DAGCombiner] Add EXTRACT_SUBVECTOR to SimplifyDemandedVectorElts · c1d19440
  Simon Pilgrim authored Jul 07, 2018
```
As discussed on PR37989, this patch adds EXTRACT_SUBVECTOR handling to TargetLowering::SimplifyDemandedVectorElts and calls it from DAGCombiner::visitEXTRACT_SUBVECTOR.

Differential Revision: https://reviews.llvm.org/D48825

llvm-svn: 336490
```
  c1d19440
- NFC - Typo fixes in X86 flags-copy-lowering.mir test · c22fd359
  Gabor Buella authored Jul 07, 2018
```
Differential Revision: https://reviews.llvm.org/D48934

llvm-svn: 336484
```
  c22fd359
- [MachineOutliner] Add missing liveness tracking info in MIR test. · 7382cf82
  Yvan Roux authored Jul 07, 2018
```
This should bring the bots back to green state.

llvm-svn: 336482
```
  7382cf82
Jul 06, 2018
- Revert 336426 (and follow-ups 428, 440), it very likely caused PR38084. · 038dbf3c
  Nico Weber authored Jul 06, 2018
```
llvm-svn: 336453
```
  038dbf3c