Commits · 30111c786f7cf49197fdd9db01e3a6def57b3cef · Roger Ferrer / llvm-epi

May 26, 2019

[SimplifyCFG] Run ReduceSwitchRange unconditionally, generalize · 30111c78

Shawn Landden authored May 26, 2019

Rather than gating on "isSwitchDense" (resulting in necessarily
sparse lookup tables even when they were generated), always run
this quite cheap transform.

This transform is useful not just for generating tables.
LowerSwitch also wants this: read LowerSwitch.cpp:257.

Be careful to not generate worse code, by introducing a
SubThreshold heuristic.

Instead of just sorting by signed, generalize the finding of the
best base.

And now that it is run unconditionally, do not replicate its
functionality in SwitchToLookupTable (which could use a Sub
when having a hole is smaller, hence the SubThreshold
heuristic located in a single place).
This simplifies SwitchToLookupTable, and fixes
some ugly corner cases due to the use of signed numbers,
such as a table containing i16 32767 and 32768, of which
32768 would be interpreted as -32768, so that the code thinks
the table is size 65536.
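
As a rough illustration of that corner case (not the SimplifyCFG code itself), the sketch below computes how many table slots a pair of i16 case values spans when compared as signed versus unsigned. The function names are hypothetical and exist only for this example.

```cpp
#include <cstdint>

// Slots needed to cover both case values when they are ordered as
// signed 16-bit integers. The uint16_t -> int16_t conversion wraps
// modulo 2^16 (guaranteed since C++20, and the behavior of mainstream
// compilers before that).
uint32_t signedTableSize(uint16_t a, uint16_t b) {
  int32_t sa = static_cast<int16_t>(a);
  int32_t sb = static_cast<int16_t>(b);
  int32_t lo = sa < sb ? sa : sb;
  int32_t hi = sa < sb ? sb : sa;
  return static_cast<uint32_t>(hi - lo) + 1;
}

// Slots needed when the same values are ordered as unsigned.
uint32_t unsignedTableSize(uint16_t a, uint16_t b) {
  uint32_t lo = a < b ? a : b;
  uint32_t hi = a < b ? b : a;
  return hi - lo + 1;
}
```

For the adjacent values 32767 and 32768, the signed view spans 65536 slots while the unsigned view spans 2.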

(We still use unconditional subtraction when building a single-register mask,
but I think this whole block should go when the more general sparse
map is added, which doesn't leave empty holes in the table.)

And the reason test4 and test5 did not trigger was documented incorrectly:
it was because they were not considered sufficiently "dense".

Also, fix generation of invalid LLVM-IR: shl by bit-width.

llvm-svn: 361727

30111c78

[SimplifyCFG] NFC, remove GCD that was only used for powers of two · 444eaaf1

Shawn Landden authored May 26, 2019

and replace with an equivalent countTrailingZeros.

GCD is much more expensive than this, with repeated division.
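
A minimal sketch of why the two agree when only the power-of-two factor matters: the number of trailing zero bits of the GCD of a set of values equals the smallest trailing-zero count among the values. All names below are hypothetical helpers written for this example, not LLVM's functions.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Count trailing zero bits of x; returns 64 for x == 0.
unsigned trailingZeros(uint64_t x) {
  if (x == 0)
    return 64;
  unsigned n = 0;
  while ((x & 1) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// Euclid's algorithm: one division per loop iteration.
uint64_t gcd64(uint64_t a, uint64_t b) {
  while (b != 0) {
    uint64_t t = a % b;
    a = b;
    b = t;
  }
  return a;
}

// Common power-of-two shift, derived from the GCD of all values.
unsigned shiftViaGcd(const std::vector<uint64_t> &vals) {
  uint64_t g = 0;
  for (uint64_t v : vals)
    g = gcd64(g, v);
  return trailingZeros(g);
}

// The same shift, derived without any division.
unsigned shiftViaTrailingZeros(const std::vector<uint64_t> &vals) {
  unsigned shift = 64;
  for (uint64_t v : vals)
    shift = std::min(shift, trailingZeros(v));
  return shift;
}
```

Both functions return 3 for the values 8, 24 and 40; the second needs only shifts and comparisons.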

This depends on D60823

llvm-svn: 361726

444eaaf1

[SimplifyCFG] NFC, update Switch tests to HEAD so I can see if my changes change anything · 50c73a04
Shawn Landden authored May 26, 2019
```
Also add baseline tests to show effect of later patches.

llvm-svn: 361725
```
50c73a04

[Support] make countLeadingZeros() and countTrailingZeros() return unsigned · b7cc093d

Shawn Landden authored May 26, 2019

This matches countLeadingOnes() and countTrailingOnes(), and
APInt's countLeadingZeros() and countTrailingZeros().

(as well as __builtin_clzll())

llvm-svn: 361724

b7cc093d

[ValueTracking] Base computeOverflowForUnsignedMul() on ConstantRange code; NFCI · d0f13e61

Nikita Popov authored May 26, 2019

The implementations in ValueTracking and ConstantRange are equally
powerful; reuse the one in ConstantRange, which will make this easier
to extend.

llvm-svn: 361723

d0f13e61

gn build: Merge r361664 · 7228b508
Nico Weber authored May 26, 2019
```
llvm-svn: 361722
```
7228b508

[InstCombine] Refactor OptimizeOverflowCheck; NFCI · 39f2bebf

Nikita Popov authored May 26, 2019

Extract a method to compute overflow based on binop and signedness,
and then make the result handling code generic. This extends the
always-overflow handling to signed muls, but currently has no effect,
as we don't compute always-overflow for them (thus NFC).

llvm-svn: 361721

39f2bebf

[InstCombine] Remove OverflowCheckFlavor; NFC · 352f5987

Nikita Popov authored May 26, 2019

Instead pass binary op and signedness. The extra enum only makes
things more complicated in this case.

llvm-svn: 361720

352f5987

[ARM] Select fp16 fma · 0dbafe19

David Green authored May 26, 2019

This adds a pattern for fma, similar to the float and double patterns.

Differential Revision: https://reviews.llvm.org/D62330

llvm-svn: 361719

0dbafe19

[ARM] Select a number of fp16 rounding functions · 21542cd6

David Green authored May 26, 2019

This adds patterns for fp16 round, ceil, etc., the same as the float and double
patterns.

Differential Revision: https://reviews.llvm.org/D62326

llvm-svn: 361718

21542cd6

[ARM] Promote various fp16 math intrinsics · c9f4b7d2

David Green authored May 26, 2019

Promote a number of fp16 math intrinsics to float, so that the relevant float
math routines can be used. Copysign is expanded so as to be handled in-place.

Differential Revision: https://reviews.llvm.org/D62325

llvm-svn: 361717

c9f4b7d2

[X86][AVX] combineBitcastvxi1 - peek through bitops to determine size of original vector · 58a8541d

Simon Pilgrim authored May 26, 2019

We were only testing for direct SETCC results - this allows us to peek through AND/OR/XOR combinations of the comparison results as well.

There's a missing SEXT(PACKSS) fold that I need to investigate for v8i1 cases before I can enable it there as well.

llvm-svn: 361716

58a8541d

[ARM] Select fp16 fabs · 2881325b

David Green authored May 26, 2019

This adds a pattern for the fabs intrinsic, the same as float and double.

Differential Revision: https://reviews.llvm.org/D62324

llvm-svn: 361715

2881325b

[ARM] Select fp16 fsqrt · aeade651

David Green authored May 26, 2019

This adds a pattern for the sqrt intrinsic, the same as float and double.

Differential Revision: https://reviews.llvm.org/D62322

llvm-svn: 361714

aeade651

[ARM] Promote fp16 frem · caf8a11b

David Green authored May 26, 2019

Promote fp16 frem operations on ARM to floats so they call fmodf.

Differential Revision: https://reviews.llvm.org/D62321

llvm-svn: 361713

caf8a11b

[ARM] Add some base fullfp16 tests. NFC · 1c1e2ca0
David Green authored May 26, 2019
```
llvm-svn: 361712
```
1c1e2ca0

[PowerPC] Add missing R_PPC_* relocation types · 603ca511

Fangrui Song authored May 26, 2019

While people mostly care about 64-bit, some systems need basic lib32
support. The plan is to make lld (see PR40888) capable of linking some
applications (PR40888).

llvm-svn: 361711

603ca511

[Driver][RISCV] Simplify. NFC · f2912065
Fangrui Song authored May 26, 2019
```
llvm-svn: 361710
```
f2912065

[Driver] Update handling of c++ and runtime directories · 2db79ef3

Petr Hosek authored May 26, 2019

This is a follow up to r361432 and r361504 which addresses issues
introduced by those changes. Specifically, it avoids duplicating
file and runtime paths in the case where the effective triple is the
same as the canonical one. Furthermore, it fixes the broken multilib
setup in the Fuchsia driver and deduplicates some of the code.

Differential Revision: https://reviews.llvm.org/D62442

llvm-svn: 361709

2db79ef3

Add missing newline at end of file · d4a9cae9
Duncan P. N. Exon Smith authored May 25, 2019
```
llvm-svn: 361708
```
d4a9cae9

[SimplifyCFG] Added condition assumption for unreachable blocks · 0290a77a

David Bolvansky authored May 25, 2019

Summary: PR41688

Reviewers: spatel, efriedma, craig.topper, hfinkel, reames

Reviewed By: hfinkel

Subscribers: javed.absar, dmgreen, fhahn, hfinkel, reames, nikic, lebedev.ri, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61409

llvm-svn: 361707

0290a77a

May 25, 2019

[X86] lowerBuildVectorToBitOp - support build_vector(shift()) -> shift(build_vector(),C) · 40fa52b1
Simon Pilgrim authored May 25, 2019
```
Commonly occurs in sign-extension cases

llvm-svn: 361706
```
40fa52b1

[LLVM-C] Add Accessor for Mach-O Universal Binary Slices · b0fd12b6

Robert Widmann authored May 25, 2019

Summary: Allow for retrieving an object file corresponding to an architecture-specific slice in a Mach-O universal binary file.

Reviewers: whitequark, deadalnix

Reviewed By: whitequark

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60378

llvm-svn: 361705

b0fd12b6

[X86] Combine fminnum/fmaxnum with non-nan operand to fmin/fmax · d87eceda

Nikita Popov authored May 25, 2019

If we have a known non-nan operand, place it in the second operand
of fmin/fmax that is returned if either operand is nan.

Differential Revision: https://reviews.llvm.org/D62448

llvm-svn: 361704

d87eceda

[LVI][CVP] Add support for saturating add/sub · 6bb5041e

Nikita Popov authored May 25, 2019

Adds support for the uadd.sat family of intrinsics in LVI, based on
ConstantRange methods from D60946.
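
A simplified sketch of the idea for unsigned saturating add on 8-bit values, assuming plain closed intervals that do not wrap. The `Range8` type and both functions are hypothetical and far simpler than ConstantRange.

```cpp
#include <cstdint>

// Closed, non-wrapping interval [lo, hi] of 8-bit unsigned values.
struct Range8 {
  uint8_t lo;
  uint8_t hi;
};

// Unsigned saturating add: clamps at 255 instead of wrapping.
uint8_t uaddSat(uint8_t a, uint8_t b) {
  unsigned sum = static_cast<unsigned>(a) + b;
  return sum > 255 ? 255 : static_cast<uint8_t>(sum);
}

// Saturating add is monotonic in both operands, so the result range is
// bounded by the saturating sums of the lower and of the upper bounds.
Range8 uaddSatRange(Range8 x, Range8 y) {
  return {uaddSat(x.lo, y.lo), uaddSat(x.hi, y.hi)};
}
```

For x in [100, 200] and y in [50, 100], this gives a result range of [150, 255].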

Differential Revision: https://reviews.llvm.org/D62447

llvm-svn: 361703

6bb5041e

[X86][SSE] vector-sext - cleanup prefix lists · 34d5a74b
Simon Pilgrim authored May 25, 2019
```
Add X32-SSE common prefix to merge some checks

llvm-svn: 361702
```
34d5a74b

[SelectionDAG] define binops as a superset of commutative binops · 3f0905e4

Sanjay Patel authored May 25, 2019

The test diffs show improved vector narrowing for integer min/max opcodes because
those were all absent from the list. I'm not sure if we can expose functional diffs
for all of the moved/added opcodes though.

It seems like we are missing an AVX512 opportunity to use 256-bit ops in place of
512-bit ops on some tests/targets, but I think that can be a follow-up.

Preliminary steps to make sure the callers are not misusing these queries:
rL361268
rL361547

Differential Revision: https://reviews.llvm.org/D62191

llvm-svn: 361701

3f0905e4

[X86] Add tests for min/maxnum with const operand; NFC · c9de92ee
Nikita Popov authored May 25, 2019
```
llvm-svn: 361700
```
c9de92ee
[LoopVectorize] Fix test by regenerating checks · 3c7edb2d
Nikita Popov authored May 25, 2019
```
llvm-svn: 361699
```
3c7edb2d

[CVP] Remove unnecessary checks for empty GNWR; NFC · 8b1fa076

Nikita Popov authored May 25, 2019

The guaranteed no-wrap region is never empty, it always contains at
least zero, so these optimizations don't ever apply.

To make this more obviously true, replace the conservative return
in makeGNWR with an assertion.

llvm-svn: 361698

8b1fa076

[NFC] Make tests more robust for new optimizations · 21498118
David Bolvansky authored May 25, 2019
```
llvm-svn: 361697
```
21498118

[SelectionDAG] soften assertion when legalizing narrow vector FP ops · 91131b65

Sanjay Patel authored May 25, 2019

The test based on PR42010:
https://bugs.llvm.org/show_bug.cgi?id=42010
...may show an inaccuracy for PPC's target defs, but we should not
be so aggressive with an assert here. There's no telling what out-of-tree
targets look like.

llvm-svn: 361696

91131b65

[NFC] Update test checks · bb76cf0f
David Bolvansky authored May 25, 2019
```
llvm-svn: 361695
```
bb76cf0f
[CVP] Add tests for saturating add/sub ranges; NFC · 9a33dc9f
Nikita Popov authored May 25, 2019
```
llvm-svn: 361694
```
9a33dc9f

[LVI][CVP] Calculate with.overflow result range · 024b18ac

Nikita Popov authored May 25, 2019

In LVI, calculate the range of extractvalue(op.with.overflow(%x, %y), 0)
as the range of op(%x, %y). This is mainly useful in conjunction with
D60650: If the result of the operation is extracted in a branch guarded
against overflow, then the value of %x will be appropriately constrained
and the result range of the operation will be calculated taking that
into account.
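
A small sketch of the reasoning for an 8-bit unsigned add with a constant, under the assumption that the branch is only taken when the add does not wrap. `Range8` and the helper names are hypothetical; LVI itself works on ConstantRange.

```cpp
#include <cstdint>

// Closed, non-wrapping interval [lo, hi] of 8-bit unsigned values.
struct Range8 {
  uint8_t lo;
  uint8_t hi;
};

// In a branch where x + c is known not to wrap, x is at most 255 - c.
// Assumes x.lo <= 255 - c, i.e. the guarded branch is reachable.
Range8 constrainNoWrap(Range8 x, uint8_t c) {
  uint8_t maxX = static_cast<uint8_t>(255 - c);
  uint8_t hi = x.hi < maxX ? x.hi : maxX;
  return {x.lo, hi};
}

// Range of x + c once x has been constrained so the add cannot wrap.
Range8 addRange(Range8 x, uint8_t c) {
  return {static_cast<uint8_t>(x.lo + c), static_cast<uint8_t>(x.hi + c)};
}
```

Starting from an unconstrained x in [0, 255] and c = 100, the guard narrows x to [0, 155], so the extracted result lies in [100, 255].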

Differential Revision: https://reviews.llvm.org/D60656

llvm-svn: 361693

024b18ac

[LVI] Extract helper for binary range calculations; NFC · 17367b0d
Nikita Popov authored May 25, 2019
```
llvm-svn: 361692
```
17367b0d

[X86FixupLEAs] Turn optIncDec into a generic two address LEA optimizer. Support LEA64_32r properly. · 46e5052b

Craig Topper authored May 25, 2019

INC/DEC is really a special case of a more generic issue. We should also turn LEAs into add reg/reg or add reg/imm regardless of the slow LEA flags.

This also supports LEA64_32 which has 64 bit input registers and 32 bit output registers. So we need to convert the 64 bit inputs to their 32 bit equivalents to check if they are equal to base reg.

One thing to note: the original code preserved the kill flags by adding operands to the new instruction instead of using addReg. But I think tied operands aren't supposed to have the kill flag set. I dropped the kill flags, but I could probably try to preserve them in the add reg/reg case if we think it's important. I'm not sure which operand it's supposed to go on for the LEA64_32r instruction due to the super reg implicit uses, though I'm also not sure those are needed, since they were probably just created by an INSERT_SUBREG from a 32-bit input.

Differential Revision: https://reviews.llvm.org/D61472

llvm-svn: 361691

46e5052b

[X86] Add zero idioms to the haswell, broadwell, and skylake schedule models.... · 4b08fcde

Craig Topper authored May 25, 2019

[X86] Add zero idioms to the haswell, broadwell, and skylake schedule models. Add 256-bit fp xor to sandybridge zero idioms

This copies the Sandy Bridge zero idiom support to later CPUs, adding the AVX2 and AVX512F/VL instructions as appropriate.

Differential Revision: https://reviews.llvm.org/D62360

llvm-svn: 361690

4b08fcde

[X86][llvm-mca] Add zero idiom tests for Intel CPUs. NFC · af6c9df1
Craig Topper authored May 25, 2019
```
This pre-commits tests for D62360

llvm-svn: 361689
```
af6c9df1

Revert r361644, "[AMDGPU] Divergence driven ISel. Assign register class for... · 3b937374

Peter Collingbourne authored May 25, 2019

Revert r361644, "[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence."

Broke sanitizer bots:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/21694/steps/bootstrap%20clang/logs/stdio
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/32478/steps/check-llvm%20asan/logs/stdio

llvm-svn: 361688

3b937374