Commits · 3616891046e7f13a758e53dcc6fa73a7c3232b35 · Lorenzo Albano / LLVM bpEVL

Nov 19, 2018

[X86] Use compare with 0 to fill an element with sign bits when sign extending to v2i64 pre-sse4.1 · 36168910

Craig Topper authored Nov 19, 2018

Previously we used an arithmetic shift right by 31, but that requires a copy to preserve the input. So we might as well materialize a zero and compare to it since the comparison will overwrite the register that contains the zeros. This should be one byte shorter.
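The equivalence this change relies on can be checked with a small scalar model. The sketch below simulates one 32-bit lane in Python; it is not the LLVM lowering code, and the helper names (`sra31`, `cmpgt_zero`, `sign_extend_to_64`) are illustrative. It assumes the compare is a signed "0 > x" that yields all ones or all zeros, as a vector compare such as `pcmpgtd` does.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit pattern as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def sra31(x):
    """Arithmetic shift right by 31: every bit becomes a copy of the sign bit."""
    return (to_signed(x) >> 31) & MASK32

def cmpgt_zero(x):
    """Signed compare 0 > x, producing all ones or all zeros for the lane."""
    return MASK32 if 0 > to_signed(x) else 0

def sign_extend_to_64(x):
    """Combine the original low half with the sign-fill high half."""
    return (cmpgt_zero(x) << 32) | (x & MASK32)
```

Both helpers produce the same sign-fill value for every input, so the compare can replace the shift without changing the result.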

llvm-svn: 347181


[X86] Remove most of the SEXTLOAD Custom setOperationAction calls under... · 053f1eea

Craig Topper authored Nov 19, 2018

[X86] Remove most of the SEXTLOAD Custom setOperationAction calls under -x86-experimental-vector-widening-legalization.

Leave just the v4i8->v4i64 and v8i8->v8i64, but only enable them on pre-sse4.1 targets when 64-bit mode is enabled. In those cases we end up creating sext loads that get scalarized to code that looks better than what we get from loading into a vector register and doing a multiple step sign extend using unpacks and shifts.

llvm-svn: 347180


Nov 18, 2018

[X86][SSE] Add SimplifyDemandedVectorElts support for SSE packed i2fp conversions. · 7f92efa5
Simon Pilgrim authored Nov 18, 2018
```
llvm-svn: 347177
```

[X86] Add custom type legalization for extending v4i8/v4i16->v4i64. · 0468c860

Craig Topper authored Nov 18, 2018

Pre-SSE4.1 sext_invec for v2i64 is complicated because we don't have a v2i64 sra instruction. So instead we sign extend to i32 using unpack and sra, then copy the elements and do a v4i32 sra to fill with sign bits, then interleave the i32 sign extend and the sign bits. So really we're doing two sign extends but only using half of the v4i32 intermediate result.

When the result is more than 128 bits, default type legalization would prefer to split the destination type all the way down to v2i64 with shuffles followed by v16i8/v8i16->v2i64 sext_inreg operations. This results in more instructions than necessary because we are only utilizing the lower 2 elements of the v4i32 intermediate result. Instead we can custom split a v4i8/v4i16->v4i64 sign_extend. Then we can sign extend v4i8/v4i16->v4i32 invec, producing a full v4i32 result. Create the sign bit vector as a v4i32, then split and interleave with the sign bits using a punpckldq and a punpckhdq.
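The custom split described above can be modelled lane by lane. The following Python sketch is an illustration only, with vectors as lists of 32-bit lane values and hypothetical helper names; it assumes little-endian lane order, so each (low dword, high dword) pair forms one 64-bit element.

```python
def sext(value, bits):
    """Interpret the low `bits` bits of value as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def punpckldq(a, b):
    """Interleave the two low 32-bit lanes of a and b."""
    return [a[0], b[0], a[1], b[1]]

def punpckhdq(a, b):
    """Interleave the two high 32-bit lanes of a and b."""
    return [a[2], b[2], a[3], b[3]]

def as_v2i64(v4i32):
    """Reinterpret four 32-bit lanes as two signed 64-bit lanes."""
    return [sext((v4i32[i + 1] << 32) | v4i32[i], 64) for i in (0, 2)]

def sext_v4i8_to_v4i64(bytes4):
    # Step 1: one full v4i32 sign extend (all four lanes are used).
    ext32 = [sext(b, 8) & 0xFFFFFFFF for b in bytes4]
    # Step 2: one v4i32 vector of sign bits.
    signs = [0xFFFFFFFF if e & 0x80000000 else 0 for e in ext32]
    # Step 3: interleave values with sign bits for each half of the result.
    return as_v2i64(punpckldq(ext32, signs)) + as_v2i64(punpckhdq(ext32, signs))
```

For example, `sext_v4i8_to_v4i64([0x01, 0xFF, 0x80, 0x7F])` yields the four sign-extended values from a single v4i32 extend and a single sign-bit vector.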

llvm-svn: 347176


[X86][SSE] Add SimplifyDemandedVectorElts support for SSE splat-vector-shifts. · b31bdbd2
Simon Pilgrim authored Nov 18, 2018
```
SSE vector shifts only use the bottom 64-bits of the shift amount vector.

llvm-svn: 347173
```
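Why only some elements of the shift amount are demanded can be seen in a lane model. This is a hedged Python sketch of a logical right shift on four 32-bit lanes (in the style of `psrld` with a vector count), not LLVM code; the function name and list representation are assumptions made for illustration.

```python
def psrld(lanes, amount_vec):
    """Shift four 32-bit lanes right by a count held in a vector.

    amount_vec has four 32-bit lanes, but only lanes 0 and 1 (the bottom
    64 bits) form the count; lanes 2 and 3 are never read.
    """
    count = (amount_vec[1] << 32) | amount_vec[0]
    if count > 31:
        # Counts wider than the element size clear every lane.
        return [0, 0, 0, 0]
    return [(x & 0xFFFFFFFF) >> count for x in lanes]
```

Because the upper two lanes of `amount_vec` never influence the result, whatever computes them need not be demanded.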

[X86] Disable combineToExtendVectorInReg under... · 11d50948

Craig Topper authored Nov 18, 2018

[X86] Disable combineToExtendVectorInReg under -x86-experimental-vector-widening-legalization. Add custom type legalization for extends.

If we widen illegal types instead of promoting, we should be able to rely on the type legalizer to create the vector_inreg operations for us with some caveats.

This patch disables combineToExtendVectorInReg when we are using widening.

I've enabled custom legalization for v8i8->v8i64 extends under avx512f since the type legalizer would want to create a vector_inreg with a v64i8 input type which isn't legal without avx512bw. So we go to v16i8 with custom code using the relaxation of rules we get from D54346.

I've also enabled custom legalization of v8i64 and v16i32 operations with AVX. When the input type is 128 bits, the default splitting legalization would first extend 128->256, then split the result into two 128-bit pieces, extend each half to 256 bits, and concatenate the results. The custom legalization I've added instead uses a 128->256 bit vector_inreg extend that only reads the lower 64 bits for the low half of the split. It then shuffles the high 64 bits to the low 64 bits and does another vector_inreg extend.
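As a rough illustration of the custom split, the Python sketch below models a v16i8 -> v16i32 sign extend built from two in-register extends. The element types, function names, and list representation are assumptions chosen for the example; the actual patch operates on SelectionDAG nodes.

```python
def sext8(b):
    """Sign extend one byte to a Python int."""
    b &= 0xFF
    return b - 0x100 if b & 0x80 else b

def sign_extend_vector_inreg(v16i8):
    """128->256-bit in-register extend: reads only the low 8 bytes (64 bits)."""
    return [sext8(b) for b in v16i8[:8]]

def move_high_to_low(v16i8):
    """Shuffle the high 64 bits into the low 64 bits (upper half is unused)."""
    return v16i8[8:] + v16i8[8:]

def sext_v16i8_to_v16i32(v16i8):
    lo = sign_extend_vector_inreg(v16i8)                    # low half of the split
    hi = sign_extend_vector_inreg(move_high_to_low(v16i8))  # high half of the split
    return lo + hi                                          # concat of two 256-bit halves
```

Each half needs one extend, with a single shuffle in between, rather than an extend, a split, and two further extends.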

llvm-svn: 347172


[X86] Lower v16i16->v16i8 truncate using an 'and' with 255, an... · bc8148f7

Craig Topper authored Nov 18, 2018

[X86] Lower v16i16->v16i8 truncate using an 'and' with 255, an extract_subvector, and a packuswb instruction.

Summary: This is an improvement over the two pshufbs and punpcklqdq we'd get otherwise.
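A lane model shows why the 'and' is needed before the pack. The Python sketch below is illustrative only (names are hypothetical); it assumes `packuswb` treats each 16-bit input as signed and saturates it to the unsigned byte range, so masking to 0..255 first makes the pack behave as a plain truncate to the low byte.

```python
def packuswb(a, b):
    """Pack two vectors of eight 16-bit words into sixteen unsigned bytes."""
    def saturate(word):
        signed = word - 0x10000 if word & 0x8000 else word
        return max(0, min(255, signed))
    return [saturate(w) for w in a] + [saturate(w) for w in b]

def truncate_words_to_bytes(words16):
    """Truncate sixteen 16-bit words to their low bytes."""
    masked = [w & 0xFF for w in words16]   # 'and' with 255
    lo, hi = masked[:8], masked[8:]        # extract_subvector: two halves
    return packuswb(lo, hi)                # no lane saturates after masking
```

Without the mask, a word such as 0x1234 would saturate to 255 instead of truncating to 0x34.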

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54671

llvm-svn: 347171


[DAG] add undef simplifications for select nodes · 8c0cd77b

Sanjay Patel authored Nov 18, 2018

Sadly, this duplicates (twice) the logic from InstSimplify. There
might be some way to at least share the DAG versions of the code, 
but copying the folds seems to be the standard method to ensure 
that we don't miss these folds. 

Unlike in IR, we don't run DAGCombiner to fixpoint, so there's no 
way to ensure that we do these kinds of simplifications unless the 
code is repeated at node creation time and during combines.
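The idea of running the same folds at node creation and again during combines can be sketched with a toy model. The Python below is not the DAG code: nodes are tuples, `UNDEF` is a sentinel, and the fold list is an assumption based on the usual select-with-undef simplifications rather than a copy of the patch.

```python
UNDEF = object()  # stand-in for an undef value

def simplify_select(cond, true_val, false_val):
    """Return a simplified value, or None if no undef fold applies."""
    if cond is UNDEF:
        # Either arm is a valid choice; prefer a constant arm if there is one.
        return true_val if isinstance(true_val, int) else false_val
    if true_val is UNDEF:
        return false_val
    if false_val is UNDEF:
        return true_val
    return None

def get_select(cond, true_val, false_val):
    """Node creation: try the folds before building a new node."""
    folded = simplify_select(cond, true_val, false_val)
    return folded if folded is not None else ("select", cond, true_val, false_val)

def combine(node):
    """Combine step: retry the folds on a node whose operands may have changed."""
    if isinstance(node, tuple) and node[0] == "select":
        folded = simplify_select(*node[1:])
        if folded is not None:
            return folded
    return node
```

Sharing one helper between `get_select` and `combine` keeps the two call sites from drifting apart, which is the kind of sharing the message hopes for on the DAG side.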

There were other tests that would become worthless with this
improvement that I changed as pre-commits:
rL347161
rL347164
rL347165
rL347166
rL347167

I'm not sure how to salvage the remaining tests (diffs in this patch).
So the x86 tests verify that the new code is working as intended.
The AMDGPU test is actually similar to my motivating case: we have
some undef value that has survived to machine IR in an x86 test, and 
then it gets folded in some weird way, or we crash if we don't transfer
the undef flag. But we would have been better off never getting to that
point by doing these simplifications.

This will lead back to PR32023 someday...
https://bugs.llvm.org/show_bug.cgi?id=32023

llvm-svn: 347170


Remove unused variable. NFCI. · ec808cf5
Simon Pilgrim authored Nov 18, 2018
```
llvm-svn: 347169
```

[X86][SSE] Split IsSplatValue into GetSplatValue and IsSplatVector · 50828c75

Simon Pilgrim authored Nov 18, 2018

Refactor towards making this recursive (necessary for PR38243 rotation splat detection).
IsSplatVector returns the original vector source of the splat and the splat index.
GetSplatValue returns the scalar splatted value as an extraction from IsSplatVector.

llvm-svn: 347168


[X86][SSE] Relax IsSplatValue - remove the 'variable shift' limit on subtracts. · fec9f865
Simon Pilgrim authored Nov 18, 2018
```
This means we don't use the per-lane shifts as much when we can cheaply use the older splat-variable shifts.

llvm-svn: 347162
```
[SelectionDAG] simplify code; NFC · 42c22a1f
Sanjay Patel authored Nov 18, 2018
```
llvm-svn: 347160
```

[X86][SSE] Use raw shuffle mask decode in SimplifyDemandedVectorEltsForTargetNode (PR39549) · cc1f5d24

Simon Pilgrim authored Nov 18, 2018

We were using the 'normalized' shuffle mask from resolveTargetShuffleInputs, which replaces zero/undef inputs with sentinel values. For SimplifyDemandedVectorElts we need the raw mask so we can correctly demand those 'zero' inputs that got normalized away; this requires an extra bit of logic to locally normalize undef inputs.

llvm-svn: 347158


[WebAssembly] Add null streamer support · e0f8b9bf

Heejin Ahn authored Nov 18, 2018

Summary: Now `llc -filetype=null` works.

Reviewers: eush

Subscribers: dschuff, jgravelle-google, sbc100, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D54660

llvm-svn: 347155


[X86] Add -x86-experimental-vector-widening-legalization check to... · cd94a7c2

Craig Topper authored Nov 18, 2018

[X86] Add -x86-experimental-vector-widening-legalization check to combineSelect and combineSetCC to cover vXi16/vXi8 promotion without BWI.

I don't yet have any test cases for this, but it's the right thing to do based on log file inspection.

llvm-svn: 347151


[X86] Rename WidenMaskArithmetic->PromoteMaskArithmetic since we usually use... · b03f80a2

Craig Topper authored Nov 18, 2018

[X86] Rename WidenMaskArithmetic->PromoteMaskArithmetic since we usually use widen to refer to adding elements not making elements larger. NFC

llvm-svn: 347150


[X86] Don't use a pmaddwd for vXi32 multiply if the inputs are zero extends... · f56a5751

Craig Topper authored Nov 18, 2018

[X86] Don't use a pmaddwd for vXi32 multiply if the inputs are zero extends from i8 or smaller without SSE4.1. Prefer to shrink the mul instead.

The zero extend will require two stages of unpacks to implement. So it's better to shrink the multiply using pmullw and then extend that result back to v4i32 using a single unpack.
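The shrink is safe because the product of two zero-extended bytes always fits in 16 bits (255 * 255 = 65025 < 65536). The Python sketch below models this on lists of lane values; the helper names are illustrative, and `unpack_with_zero` stands in for interleaving the 16-bit results with a zero vector.

```python
def pmullw(a, b):
    """Multiply 16-bit lanes, keeping only the low 16 bits of each product."""
    return [(x * y) & 0xFFFF for x, y in zip(a, b)]

def unpack_with_zero(words):
    """Zero extend 16-bit lanes to 32 bits by pairing each with a zero high half."""
    return [(0 << 16) | w for w in words]

def mul_zext_i8(a_bytes, b_bytes):
    """32-bit multiply of zero-extended bytes, done as a 16-bit multiply plus one extend."""
    return unpack_with_zero(pmullw(a_bytes, b_bytes))
```

Since no product overflows 16 bits, the low half kept by `pmullw` is already the exact result.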

llvm-svn: 347149


Vedant Kumar authored Nov 18, 2018

Fix all of the missing debug location errors in CVP found by debugify.

This includes the missing-location-after-udiv-truncation case described
in llvm.org/PR38178.

llvm-svn: 347147

35f504c1

Nov 17, 2018

Fix bot failure from r347145 · 5b9bb25c

Teresa Johnson authored Nov 17, 2018

The #if check around the statistics computation gave an error about
the statistic being an unused variable. Instead, guard with
AreStatisticsEnabled().

llvm-svn: 347146


[ThinLTO] Add some stats for read only variable internalization · 8c1915cc

Teresa Johnson authored Nov 17, 2018

Summary:
Follow up to D49362 ([ThinLTO] Internalize read only globals). Add a
statistic on the number of read only variables (only counting live
variables since dead variables will be dropped anyway).

Reviewers: evgeny777

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits

Differential Revision: https://reviews.llvm.org/D54642

llvm-svn: 347145


[X86] Add support for matching PACKUSWB from a v64i8 shuffle. · 0438d791
Craig Topper authored Nov 17, 2018
```
llvm-svn: 347143
```
Move BuryPointer from Clang to LLVM for use in other LLVM tools · ef543381
David Blaikie authored Nov 17, 2018
```
Specifically planning to use this in llvm-symbolizer to remove the cost
of cleanup there.

llvm-svn: 347140
```

llvm-symbolizer: Avoid calling getFromOffset when the index entry is already available · 81959a27

David Blaikie authored Nov 17, 2018

Especially for the symbolizer, it can be inefficient to have to search through
the entire index when it isn't needed. llvm-symbolizer looks up only a
few CUs and already has an index available in getUnitForEntry; once it's
passed down to DWARFUnitHeader::extract, there's no need for it to
call getFromOffset.

llvm-svn: 347134


[X86] Don't extend v32i8 multiplies to v32i16 with avx512bw and prefer-vector-width=256. · dd61f116
Craig Topper authored Nov 17, 2018
```
llvm-svn: 347131
```

Reverted r347092 due to the following build fails: · 6a5d5ac4

Vyacheslav Zakharin authored Nov 17, 2018

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/8662
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/26263

llvm-svn: 347129


[X86] Use getUnpackl/getUnpackh instead of hardcoding a shuffle mask. · b05ea28f
Craig Topper authored Nov 17, 2018
```
llvm-svn: 347127
```
Use llvm::copy. NFC · 75709329
Fangrui Song authored Nov 17, 2018
```
llvm-svn: 347126
```
DAG combiner: fold (select, C, X, undef) -> X · 0ff7c830
Stanislav Mekhanoshin authored Nov 16, 2018
```
Differential Revision: https://reviews.llvm.org/D54646

llvm-svn: 347110
```

Nov 16, 2018

[X86] Add custom promotion of narrow fp_to_uint/fp_to_sint operations under... · ee0333b4

Craig Topper authored Nov 16, 2018

[X86] Add custom promotion of narrow fp_to_uint/fp_to_sint operations under -x86-experimental-vector-widening-legalization.

This tries to force the result type to vXi32 followed by a truncate. This can help avoid scalarization that would otherwise occur.

There are some annoying examples of an avx512 truncate instruction followed by a packus where we should really be able to just use one truncate. But overall this is still a net improvement.

llvm-svn: 347105


[X86] Qualify part of the masked gather handling in ReplaceNodeResults with a... · 87bc07b3

Craig Topper authored Nov 16, 2018

[X86] Qualify part of the masked gather handling in ReplaceNodeResults with a getTypeAction call to know if we can use default legalization.

If we manage to switch to -x86-experimental-vector-widening-legalization, this block can be removed.

llvm-svn: 347100


[SimpleLoopUnswitch] adding cost multiplier to cap exponential unswitch with · 2e3e224e

Fedor Sergeev authored Nov 16, 2018

We need to control exponential behavior of loop-unswitch so we do not get
run-away compilation.

Suggested solution is to introduce a multiplier for an unswitch cost that
makes cost prohibitive as soon as there are too many candidates and too
many sibling loops (meaning we have already started duplicating loops
by unswitching).

It does solve the currently known problem with compile-time degradation
(PR 39544).

Tests are built on top of the recently implemented CHECK-COUNT-<num>
FileCheck directive.

Reviewed By: chandlerc, mkazantsev
Differential Revision: https://reviews.llvm.org/D54223

llvm-svn: 347097


[X86] Remove a branch on SSE4.1 from LowerLoad · 567aaeb4

Craig Topper authored Nov 16, 2018

We should be able to use getExtendInVec with or without sse4.1 to produce a SIGN_EXTEND_VECTOR_INREG.

llvm-svn: 347095


[LegalizeVectorOps] After custom legalizing an extending load or a truncating... · 9e970542

Craig Topper authored Nov 16, 2018

[LegalizeVectorOps] After custom legalizing an extending load or a truncating store, make sure the custom code is also legal.

For example, on X86 we emit a sign_extend_vector_inreg from LowerLoad and without sse4.1 this node will need further legalization. Previously this sign_extend_vector_inreg was being custom lowered during DAG legalization instead of vector op legalization.

Unfortunately, this doesn't seem to matter for the output of any existing lit tests.

llvm-svn: 347094


[X86] In LowerLoad, fix assert messages and rename a variable that uses Zize instead of Size. NFC · 7fff9a9a
Craig Topper authored Nov 16, 2018
```
llvm-svn: 347093
```
Preprocessing support in tablegen. · dd0a1fdf
Vyacheslav Zakharin authored Nov 16, 2018
```
Differential Revision: https://reviews.llvm.org/D53840

llvm-svn: 347092
```

AArch64: Emit a call frame instruction for the shadow call stack register. · 52702446

Peter Collingbourne authored Nov 16, 2018

When unwinding past a function that uses shadow call stack, we must
subtract 8 from the value of the x18 register. This patch causes us
to emit a call frame instruction that causes that to happen.

Differential Revision: https://reviews.llvm.org/D54609

llvm-svn: 347089


[MSP430] Add RTLIB::[SRL/SRA/SHL]_I32 lowering to EABI lib calls · e5cb1c35
Anton Korobeynikov authored Nov 16, 2018
```
Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D54626

llvm-svn: 347080
```

[X86] Disable Condbr_merge pass · 3a381757

Rong Xu authored Nov 16, 2018

Disable Condbr_merge pass for now due to PR39658.
Will reenable the pass once the bug is fixed.

llvm-svn: 347079


Revert "[PowerPC] Make no-PIC default to match GCC - LLVM" · 9004444d
Stefan Pintilie authored Nov 16, 2018
```
This reverts commit r347069

llvm-svn: 347076
```

[MSP430] Use R_MSP430_16_BYTE type for FK_Data_2 fixup · 883c7095

Anton Korobeynikov authored Nov 16, 2018

The linker fails to link an example like this (simplified case from newlib
sources):

$ cat test.c
extern const char _ctype_b[];
struct _t { char *ptr; };
struct _t T = { ((char *) _ctype_b + 3) };

$ cat ctype.c
char _ctype_b[4] = { 0, 0, 0, 0 };

LD: test.o:(.data+0x0): warning: internal error: unsupported relocation error

We also follow the GNU toolchain here, where a 2-byte relocation is mapped to
R_MSP430_16_BYTE instead of R_MSP430_16.

Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D54620

llvm-svn: 347074
