- Nov 19, 2018
-
-
Craig Topper authored
Previously we used an arithmetic shift right by 31, but that requires a copy to preserve the input. So we might as well materialize a zero and compare to it since the comparison will overwrite the register that contains the zeros. This should be one byte shorter. llvm-svn: 347181
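As an illustration of the trade-off, a minimal intrinsics sketch (assuming 32-bit lanes; the function names are made up and this is not the actual lowering code):
```cpp
#include <immintrin.h>

// Before: smearing the sign bit with an arithmetic shift clobbers its operand,
// so the input has to be copied first if it is still needed.
__m128i sign_via_shift(__m128i v) {
  __m128i copy = v;                    // extra copy to preserve v
  return _mm_srai_epi32(copy, 31);     // psrad $31
}

// After: materialize zero and compare it against the input; pcmpgtd overwrites
// the register holding the zeros, so no copy of the input is needed.
__m128i sign_via_cmp(__m128i v) {
  __m128i zero = _mm_setzero_si128();  // pxor
  return _mm_cmpgt_epi32(zero, v);     // all-ones where v is negative
}
```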
-
Craig Topper authored
[X86] Remove most of the SEXTLOAD Custom setOperationAction calls under -x86-experimental-vector-widening-legalization. Leave just the v4i8->v4i64 and v8i8->v8i64, but only enable them on pre-sse4.1 targets when 64-bit mode is enabled. In those cases we end up creating sext loads that get scalarized to code that looks better than what we get from loading into a vector register and doing a multiple step sign extend using unpacks and shifts. llvm-svn: 347180
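For readers unfamiliar with the mechanism, a hypothetical sketch of how such custom sextload handling is registered in a target's lowering setup (the condition, types, and surrounding code are assumptions, not the literal X86ISelLowering.cpp change):
```cpp
// Hypothetical fragment from the X86TargetLowering constructor, for illustration only.
if (ExperimentalVectorWideningLegalization && !Subtarget.hasSSE41() &&
    Subtarget.is64Bit()) {
  // Let custom code turn these into scalarized sign-extending loads.
  setLoadExtAction(ISD::SEXTLOAD, MVT::v4i64, MVT::v4i8, Custom);
  setLoadExtAction(ISD::SEXTLOAD, MVT::v8i64, MVT::v8i8, Custom);
}
```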
-
- Nov 18, 2018
-
-
Simon Pilgrim authored
llvm-svn: 347177
-
Craig Topper authored
Pre-SSE4.1 sext_invec for v2i64 is complicated because we don't have a v2i64 sra instruction. So instead we sign extend to i32 using unpack and sra, then copy the elements and do a v4i32 sra to fill with sign bits, then interleave the i32 sign extend and the sign bits. So really we're doing two sign extends but only using half of the v4i32 intermediate result. When the result is more than 128 bits, default type legalization would prefer to split the destination type all the way down to v2i64 with shuffles followed by v16i8/v8i16->v2i64 sext_inreg operations. This results in more instructions than necessary because we are only utilizing the lower 2 elements of the v4i32 intermediate result. Instead we can custom split a v4i8/v4i16->v4i64 sign_extend. Then we can sign extend v4i8/v4i16->v4i32 invec producing a full v4i32 result. Create the sign bit vector as a v4i32, then split and interleave with the sign bits using a punpckldq and punpckhdq. llvm-svn: 347176
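A small SSE2 intrinsics sketch of the interleave-with-sign-bits idea described above (illustrative only; the real code works on DAG nodes, not intrinsics):
```cpp
#include <immintrin.h>

// v4i32 -> v4i64 sign extension without SSE4.1: compute the sign bits once for
// all four lanes, then interleave value/sign pairs into the low and high halves.
void sext_v4i32_to_v4i64(__m128i v, __m128i &lo, __m128i &hi) {
  __m128i sign = _mm_srai_epi32(v, 31);   // psrad $31: a v4i32 of sign bits
  lo = _mm_unpacklo_epi32(v, sign);       // punpckldq: lanes 0,1 become two i64
  hi = _mm_unpackhi_epi32(v, sign);       // punpckhdq: lanes 2,3 become two i64
}
```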
-
Craig Topper authored
[X86] Add a 32-bit command line with only sse2 to vector-sext.ll and vector-sext.ll to show some of the scalarized load sequences without 64-bit scalar support. Some of these sequences look pretty bad since we have to copy the sign bit from a 32 bit register to a 64 bit register to finish a sign extend. llvm-svn: 347175
-
Simon Pilgrim authored
SSE vector shifts only use the bottom 64-bits of the shift amount vector. llvm-svn: 347173
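For context, a small intrinsics example of the behavior being relied on (the values are made up):
```cpp
#include <immintrin.h>

// pslld/psrld/etc. with a vector count only read the low 64 bits of the count
// operand, so the upper 64 bits never need to be computed or demanded.
__m128i shift_example(__m128i v) {
  __m128i count = _mm_set_epi64x(/*ignored*/ 0x1234, /*used*/ 4);
  return _mm_sll_epi32(v, count);   // every i32 lane shifted left by 4
}
```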
-
Craig Topper authored
[X86] Disable combineToExtendVectorInReg under -x86-experimental-vector-widening-legalization. Add custom type legalization for extends. If we widen illegal types instead of promoting, we should be able to rely on the type legalizer to create the vector_inreg operations for us with some caveats. This patch disables combineToExtendVectorInReg when we are using widening. I've enabled custom legalization for v8i8->v8i64 extends under avx512f since the type legalizer would want to create a vector_inreg with a v64i8 input type, which isn't legal without avx512bw. So we go to v16i8 with custom code using the relaxation of rules we get from D54346. I've also enabled custom legalization of v8i64 and v16i32 operations with AVX. When the input type is 128 bits, the default splitting legalization would extend first 128->256, then do a split into two 128-bit pieces. Extend each half to 256 and then concat the result. The custom legalization I've added instead uses a 128->256 bit vector_inreg extend that only reads the lower 64-bits for the low half of the split. Then shuffles the high 64-bits to the low 64-bits and does another vector_inreg extend. llvm-svn: 347172
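A rough intrinsics sketch of the split described in the last two sentences (AVX2 intrinsics and the v8i16->v8i64 case are chosen purely for illustration; the actual change operates on DAG nodes):
```cpp
#include <immintrin.h>

// Extend a 128-bit v8i16 to two 256-bit v4i64 halves: a vector_inreg-style extend
// reads only the low 64 bits, so move the high 64 bits down for the second half.
void split_extend(__m128i v, __m256i &lo, __m256i &hi) {
  lo = _mm256_cvtepi16_epi64(v);              // reads only the low 64 bits of v
  __m128i upper = _mm_unpackhi_epi64(v, v);   // high 64 bits shuffled to the bottom
  hi = _mm256_cvtepi16_epi64(upper);
}
```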
-
Craig Topper authored
[X86] Lower v16i16->v16i8 truncate using an 'and' with 255, an extract_subvector, and a packuswb instruction. Summary: This is an improvement over the two pshufbs and punpcklqdq we'd get otherwise. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54671 llvm-svn: 347171
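The sequence, sketched with AVX2 intrinsics (illustrative only; the in-tree lowering builds DAG nodes):
```cpp
#include <immintrin.h>

// Mask each 16-bit lane down to its low byte, split the 256-bit value into two
// 128-bit halves, and pack them into one 16-byte result with packuswb.
__m128i trunc_v16i16_to_v16i8(__m256i v) {
  __m256i masked = _mm256_and_si256(v, _mm256_set1_epi16(255)); // 'and' with 255
  __m128i lo = _mm256_castsi256_si128(masked);                  // low half
  __m128i hi = _mm256_extracti128_si256(masked, 1);             // extract_subvector
  return _mm_packus_epi16(lo, hi);                              // packuswb
}
```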
-
Sanjay Patel authored
Sadly, this duplicates (twice) the logic from InstSimplify. There might be some way to at least share the DAG versions of the code, but copying the folds seems to be the standard method to ensure that we don't miss these folds. Unlike in IR, we don't run DAGCombiner to fixpoint, so there's no way to ensure that we do these kinds of simplifications unless the code is repeated at node creation time and during combines. There were other tests that would become worthless with this improvement that I changed as pre-commits: rL347161 rL347164 rL347165 rL347166 rL347167 I'm not sure how to salvage the remaining tests (diffs in this patch). So the x86 tests verify that the new code is working as intended. The AMDGPU test is actually similar to my motivating case: we have some undef value that has survived to machine IR in an x86 test, and then it gets folded in some weird way, or we crash if we don't transfer the undef flag. But we would have been better off never getting to that point by doing these simplifications. This will lead back to PR32023 someday... https://bugs.llvm.org/show_bug.cgi?id=32023 llvm-svn: 347170
-
Simon Pilgrim authored
llvm-svn: 347169
-
Simon Pilgrim authored
Refactor towards making this recursive (necessary for PR38243 rotation splat detection). IsSplatVector returns the original vector source of the splat and the splat index. GetSplatValue returns the scalar splatted value as an extraction from IsSplatVector. llvm-svn: 347168
-
Sanjay Patel authored
llvm-svn: 347167
-
Sanjay Patel authored
llvm-svn: 347166
-
Sanjay Patel authored
llvm-svn: 347165
-
Sanjay Patel authored
llvm-svn: 347164
-
Simon Pilgrim authored
This means we don't use the per-lane shifts as much when we can cheaply use the older splat-variable shifts. llvm-svn: 347162
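For readers who don't have the two shift flavors memorized, an intrinsics illustration of the difference (not code from the patch):
```cpp
#include <immintrin.h>

// Per-lane variable shift (AVX2): each lane gets its own count.
__m256i per_lane(__m256i v, __m256i counts) {
  return _mm256_sllv_epi32(v, counts);                 // vpsllvd
}

// Older splat-variable shift (SSE2): every lane is shifted by the same count,
// which is usually cheaper when the count is known to be a splat.
__m128i splat_count(__m128i v, int count) {
  return _mm_sll_epi32(v, _mm_cvtsi32_si128(count));   // pslld
}
```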
-
Sanjay Patel authored
llvm-svn: 347161
-
Sanjay Patel authored
llvm-svn: 347160
-
Simon Pilgrim authored
llvm-svn: 347159
-
Simon Pilgrim authored
We were using the 'normalized' shuffle mask from resolveTargetShuffleInputs, which replaces zero/undef inputs with sentinel values. For SimplifyDemandedVectorElts we need the raw mask so we can correctly demand those 'zero' inputs that got normalized away; this requires an extra bit of logic to locally normalize undef inputs. llvm-svn: 347158
-
Kamil Rytarowski authored
Summary: NetBSD ships with native curses(3) and -ltinfo is a part of ncurses. Set -lterminfo before -ltinfo, as this allows the native curses libraries to be prioritized. Mixing curses and ncurses does not work well, especially in software built on top of llvm. Original patch by Ryo Onodera (NetBSD) in pkgsrc. Reviewers: labath, dim, mgorny Reviewed By: dim, mgorny Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54650 llvm-svn: 347156
-
Heejin Ahn authored
Summary: Now `llc -filetype=null` works. Reviewers: eush Subscribers: dschuff, jgravelle-google, sbc100, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D54660 llvm-svn: 347155
-
Heejin Ahn authored
Summary: This was missing in D54096. Independent tests for this are not available here, because these are used in lld. Reviewers: sbc100 Subscribers: dschuff, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D54662 llvm-svn: 347154
-
Craig Topper authored
[X86] Add -x86-experimental-vector-widening-legalization check to combineSelect and combineSetCC to cover vXi16/vXi8 promotion without BWI. I don't yet have any test cases for this, but it's the right thing to do based on log file inspection. llvm-svn: 347151
-
Craig Topper authored
[X86] Rename WidenMaskArithmetic->PromoteMaskArithmetic since we usually use "widen" to refer to adding elements, not making elements larger. NFC llvm-svn: 347150
-
Craig Topper authored
[X86] Don't use a pmaddwd for vXi32 multiply if the inputs are zero extends from i8 or smaller without SSE4.1. Prefer to shrink the mul instead. The zero extend will require two stages of unpacks to implement. So it's better to shrink the multiply using pmullw and then extend that result back to v4i32 using a single unpack. llvm-svn: 347149
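A sketch of the preferred shape with SSE2 intrinsics, assuming a16 and b16 already hold the i8 inputs zero-extended to 16 bits (illustration only):
```cpp
#include <immintrin.h>

// Multiply in 16 bits with pmullw (the product of two u8 values fits in 16 bits),
// then widen the products to 32 bits with a single unpack pair.
void mul_of_zext_u8(__m128i a16, __m128i b16, __m128i &lo32, __m128i &hi32) {
  __m128i prod = _mm_mullo_epi16(a16, b16);   // pmullw
  __m128i zero = _mm_setzero_si128();
  lo32 = _mm_unpacklo_epi16(prod, zero);      // zero-extend low products to i32
  hi32 = _mm_unpackhi_epi16(prod, zero);      // zero-extend high products to i32
}
```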
-
John Regehr authored
Tighten up a couple of assertions. Hitting the BitPosition == BitWidth case that was previously not caught resulted in nasty corruption of APInts that (on my system at least) could not be detected using UBSan, ASan, or Valgrind. This patch does not cause any extra failures in a check-all, nor does it interfere with bootstrapping. David Blaikie informally approved this change. llvm-svn: 347148
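The kind of tightening described, sketched on a deliberately simplified bit-set method (this is not the actual APInt code; the names are invented for the example):
```cpp
#include <cassert>
#include <cstdint>

struct TinyInt {
  uint64_t Word = 0;
  unsigned BitWidth = 64;
  void setBit(unsigned BitPosition) {
    // Strict '<': BitPosition == BitWidth would silently write past the valid bits.
    assert(BitPosition < BitWidth && "BitPosition out of range");
    Word |= uint64_t(1) << BitPosition;
  }
};
```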
-
Vedant Kumar authored
Fix all of the missing debug location errors in CVP found by debugify. This includes the missing-location-after-udiv-truncation case described in llvm.org/PR38178. llvm-svn: 347147
-
- Nov 17, 2018
-
-
Teresa Johnson authored
The #if check around the statistics computation gave an error about the statistic being an unused variable. Instead, guard with AreStatisticsEnabled(). llvm-svn: 347146
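A small sketch of the guard pattern (the statistic name and debug type are hypothetical; AreStatisticsEnabled comes from llvm/ADT/Statistic.h):
```cpp
#include "llvm/ADT/Statistic.h"

#define DEBUG_TYPE "module-summary-analysis"   // hypothetical debug type
STATISTIC(ReadOnlyLiveGVars, "Number of live read-only global variables");

static void noteReadOnlyVars(unsigned N) {
  // Guard at runtime instead of with #if, so the statistic symbol is always
  // referenced and never triggers an unused-variable error.
  if (llvm::AreStatisticsEnabled())
    ReadOnlyLiveGVars += N;
}
```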
-
Teresa Johnson authored
Summary: Follow up to D49362 ([ThinLTO] Internalize read only globals). Add a statistic on the number of read only variables (only counting live variables since dead variables will be dropped anyway). Reviewers: evgeny777 Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits Differential Revision: https://reviews.llvm.org/D54642 llvm-svn: 347145
-
Craig Topper authored
llvm-svn: 347143
-
Craig Topper authored
llvm-svn: 347142
-
David Blaikie authored
Specifically planning to use this in llvm-symbolizer to remove the cost of cleanup there. llvm-svn: 347140
-
Simon Pilgrim authored
llvm-svn: 347139
-
Xing GUO authored
Summary: When using option `-x` (--all-headers), it will print `Sections`, `Symbol Table`, `Program Header`, etc.; `Sections` and `Symbol Table` will be connected together.
Before:
```
Sections:
Idx Name          Size      Address          Type
  0               00000000 0000000000000000
...
 29 .shstrtab     0000011a 0000000000000000
SYMBOL TABLE:
...
```
After:
```
Sections:
Idx Name          Size      Address          Type
  0               00000000 0000000000000000
...
 29 .shstrtab     0000011a 0000000000000000

SYMBOL TABLE:
...
```
Reviewers: Higuoxing Reviewed By: Higuoxing Subscribers: llvm-commits, jhenderson Differential Revision: https://reviews.llvm.org/D54665 llvm-svn: 347135
-
David Blaikie authored
Especially for the symbolizer it can be inefficient to have to search through the entire index when it isn't needed: llvm-symbolizer looks up only a few CUs and already has an index available in getUnitForEntry, so once it's passed down to DWARFUnitHeader::extract there's no need for it to call getFromOffset. llvm-svn: 347134
-
Craig Topper authored
llvm-svn: 347131
-
Craig Topper authored
[X86] Add test cases to show incorrect use of a 512 bit vector in v32i8 multiply lowering with prefer-vector-width=256. On the min-legal-vector-width test this actually causes some of the v32i16 operations we emitted to be scalarized. llvm-svn: 347130
-
Nico Weber authored
See "GN build roundtable summary; adding GN build files to the repo" on llvm-dev and cfe-dev for discussion. In particular, this build is completely unsupported. People adding new files to LLVM are not expected to update the GN build files, and reviewers are not supposed to request the gn build files to be updated. This adds just enough to be able to build llvm/lib/Demangle. It requires using a monorepo. This adds a few build config options you can set in args.gn (`gn args out/foo --list` for all): - is_debug = true to enable debug builds (defaults to release) - llvm_enable_assertions to toggle assertions (defaults to true) - clang_base_path, if set an absolute path to a locally-built clang to be used as host compiler Differential Revision: https://reviews.llvm.org/D54345 llvm-svn: 347128
-