- Oct 15, 2018
-
Chandler Carruth authored
Make variables initialized by `getTerminator()` calls instead be declared as `Instruction`. This is the biggest remaining chunk of the usage of `getTerminator()` that insists on the narrow type, and so is an easy batch of updates. Several files saw more extensive updates where this would cascade to requiring API updates within the file to use `Instruction` instead of `TerminatorInst`. All of these were trivial in nature (pervasively using `Instruction` instead just worked). llvm-svn: 344502
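A minimal sketch of the kind of update involved (the helper and its name are illustrative, not from the commit):

  #include "llvm/IR/BasicBlock.h"
  #include "llvm/IR/Instructions.h"
  using namespace llvm;

  // Previously call sites wrote: TerminatorInst *TI = BB.getTerminator();
  // The batch update holds the terminator as a plain Instruction and
  // narrows via dyn_cast only where a specific subclass is needed.
  static bool endsInConditionalBranch(BasicBlock &BB) {
    Instruction *TI = BB.getTerminator();
    if (auto *BI = dyn_cast_or_null<BranchInst>(TI))
      return BI->isConditional();
    return false;
  }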
-
Chandler Carruth authored
This is the last interesting usage in all of LLVM's headers. The remaining usages in headers are the core typesystem bits (Core.h, instruction types, and InstVisitor) and as the return of `BasicBlock::getTerminator`. The latter is the big remaining API point that I'll remove after mass updates to user code. llvm-svn: 344501
-
Chandler Carruth authored
This requires updating a number of .cpp files to adapt to the new API. I've just systematically updated all uses of `TerminatorInst` within these files to `Instruction` so that I won't have to touch them again in the future. llvm-svn: 344498
-
Chandler Carruth authored
Remove `TerminatorInst` from a handful of LLVM APIs. There weren't very many. We still have the instruction visitor, and APIs with TerminatorInst as a return type or an output parameter. llvm-svn: 344494
-
Bjorn Pettersson authored
Summary:
The TwoAddressInstruction pass typically rewrites

  %1:short = foo %0.sub_lo:long

as

  %1:short = COPY %0.sub_lo:long
  %1:short = foo %1:short

when having tied operands. If there are extra un-tied operands that use the same reg and subreg, such as the second and third inputs to fie here:

  %1:short = fie %0.sub_lo:long, %0.sub_hi:long, %0.sub_lo:long

then there was a bug which replaced the register %0 also for the un-tied operands, but without changing the subregister indices. So we used to get:

  %1:short = COPY %0.sub_lo:long
  %1:short = fie %1, %1.sub_hi:short, %1.sub_lo:short

With this fix we instead get:

  %1:short = COPY %0.sub_lo:long
  %1:short = fie %1, %0.sub_hi:long, %1

Reviewers: arsenm, JesperAntonsson, kparzysz, MatzeB
Reviewed By: MatzeB
Subscribers: bjope, kparzysz, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D36224
llvm-svn: 344492
-
Lang Hames authored
Renames:
- JITDylib's setFallbackDefinitionGenerator method to setGenerator.
- The DynamicLibraryFallbackGenerator class to DynamicLibrarySearchGenerator.
- ReexportsFallbackDefinitionGenerator to ReexportsGenerator.
llvm-svn: 344489
-
Craig Topper authored
Summary:
I've noticed that the bitcasts we introduce for these make computeKnownBits and computeNumSignBits not work well in LegalizeVectorOps. LegalizeVectorOps legalizes bottom up while LegalizeDAG legalizes top down. The bottom-up strategy for LegalizeVectorOps means operands are legalized before their uses. So we promote and/or/xor before we legalize the operands that use them, making computeKnownBits/computeNumSignBits in places like LowerTruncate suboptimal.

I looked at changing LegalizeVectorOps to be top down as well, but that was more disruptive and caused some regressions. I also looked at just moving promotion of binops to LegalizeDAG, but that had a few issues, one around matching AND/ANDN/OR into VSELECT: I had to create the ANDN as vXi64 while the other nodes hadn't been legalized yet, and I didn't look too hard at fixing that.

This patch seems to produce better results overall than my other attempts. We now form broadcasts of constants better in some cases. For at least some of them, the AND was being introduced in LegalizeDAG, promoted to vXi64, and the BUILD_VECTOR was also legalized there; I think we got bad ordering of that. Now the promotion is out of the legalizer, so we handle this better.

In the longer term I think we really should evaluate whether we should be doing this promotion at all. It's really there to reduce isel pattern count, but I'm wondering if we'd be better served just eating the pattern cost or doing C++-based isel for vector and/or/xor in X86ISelDAGToDAG. The masked and/or/xor will definitely be difficult in patterns if a bitcast gets between the vselect and the and/or/xor node; that becomes a lot of permutations to cover.

Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D53107
llvm-svn: 344487
-
Craig Topper authored
We use this instruction to broadcast a single 64-bit value to a v2i64/v2f64 vector. llvm-svn: 344486
-
- Oct 14, 2018
-
Ayal Zaks authored
Landing this as a separate part of https://reviews.llvm.org/D50480, as it is a seemingly unrelated change ([LV] Vectorizing loops of arbitrary trip count without remainder under opt for size). llvm-svn: 344483
-
Simon Pilgrim authored
Extends D53148 from v4f64 now that we have test coverage for v16i16/v32i8 shuffles. llvm-svn: 344481
-
Simon Pilgrim authored
The final stage of CTPOP expansion (v = (v * 0x01010101...) >> (Len - 8)) is completely pointless for the byte (Len = 8) case as it reduces to (v = (v * 0x01...) >> 0), but annoyingly this doesn't always get optimized away. Found while investigating generic vector CTPOP expansion (PR32655). llvm-svn: 344477
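For reference, the expansion in question sketched in plain C++ (an illustration, not the DAG code itself):

  #include <cstdint>

  // 32-bit CTPOP expansion: the final stage (v * 0x01010101) >> 24 sums
  // the per-byte counts into the top byte (Len - 8 == 24 here).
  static uint32_t ctpop32(uint32_t v) {
    v = v - ((v >> 1) & 0x55555555u);
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;
    return (v * 0x01010101u) >> 24;
  }

  // For Len == 8 that final stage degenerates to (v * 0x01) >> 0, so the
  // byte case is already complete without it.
  static uint8_t ctpop8(uint8_t v) {
    v = v - ((v >> 1) & 0x55u);
    v = (v & 0x33u) + ((v >> 2) & 0x33u);
    v = (v + (v >> 4)) & 0x0Fu;
    return v;
  }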
-
Sanjay Patel authored
This is part of the missing IR-level folding noted in D52912. This should be ok as a canonicalization because the new shuffle mask can't be any more complicated than the existing shuffle mask. If there's some target where the shorter vector shuffle is not legal, it should just end up expanding to something like the pair of shuffles that we're starting with here. Differential Revision: https://reviews.llvm.org/D53037 llvm-svn: 344476
-
Dorit Nuzman authored
llvm-svn: 344475
-
Dorit Nuzman authored
llvm-svn: 344473
-
Dorit Nuzman authored
interleave-group

The vectorizer currently does not attempt to create interleave-groups that contain predicated loads/stores; predicated strided accesses can currently be vectorized only using masked gather/scatter or scalarization. This patch makes predicated loads/stores candidates for forming interleave-groups during the Loop Vectorizer's analysis, and adds the proper support for masked interleave-groups to the Loop Vectorizer's planning and transformation stages. The patch also extends the TTI API to allow querying the cost of masked interleave-groups (which each target can control); targets that support masked vector loads/stores may choose to enable this feature and allow vectorizing predicated strided loads/stores using masked wide loads/stores and shuffles.

Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D53011
llvm-svn: 344472
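To make this concrete, here is a hypothetical loop (names invented for illustration) with predicated stride-2 accesses of the kind that can now form a masked interleave-group instead of being scalarized or gathered/scattered:

  // Each iteration conditionally touches both members of an interleaved
  // {re, im} pair; the if-guard becomes the mask on the wide load/store.
  void scale_pairs(float *pairs, const int *keep, float k, int n) {
    for (int i = 0; i < n; ++i) {
      if (keep[i]) {
        pairs[2 * i] *= k;       // stride-2, member 0 of the group
        pairs[2 * i + 1] *= k;   // stride-2, member 1 of the group
      }
    }
  }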
-
Craig Topper authored
llvm-svn: 344471
-
Craig Topper authored
[X86] Type legalize v2f32 stores by widening to v4f32, casting to v2f64, extracting f64 and storing.

Summary: This is similar to what D52528 did for loads. It should match what generic type legalization does in 64-bit mode where it uses a v2i64 cast and an i64 store.

Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D53173
llvm-svn: 344470
-
Benjamin Kramer authored
llvm-svn: 344468
-
- Oct 13, 2018
-
Lang Hames authored
This adds two arguments to the main ExecutionSession::lookup method: MatchNonExportedInJD and MatchNonExported. These control whether and where hidden symbols should be matched when searching a list of JITDylibs. A similar effect could have been achieved by filtering search results, but this would have involved materializing symbol definitions (since materialization is triggered on lookup) only to throw the results away, among other issues. llvm-svn: 344467
-
Simon Pilgrim authored
The CTPOP case has been changed from VT.getSizeInBits to VT.getScalarSizeInBits - but this fits in with future work for vector support (PR32655) and doesn't affect any current (scalar) uses. llvm-svn: 344461
-
Craig Topper authored
[LegalizeTypes] Prevent an assertion from PromoteIntRes_BSWAP and PromoteIntRes_BITREVERSE if the shift amount is too large for the VT returned by getShiftAmountTy

Summary: getShiftAmountTy for X86 returns MVT::i8. If a BSWAP or BITREVERSE is created that requires promotion, and the difference between the original VT and the promoted VT is more than 255, then we won't be able to create the constant. This patch adds a check to replace the result from getShiftAmountTy with MVT::i32 if the difference won't fit. This should get legalized later when the shift is ultimately expanded, since it's clearly an illegal type that we're only promoting to make it a power-of-2 bit width. Alternatively, we could base the decision completely on the largest shift amount the promoted VT could use. Vectors should be immune here because getShiftAmountTy always returns the incoming VT for vectors; only the scalar shift amount can be changed by the targets.

Reviewers: eli.friedman, RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D53232
llvm-svn: 344460
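A scalar analogy of the promotion pattern the shift serves (a sketch using compiler builtins, assuming nothing beyond the commit text): an i16 byte-swap done in i32 must shift the result down by the width difference, 32 - 16 = 16; the assertion could only fire when that difference exceeds 255, i.e. for very wide integer types.

  #include <cstdint>

  // bswap on a promoted type: swap in the wide type, then shift right by
  // (promoted width - original width). Here the amount is 16, which fits
  // easily in X86's MVT::i8 shift-amount type.
  static uint16_t bswap16_via_i32(uint16_t x) {
    uint32_t wide = __builtin_bswap32(x); // bytes of x land in the high half
    return static_cast<uint16_t>(wide >> (32 - 16));
  }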
-
Thomas Lively authored
llvm-svn: 344459
-
Sanjay Patel authored
This is a preliminary step to avoid regressions when we add an actual 'fneg' instruction to IR. See D52934 and D53205. llvm-svn: 344458
-
Simon Pilgrim authored
There is one remnant - AVX1 custom splitting of 256-bit vectors - which is due to a regression where the X86ISD::ANDNP is still performed as a YMM. I've also tightened the CTLZ or CTPOP lowering in SelectionDAGLegalize::ExpandBitCount to require a legal CTLZ - it doesn't affect existing users and fixes an issue with AVX512 codegen. llvm-svn: 344457
-
David Bolvansky authored
Summary: Fixes PR39177

Reviewers: spatel, jbuening
Reviewed By: jbuening
Subscribers: jbuening, llvm-commits
Differential Revision: https://reviews.llvm.org/D53129
llvm-svn: 344454
-
Simon Pilgrim authored
Adds CTTZ vector legalization support and begins the removal of the X86/SSE custom lowering. llvm-svn: 344453
-
Simon Pilgrim authored
Use isConstantSplat instead of ISD::isConstantSplatVector to let us peek through to illegal types (in this case for i686 targets to recognise i64 constants). llvm-svn: 344452
-
Simon Pilgrim authored
The code in LowerScalarImmediateShift is just a more powerful version of ISD::isConstantSplatVector. llvm-svn: 344451
-
Simon Pilgrim authored
llvm-svn: 344450
-
Simon Pilgrim authored
llvm-svn: 344449
-
Simon Pilgrim authored
If we have better CTLZ support than CTPOP, then use cttz(x) = width - ctlz(~x & (x - 1)) and remove the CTTZ_ZERO_UNDEF handling, as it no longer gives better codegen. Similar to rL344447, this is also closer to LegalizeDAG's approach. llvm-svn: 344448
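The identity is easy to sanity-check in plain C++ (a sketch using compiler builtins):

  #include <cstdint>

  // ~x & (x - 1) sets exactly the bits below the lowest set bit of x, so
  // width minus the leading-zero count of that mask is the trailing-zero
  // count of x.
  static unsigned cttz_via_ctlz(uint32_t x) {
    uint32_t mask = ~x & (x - 1);
    unsigned lz = mask ? __builtin_clz(mask) : 32; // clz(0) is undefined
    return 32 - lz;
  }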
-
Simon Pilgrim authored
This patch changes the vector CTTZ lowering from:

  cttz(x) = ctpop((x & -x) - 1)

to:

  cttz(x) = ctpop(~x & (x - 1))

Not only does this make better use of the PANDN instruction, but it also matches the LegalizeDAG method, which should allow us to remove the x86-specific code at some point in the future (we need to fix some issues with the bitcasted logic ops and CTPOP lowering first).
Differential Revision: https://reviews.llvm.org/D53214
llvm-svn: 344447
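Both forms can be checked against each other in plain C++ (a sketch with builtins standing in for the ISD nodes):

  #include <cstdint>

  // Old: x & -x isolates the lowest set bit; subtracting 1 turns it into
  // a mask of the trailing zeros, whose population count is the answer.
  static unsigned cttz_old(uint32_t x) {
    return __builtin_popcount((x & -x) - 1);
  }

  // New: ~x & (x - 1) builds the same trailing-zeros mask directly, and
  // the ~x & ... form maps onto a single PANDN on x86.
  static unsigned cttz_new(uint32_t x) {
    return __builtin_popcount(~x & (x - 1));
  }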
-
Simon Pilgrim authored
Add shuffle lowering for the case where we can shuffle the lanes into place followed by an in-lane permute. This is mainly for cases where we can have non-repeating permutes in each lane, but for now I've just enabled it for v4f64 unary shuffles to fix PR39161 - there is no test coverage for other shuffles that might benefit yet.

We now have several cross-lane shuffle lowering methods that all do something similar - I've looked at merging some of these (notably by making the repeated mask mechanism in lowerVectorShuffleByMerging128BitLanes optional), but there are a lot of assertions/assumptions in the way that make this tricky - I ended up going for adding yet another relatively simple method instead.

Differential Revision: https://reviews.llvm.org/D53148
llvm-svn: 344446
-
Arnaud A. de Grandmaison authored
Summary: AArch64 can fold some shift+extend operations on the RHS operand of comparisons, so swap the operands if that makes sense. This provides a fix for https://bugs.llvm.org/show_bug.cgi?id=38751

Reviewers: efriedma, t.p.northover, javed.absar
Subscribers: mcrosier, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D53067
llvm-svn: 344439
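An invented source-level illustration (the exact folding depends on the selected instructions): AArch64's compare can fold an extend/shift into its second operand, e.g. something like cmp x1, w0, sxtw #2, so when the foldable value sits on the left of the comparison, swapping the operands (and inverting the predicate) exposes the fold.

  // Illustration only: the left operand is a sign-extend + shift, which
  // AArch64's cmp can fold when it appears as the second operand, so the
  // combine swaps the comparison and inverts the predicate.
  bool cmp_swap_example(int a, long b) {
    return (static_cast<long>(a) << 2) < b;
  }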
-
Thomas Lively authored
Summary: Depends on D52324 and D52764.

Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D52325
llvm-svn: 344438
-
Thomas Lively authored
Summary: These new intrinsics have the semantics of the `minimum` and `maximum` operations specified by the latest draft of IEEE 754-2018. Unlike llvm.minnum and llvm.maxnum, these new intrinsics propagate NaNs and always treat -0.0 as less than 0.0.

`minimum` and `maximum` lower directly to the existing `fminnan` and `fmaxnan` ISel DAG nodes. It is safe to reuse these DAG nodes because before this patch they were only emitted in situations where there were known to be no NaN arguments or where NaN propagation was correct and there were known to be no zero arguments. I know of only four backends that lower fminnan and fmaxnan: WebAssembly, ARM, AArch64, and SystemZ, and each of these lowers fminnan and fmaxnan to instructions that are compatible with the IEEE 754-2018 semantics.

Reviewers: aheejin, dschuff, sunfish, javed.absar
Subscribers: kristof.beyls, dexonsmith, kristina, llvm-commits
Differential Revision: https://reviews.llvm.org/D52764
llvm-svn: 344437
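A scalar reference model of the behaviour described above (a sketch of the IEEE 754-2018 semantics, not LLVM's implementation):

  #include <cmath>
  #include <limits>

  // Unlike minnum, minimum propagates NaN and orders -0.0 before +0.0.
  static double ieee_minimum(double a, double b) {
    if (std::isnan(a) || std::isnan(b))
      return std::numeric_limits<double>::quiet_NaN(); // propagate NaN
    if (a == 0.0 && b == 0.0)
      return std::signbit(a) ? a : b;                  // pick -0.0 over +0.0
    return a < b ? a : b;
  }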
-
Thomas Lively authored
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D53172
llvm-svn: 344436
-
Kostya Serebryany authored
Summary: GetOrCreateFunctionComdat is currently used in SanitizerCoverage, where it's defined. I'm planning to use it in HWASAN as well, so this moves it into a common location. NFC

Reviewers: morehouse
Reviewed By: morehouse
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D53218
llvm-svn: 344433
-
Alex Bradbury authored
SelectionDAGBuilder::visitShift will always zero-extend a shift amount when it is promoted to the ShiftAmountTy. This results in zero-extension (masking) which is unnecessary for RISC-V, as the shift operations only read the lower 5 or 6 bits (RV32 or RV64). I initially proposed adding a getExtendForShiftAmount hook so the shift amount could be any-extended (D52975). @efriedma explained this was unsafe, so I have instead eliminated the unnecessary AND operations at instruction selection time in a manner similar to X86InstrCompiler.td.

Differential Revision: https://reviews.llvm.org/D53224
llvm-svn: 344432
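A small illustration of why the mask is dead on RISC-V (assuming only what the commit states about the shift semantics):

  #include <cstdint>

  // On RV32, sll/srl/sra read only the low 5 bits of the amount register,
  // so the masking produced when the shift amount is zero-extended
  // computes what the hardware does anyway; the patch strips the
  // redundant AND during instruction selection.
  uint32_t shl_rv32(uint32_t x, uint32_t amt) {
    return x << (amt & 31); // the "& 31" folds away into a bare sll
  }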
-
Craig Topper authored
There's no guarantee that vector indices use pointer types, so use the correct query method. llvm-svn: 344428
-