- May 07, 2020
-
-
Sjoerd Meijer authored
This reverts commit 617aa64c while I investigate buildbot failures.
-
Sjoerd Meijer authored
If tail-folding of the scalar remainder loop is applied, the primary induction variable is splatted to a vector and used by the masked load/store vector instructions, so the IV does not remain scalar. Because we now mark that the IV does not remain scalar in these cases, we don't emit the vector IV if it is not used; thus, the vectoriser produces less dead code. Thanks to Ayal Zaks for the direction on how to fix this.

Differential Revision: https://reviews.llvm.org/D78911
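The shape of a tail-folded loop can be sketched in plain C++: a hypothetical 4-wide vectorization where, instead of a vector body plus a scalar remainder loop, every lane's load/store is guarded by a mask built by comparing the splatted induction variable against the trip count. The function name and the scalar modelling of the masked lanes are illustrative, not from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative sketch of a 4-wide tail-folded loop: the splat of the
// primary induction variable (i + lane) is compared against the trip
// count to form the mask that guards each lane's load/store.
void add_one_tail_folded(std::vector<int> &a) {
    const std::size_t n = a.size();
    constexpr std::size_t VF = 4;         // hypothetical vectorization factor
    for (std::size_t i = 0; i < n; i += VF) {
        for (std::size_t lane = 0; lane < VF; ++lane) {
            bool mask = (i + lane) < n;   // splat-IV < trip-count
            if (mask)                     // masked load/store
                a[i + lane] += 1;
        }
    }
}
```

Because the mask disables the out-of-bounds lanes of the last iteration, no separate scalar remainder loop is needed.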
-
Sam Parker authored
-
David Sherwood authored
When calculating the natural alignment for scalable vectors it is acceptable to calculate an allocation size based on the minimum number of elements in the vector. This code path is exercised by an existing test: CodeGen/AArch64/sve-intrinsics-int-arith.ll

Differential Revision: https://reviews.llvm.org/D79475
-
Craig Topper authored
We're truncating so the extra bits will be discarded.
-
Craig Topper authored
[X86] Add test cases for missed opportunity to match pmulh from multiplies with elements larger than i32.

We currently look for vXi32 sext/zext to match PMULH, but it doesn't matter how many extra bits above i32 there are.
-
Jonas Devlieghere authored
Fix DWARFLinker.cpp:2538:5: error: call to 'sort' is ambiguous.
-
Craig Topper authored
I missed this case when I did the same for gather results and scatter operands in c69a4d6b.
-
Jonas Devlieghere authored
This patch adds statistics about the contribution of each object file to the linked debug info. When --statistics is passed to dsymutil, it prints a table after linking, as illustrated below. It lists the object file name, the size of the debug info in the object file in bytes, the absolute size contribution to the linked dSYM, and the percentage difference. The table is sorted by the output size, so the object files contributing the most to the link are listed first.

.debug_info section size (in bytes)
-------------------------------------------------------------------------------
Filename                                 Object         dSYM        Change
-------------------------------------------------------------------------------
basic2.macho.x86_64.o                      210b         165b       -24.00%
basic3.macho.x86_64.o                      177b         150b       -16.51%
basic1.macho.x86_64.o                      125b         129b         3.15%
-------------------------------------------------------------------------------
Total                                      512b         444b       -14.23%
-------------------------------------------------------------------------------

Differential revision: https://reviews.llvm.org/D79513
-
Eli Friedman authored
Now using patterns, since there's a single-instruction lowering. (We could convert to VSELECT and pattern-match that, but there doesn't seem to be much point.)

I think this might be the first instruction to use nested multiclasses this way? It seems like a good way to reduce duplication between different integer widths. Let me know if it seems like an improvement.

Also, while I'm here, fix the return type of SETCC so we don't try to merge a sign-extend with a SETCC.

Differential Revision: https://reviews.llvm.org/D79193
-
Reid Kleckner authored
I couldn't find this info in any other dumper, so it might as well be here.
-
- May 06, 2020
-
-
Craig Topper authored
Neither gcc nor icc supports this. Split out from D79472. I want to remove more, but it looks like icc does support some things gcc doesn't, and I need to double-check our internal test suites.
-
Whitney Tsang authored
loop nest.

Summary: As discussed in https://reviews.llvm.org/D73129.

Example:

Before unroll and jam:
for
  A
  for
    B
    for
      C
    D
  E

After unroll and jam (currently):
for
  A
  A'
  for
    B
    for
      C
    D
    B'
    for
      C'
    D'
  E
  E'

After unroll and jam (Ideal):
for
  A
  A'
  for
    B
    B'
    for
      C
      C'
    D
    D'
  E
  E'

This is the first patch to change unroll and jam to work in the ideal way. It changes the safety checks needed to make sure it is safe to unroll and jam in the ideal way.

Reviewer: dmgreen, jdoerfert, Meinersbur, kbarton, bmahjour, etiotto
Reviewed By: Meinersbur
Subscribers: fhahn, hiraditya, zzheng, llvm-commits, anhtuyen, prithayan
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D76132
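The "ideal" unroll-and-jam shape described in this commit can be made concrete with a small C++ nest. The two functions below (hypothetical names, purely illustrative) compute the same sum: the second unrolls the outermost loop by two and jams the duplicated statements through every inner level, with a scalar remainder loop for odd trip counts.

```cpp
#include <cassert>

// The 3-deep nest as written: statements A..E in their original positions.
long sum_before(int n, int m, int k) {
    long s = 0;
    for (int i = 0; i < n; ++i) {        // for
        s += i;                          //   A
        for (int j = 0; j < m; ++j) {    //   for
            s += j;                      //     B
            for (int l = 0; l < k; ++l)  //     for
                s += l;                  //       C
            s += i * j;                  //     D
        }
        s += i;                          //   E
    }
    return s;
}

// Ideal by-2 unroll-and-jam: A/A', B/B', C/C', D/D', E/E' are fused
// at every loop level, rather than duplicating whole inner loops.
long sum_after_ideal(int n, int m, int k) {
    long s = 0;
    int i = 0;
    for (; i + 1 < n; i += 2) {
        s += i;                          //   A
        s += i + 1;                      //   A'
        for (int j = 0; j < m; ++j) {
            s += j;                      //     B
            s += j;                      //     B'
            for (int l = 0; l < k; ++l) {
                s += l;                  //       C
                s += l;                  //       C'
            }
            s += i * j;                  //     D
            s += (i + 1) * j;            //     D'
        }
        s += i;                          //   E
        s += i + 1;                      //   E'
    }
    for (; i < n; ++i) {                 // scalar remainder iteration
        s += i;
        for (int j = 0; j < m; ++j) {
            s += j;
            for (int l = 0; l < k; ++l)
                s += l;
            s += i * j;
        }
        s += i;
    }
    return s;
}
```

The safety checks this patch changes are what establish when such a jam of D/D' and E/E' past the intervening loop bodies is legal.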
-
Craig Topper authored
Y is the start of several 2-letter constraints, but we also had partial support to recognize it by itself. However, it doesn't look like it can get through clang as a single letter, so the backend support for this was effectively dead.
-
Alexandre Ganea authored
This reverts commit 06591b6d.
-
Ulrich Weigand authored
When using vec_load/store_len_r with an immediate length operand of 16 or larger, LLVM will currently emit a VLRL/VSTRL instruction with that immediate. This creates a valid encoding (which should be supported by the assembler), but always traps at runtime. This patch fixes this by not creating VLRL/VSTRL in those cases, which would instead result in loading the length into a register and calling VLRLR/VSTRLR.

However, these operations with a length of 15 or larger are in fact simply equivalent to a full vector load or store, and the same holds true for vec_load/store_len as well. Therefore, add a DAGCombine rule to replace those operations with plain vector loads or stores if the length is known at compile time and equal to or larger than 15.
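A hedged scalar emulation of the clamping behaviour the DAGCombine relies on: a length-controlled load of a 16-byte vector copies bytes up to the clamped length, so any length of 15 or more behaves like a plain full vector load. The `load_len` helper and the zero-fill of unloaded bytes are assumptions for illustration, not the architectural definition.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model of a load-with-length on a 16-byte vector:
// the highest loaded byte index is min(len, 15), i.e. min(len + 1, 16)
// bytes are copied. Unloaded bytes are modelled as zero here.
std::array<uint8_t, 16> load_len(const uint8_t *src, unsigned len) {
    std::array<uint8_t, 16> v{};
    std::memcpy(v.data(), src, std::min<unsigned>(len + 1, 16));
    return v;
}
```

Under this model, every `len >= 15` produces an identical result, which is why a compile-time-known length in that range can be rewritten as an ordinary vector load.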
-
LemonBoy authored
Calling getShiftAmountTy with LegalTypes set may return a type that's too narrow to hold the shift amount for the integer type it's applied to. Fixes the regression introduced by D79096.

Differential Revision: https://reviews.llvm.org/D79405
-
Sanjay Patel authored
Depends on D79360 / rG2f1fe1864d25 for the transform.
-
Michael Liao authored
Summary: Need to include checking on the new 16-bit subregs.

Reviewers: rampitec
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79498
-
Simon Pilgrim authored
-
zoecarver authored
Summary: If the only use of a value is a start or end lifetime intrinsic, then mark the intrinsic as trivially dead. This should allow that value to then be removed as well. Currently, this only works for allocas, globals, and arguments.

Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79355
-
Simon Pilgrim authored
This helped fix some i686 vXi64 broadcast folds that were becoming v2Xi32 broadcasts because we didn't match the broadcast until after SimplifyDemandedBits worked out that we only used the bottom 32 bits in PMUL(U)DQ and type legalization had split the original i64 load.

A couple of regressions occurred which required some fixups: adding concat_vectors(broadcast_load, broadcast_load) splat support and recognising (unnecessary) unary shuffles of already broadcasted vectors.

This came about as part of the work investigating vector load combining from shuffles for PR42550.
-
Simon Pilgrim authored
We never need to call this from anything but ISD::VECTOR_SHUFFLE or target shuffles, so we shouldn't need to address SDNode directly.
-
Sanjay Patel authored
This is unusual for the general case because we are replacing 1 instruction with 2. Splitting this off from a potentially conflicting transform in D79171.
-
Christopher Tetreault authored
Summary: Any function in this module that makes use of DemandedElts largely does not work with scalable vectors. DemandedElts is used to define which elements of the vector to look at. At best, for scalable vectors, we can express the first N elements of the vector. However, in practice, most code that uses these functions expects to be able to talk about the entire vector. In principle, this module should be able to be extended to work with scalable vectors; however, before we can do that, we should ensure that it does not cause code with scalable vectors to miscompile.

All functions that use a DemandedElts will bail out if the vector is scalable. Usages of getNumElements() are updated to go through FixedVectorType pointers.

Reviewers: rengolin, efriedma, sdesmalen, c-rhodes, spatel
Reviewed By: efriedma
Subscribers: david-arm, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79053
-
Matt Arsenault authored
-
Michael Liao authored
This reverts commit e38018b8.
-
Stanislav Mekhanoshin authored
We do not want to break asm syntax. These suffixes are quite useful for debugging, so add an option to print them. Right now it is NFC. Differential Revision: https://reviews.llvm.org/D79435
-
Jay Foad authored
When called from the post-RA scheduler, hazards have already been handled by getHazardType returning NoopHazard, so PreEmitNoops always returns zero. Remove it. NFC.

Historical note: PreEmitNoops was added to the hazard recognizer interface as an optional feature to support dispatch group formation on the POWER target: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20131202/197470.html So it seems right that we shouldn't need to implement it.

We do still implement the other overload PreEmitNoops(MachineInstr *) because that is used by the PostRAHazardRecognizer pass.

Differential Revision: https://reviews.llvm.org/D79476
-
Luís Marques authored
This patch adds more constant materialization tests, focusing on cases where we could improve our materialization instruction sequences (particularly for RV64). Several of these cases will be improved upon in follow-up patches.

Differential Revision: https://reviews.llvm.org/D79453
-
David Green authored
Much like the similar combine added recently for VMOVrh load, this adds a fold for VMOVhr load turning it into a vldr.f16 as opposed to a vldrh and vmov.f16. Differential Revision: https://reviews.llvm.org/D78714
-
Michael Liao authored
- Need to skip the assignment of `ID`, which is used to index the two object arrays.
-
Ram Nalamothu authored
Since SRSRC has alignment requirements, first find non-GIT-pointer-clobbered registers for SRSRC, and then, if those registers clobber the preloaded Scratch Wave Offset register, copy the Scratch Wave Offset register to a free SGPR.
-
Sanjay Patel authored
Try to combine N short vector cast ops into 1 wide vector cast op:

concat (cast X), (cast Y)... -> cast (concat X, Y...)

This is part of solving PR45794: https://bugs.llvm.org/show_bug.cgi?id=45794

As noted in the code comment, this is uglier than I was hoping because the opcode determines whether we pass the source or destination type to isOperationLegalOrCustom(). Also, IIUC, there's no way to validate what the other (dest or src) type is. Without the extra legality check on that, there's an ARM regression test in test/CodeGen/ARM/isel-v8i32-crash.ll that will crash trying to lower an unsupported v8f32 to v8i16.

Differential Revision: https://reviews.llvm.org/D79360
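The value-level identity behind the combine can be checked with a small scalar model: for any lane-wise cast, casting then concatenating equals concatenating then casting. Per-lane sign extension from i8 to i16 stands in for the vector cast; all names below are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Lane-wise cast: sign-extend each i8 lane to i16.
std::vector<int16_t> cast_i8_to_i16(const std::vector<int8_t> &v) {
    return std::vector<int16_t>(v.begin(), v.end());
}

// cast (concat X, Y): one wide cast after concatenation.
std::vector<int16_t> concat_then_cast(std::vector<int8_t> x,
                                      std::vector<int8_t> y) {
    x.insert(x.end(), y.begin(), y.end());
    return cast_i8_to_i16(x);
}

// concat (cast X), (cast Y): two narrow casts, then concatenation.
std::vector<int16_t> cast_then_concat(const std::vector<int8_t> &x,
                                      const std::vector<int8_t> &y) {
    std::vector<int16_t> cx = cast_i8_to_i16(x);
    std::vector<int16_t> cy = cast_i8_to_i16(y);
    cx.insert(cx.end(), cy.begin(), cy.end());
    return cx;
}
```

The hard part in the actual patch is not this identity but proving the wide cast is legal for the target, which is where the isOperationLegalOrCustom() awkwardness comes in.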
-
Simon Pilgrim authored
This should always be caught by the various VZEXT_MOVL handling in combineTargetShuffle and SimplifyDemandedVectorEltsForTargetNode.
-
Matt Arsenault authored
This produces more normal looking IR by keeping all the allocas clustered at the start of the block.
-
David Green authored
A VMOVhr of a VMOVrh can be simply folded to the original HPR value. Differential Revision: https://reviews.llvm.org/D78710
-
Sanjay Patel authored
-
David Green authored
If we get into the situation where we are extracting from a VDUP, the extracted value is just the original scalar, so long as the types match or we can bitcast between the two.

Differential Revision: https://reviews.llvm.org/D78708
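The identity being exploited, sketched with a fixed-size array standing in for the vector (illustrative only): every lane of a splat is the original scalar, so an extract from it folds to that scalar.

```cpp
#include <array>
#include <cstddef>

// Model of VDUP: broadcast one scalar into every lane.
template <typename T, std::size_t N>
std::array<T, N> splat(T x) {
    std::array<T, N> v;
    v.fill(x);
    return v;
}
// extractelement(splat(x), i) == x for any lane i, which is the fold
// this commit performs (modulo the type-match/bitcast condition).
```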
-