Commits · 7bff9bdd34d53a660f80eb1cdc9da885fd2702e1 · Lorenzo Albano / LLVM bpEVL

May 12, 2021

[X86][AVX] combineConcatVectorOps - add ConcatSubOperand helper. NFCI. · 7bff9bdd
Simon Pilgrim authored May 12, 2021
```
Pull out repeated code to create a concat_vectors of the same operand from all subvecs.
```
7bff9bdd
[X86][AVX] Add v4i64 shift-by-32 tests · 778562ad
Simon Pilgrim authored May 12, 2021
```
AVX1 could perform this as a v8f32 shuffle instead of splitting - based off PR46621
```
778562ad
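
To illustrate why a 32-bit-lane shuffle can stand in for the 64-bit shift mentioned above, here is a minimal sketch in plain Python (not LLVM code). It assumes little-endian lane order; the helper names are hypothetical.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF

def shl32_via_shift(v4i64):
    # Reference result: shift every 64-bit element left by 32 bits.
    return [(x << 32) & MASK64 for x in v4i64]

def shl32_via_shuffle(v4i64):
    # Reinterpret the four 64-bit lanes as eight 32-bit lanes (little-endian).
    lanes = struct.unpack("<8I", struct.pack("<4Q", *v4i64))
    shuffled = []
    for i in range(0, 8, 2):
        # Low half becomes zero; high half takes the old low half.
        shuffled += [0, lanes[i]]
    return list(struct.unpack("<4Q", struct.pack("<8I", *shuffled)))
```

Both helpers return the same list for any four unsigned 64-bit inputs, which is the equivalence the tests rely on.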

[TargetLowering] Improve legalization of scalable vector types · c5ec00e6

Fraser Cormack authored May 07, 2021

This patch extends the vector type-conversion and legalization capabilities of
scalable vector types.

Firstly, `vscale x 1` types now behave more like the corresponding `vscale x
2+` types. This enables the integer promotion legalization of extended scalable
types, such as the promotion of `<vscale x 1 x i5>` to `<vscale x 1 x i8>`.

These `vscale x 1` types are also now better handled by
`getVectorTypeBreakdown`, where what looks like older handling for 1-element
fixed-length vector types was spuriously updated to include scalable types.

Widening of scalable types is now better supported, by using `INSERT_SUBVECTOR`
to insert the smaller scalable vector "value" type into the wider scalable
vector "part" type. This allows AArch64 to pass and return `vscale x 1` types
by value by widening.

There are still cases where we are unable to legalize `vscale x 1` types, such
as where expansion would require splitting the vector in two.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D102073

c5ec00e6

[mlir][openacc] Conversion of data operand to LLVM IR dialect · 6110b667

Valentin Clement authored May 12, 2021

Add a conversion pass to convert higher-level types before translation.
This conversion extracts meaningful information and packs it into a struct that
the translation (D101504) will be able to understand.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D102170

6110b667

[OpenCL] Remove pragma requirement from Arm dot extension. · 58d18dde

Anastasia Stulova authored May 12, 2021

This removes the needless requirement for an extension pragma, since
the pragma doesn't properly disable anything and doesn't need to
enable anything that cannot be disabled.

The change doesn't break existing kernels: it allows more cases to
compile (i.e. those without pragma statements), and the
pragma continues to be accepted.

Differential Revision: https://reviews.llvm.org/D100985

58d18dde

[llvm-cov][test] Add test coverage for "gcov" implying "llvm-cov gcov" compatibility. · 1336c5ae

Jordan Rupprecht authored May 11, 2021

Much like other LLVM binary utilities, `llvm-cov` has a symlink compatibility feature where it runs in `gcov` compatibility mode if the binary name ends in `gcov`. This is identical to invoking `llvm-cov gcov ...`.
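
A minimal sketch of name-based dispatch as described above, in plain Python. It is not the llvm-cov implementation; `select_mode` and the fallback `"help"` mode are hypothetical names used only for illustration.

```python
import os

def select_mode(argv):
    """Return (mode, remaining_args) for a hypothetical multi-call tool."""
    # Assumption: a binary whose name ends in "gcov" behaves as if the
    # "gcov" subcommand had been given explicitly.
    name = os.path.basename(argv[0])
    stem, _ = os.path.splitext(name)  # tolerate a suffix such as ".exe"
    if stem.endswith("gcov"):
        return "gcov", argv[1:]
    if len(argv) > 1:
        # Otherwise the first argument names the subcommand.
        return argv[1], argv[2:]
    return "help", []
```

Under this model, `select_mode(["/usr/bin/gcov", "a.gcda"])` and `select_mode(["llvm-cov", "gcov", "a.gcda"])` select the same mode with the same remaining arguments.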

Differential Revision: https://reviews.llvm.org/D102299

1336c5ae

[CUDA][HIP] Fix device template variables · 98575708

Yaxun (Sam) Liu authored May 11, 2021

Currently clang does not emit device template variables
instantiated only in host functions; however, nvcc is
able to do that:

https://godbolt.org/z/fneEfferY

This patch fixes this issue by refactoring and extending
the existing mechanism for emitting static device
variables ODR-used by host code only. Basically, clang records
device variables ODR-used by host code and forces
them to be emitted in device compilation. The existing
mechanism makes sure these device variables ODR-used
by host code are added to llvm.compiler-used, so
they are guaranteed not to be deleted.

It also fixes non-ODR-use of a static device variable by host code
causing the static device variable to be emitted and registered,
which should not happen.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D102237

98575708

[ValueTypes] Rename MVT::getVectorNumElements() to... · 44e0e91d

Craig Topper authored May 12, 2021

[ValueTypes] Rename MVT::getVectorNumElements() to MVT::getVectorMinNumElements(). Fix some misuses of getVectorNumElements()

getVectorNumElements() returns a value for scalable vectors
without any warning so it is effectively getVectorMinNumElements().
By renaming it and making getVectorNumElements() forward to
it, we can insert a check for scalable vectors into getVectorNumElements()
similar to EVT. I didn't do that in this patch because there are still more
fixes needed, but I was able to temporarily do it and passed the RISCV
lit tests with these changes.
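
The intended end state can be sketched with a toy type in plain Python. This is not LLVM's `MVT`; `VecType` and its method names are hypothetical, and the scalable check shown here is the one the message says was not yet enabled in this patch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VecType:
    """Toy stand-in for a vector value type; not LLVM's MVT."""
    min_num_elements: int
    scalable: bool = False

    def get_vector_min_num_elements(self) -> int:
        # Always well defined: for <vscale x N x T> this is N.
        return self.min_num_elements

    def get_vector_num_elements(self) -> int:
        # Forwards to the min-count query but refuses scalable vectors,
        # whose real element count is an unknown multiple of the minimum.
        if self.scalable:
            raise TypeError("scalable vector has no fixed element count")
        return self.get_vector_min_num_elements()
```

Callers that only need a lower bound use the min-count query; callers that need an exact count fail loudly on scalable types instead of silently getting the minimum.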

The changes to isPow2VectorType and getPow2VectorType are copied from EVT.

The change to TypeInfer::EnforceSameNumElts reduces the size of AArch64's isel table.
We're now considering SameNumElts to require the scalable property to match which
removes some unneeded type checks.

This was motivated by the bug I fixed yesterday in 80b95108

Reviewed By: frasercrmck, sdesmalen

Differential Revision: https://reviews.llvm.org/D102262

44e0e91d

Revert "[SelectionDAG][Mips][PowerPC][RISCV][WebAssembly] Teach... · 8d37411e

Stefan Pintilie authored May 12, 2021

Revert "[SelectionDAG][Mips][PowerPC][RISCV][WebAssembly] Teach computeKnownBits/ComputeNumSignBits about atomics"

This reverts commit 6c80361b.
Breaks PowerPC Big Endian buildbots.

8d37411e

[DAGCombiner] Fix DAG combine store elimination, different address space. · 762ac725

Hendrik Greving authored May 11, 2021

Fixes a bug in the DAG combiner that eliminates stores because it failed
to inspect the address space of the pointers.

%v = load %ptr_as1
// no chain side effect
store %v, %ptr_as2

As well as

store %v, %ptr_as1
store %v, %ptr_as2

Fixes a test for above in X86.
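
The missing check can be modeled with a toy sketch in plain Python. It assumes a pointer is just a (base, address space) pair, which is a simplification; the real combiner works on memory operands, and all names here are hypothetical.

```python
from typing import NamedTuple

class Ptr(NamedTuple):
    base: str       # symbolic pointer value
    addrspace: int  # address space the access goes through

def same_location_buggy(a: Ptr, b: Ptr) -> bool:
    # Models the bug: only the pointer value is compared.
    return a.base == b.base

def same_location_fixed(a: Ptr, b: Ptr) -> bool:
    # Models the fix: the address spaces must match as well.
    return a.base == b.base and a.addrspace == b.addrspace
```

With `Ptr("p", 1)` and `Ptr("p", 2)`, the buggy predicate treats the second store as redundant while the fixed one keeps it.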

Differential Revision: https://reviews.llvm.org/D102096

762ac725

[DAGCombiner] Add test exposing bug in DAG combine. · 4b00ffa7

Hendrik Greving authored May 11, 2021

Adds a test in X86, exposing a bug in DAG combine eliminating stores that
are of the same value but not in the same address space.

Differential Revision: https://reviews.llvm.org/D102243

4b00ffa7

[CodeGen][AArch64][SVE] Fold [rdffr, ptest] => rdffrs; bugfix for optimizePTestInstr · 3fa6510f

Peter Waller authored Apr 27, 2021

When a ptest is used to set flags from the output of rdffr, the ptest
can be eliminated, using a flags-setting rdffrs instead.

Additionally, check that nothing consumes flags between rdffr and ptest;
this case appears to have been missed previously.

* There is no unpredicated RDFFRS instruction.
* If substituting RDFFR_PP, require that the mask argument of the
  PTEST matches that of the RDFFR_PP.
* Move some precondition code up inside optimizePTestInstr, so that it
  covers the new code paths for RDFFR which return earlier.
  * Only consider RDFFR, PTEST in same basic block.
  * Check for other flag setting instructions between the two, abort if
    found.
  * Drop an old TODO comment about removing dead PTEST instructions.
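
The preconditions listed above can be sketched as a toy peephole check in plain Python. It assumes a basic block is a list of instructions with explicit flag effects; `Inst` and `can_fold_to_rdffrs` are hypothetical names, not the AArch64 backend API.

```python
from dataclasses import dataclass

@dataclass
class Inst:
    opcode: str
    mask: str = ""            # governing predicate, if any
    sets_flags: bool = False
    reads_flags: bool = False

def can_fold_to_rdffrs(block, rdffr_idx, ptest_idx):
    """Both indices refer to the same block, modeling the same-block rule."""
    rdffr, ptest = block[rdffr_idx], block[ptest_idx]
    # Only the predicated form is handled; there is no unpredicated RDFFRS.
    if rdffr.opcode != "RDFFR_PP" or ptest.opcode != "PTEST":
        return False
    if rdffr_idx >= ptest_idx:
        return False
    # The PTEST mask must match the RDFFR_PP mask.
    if rdffr.mask != ptest.mask:
        return False
    # Nothing in between may set or consume the flags.
    between = block[rdffr_idx + 1:ptest_idx]
    return not any(i.sets_flags or i.reads_flags for i in between)
```

A block such as `[RDFFR_PP p0, ADD, PTEST p0]` passes the check, while inserting a flag-setting compare between the two, or using a different PTEST mask, makes it fail.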

RDFFR_P to follow in later patch.

Differential Revision: https://reviews.llvm.org/D101357

3fa6510f

[clang][AVR] Redefine some types to be compatible with avr-gcc · 892c56ea
Ben Shi authored May 12, 2021
```
Reviewed By: dylanmckay

Differential Revision: https://reviews.llvm.org/D100701
```
892c56ea

[NFC] Use variable GEP index in vec_demanded_elts tests · 61630814

David Sherwood authored May 12, 2021

I've changed a test in each of these files:

  Transforms/InstCombine/vec_demanded_elts.ll
  Transforms/InstCombine/vec_demanded_elts-inseltpoison.ll

to use a variable GEP index instead of a constant value so that
we're testing the more general case.

61630814

[Passes] Reenable the relative lookup table converter pass for ELF and COFF on aarch64 · 4b98199c

Martin Storsjö authored May 12, 2021

The bug (PR50227, affecting COFF) that caused the revert in
6f5670a4 has been fixed in
382c505d now, so it should be safe
to reenable the pass for that target (and ELF).

In PR50227 it's also mentioned that the same pass seems to cause
problems on aarch64 on darwin, so leaving it disabled there for now.

4b98199c

[llvm-objdump] Exclude __mh_*_header symbols during MachO disassembly · 5a439015

Greg McGary authored May 02, 2021

`__mh_(execute|dylib|dylinker|bundle|preload|object)_header` are special symbols whose values hold the VMA of the Mach header to support introspection. They are attached to the first section in `__TEXT`, even though their addresses are outside `__TEXT`, and they do not refer to code.

It is normally harmless, but when the first section of `__TEXT` has no other symbols, `__mh_*_header` is considered by the disassembler when determining function boundaries. Since `__mh_*_header` refers to an address outside `__TEXT`, the boundary determination fails and disassembly quits.

Since `__TEXT,__text` normally has symbols, this bug is obscured. Experiments placing `__stubs` and `__stub_helper` first exposed the bug, since neither has symbols.
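
A minimal sketch of the exclusion in plain Python, using the symbol pattern quoted above. It assumes symbols are plain strings; `disassembly_symbols` is a hypothetical helper, not part of llvm-objdump.

```python
import re

# Pattern taken from the description of the special Mach header symbols.
MH_HEADER = re.compile(
    r"__mh_(execute|dylib|dylinker|bundle|preload|object)_header"
)

def disassembly_symbols(symbols):
    """Drop Mach header symbols before computing function boundaries."""
    return [s for s in symbols if not MH_HEADER.fullmatch(s)]
```

Using `fullmatch` keeps unrelated names that merely contain a similar substring.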

Differential Revision: https://reviews.llvm.org/D101786

5a439015

[AMDGPU] Improve Codegen for build_vector · 46adccc5

Julien Pagès authored May 12, 2021

Improve the code generation of build_vector.
Use the v_pack_b32_f16 instruction instead of
v_and_b32 + v_lshl_or_b32

Differential Revision: https://reviews.llvm.org/D98081

Patch by Julien Pagès!

46adccc5

[InstCombine] ~(C + X) --> ~C - X (PR50308) · 554b1bce

Roman Lebedev authored May 12, 2021

We can not rely on (C+X)-->(X+C) already happening,
because we might not have visited that `add` yet.
The added testcase would get stuck in an endless combine loop.
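
The identity follows from two's complement, where `~y == -y - 1`, so `~(C + X) == -C - 1 - X == ~C - X`. A small exhaustive check over 8-bit values in plain Python (the 8-bit width and helper names are chosen only for illustration):

```python
BITS = 8
MASK = (1 << BITS) - 1

def bit_not(v):
    # Bitwise NOT truncated to BITS bits.
    return ~v & MASK

def lhs(c, x):
    # ~(C + X) with wrapping addition.
    return bit_not((c + x) & MASK)

def rhs(c, x):
    # ~C - X with wrapping subtraction.
    return (bit_not(c) - x) & MASK

def identity_holds():
    # Compare both sides for every pair of 8-bit operands.
    return all(lhs(c, x) == rhs(c, x)
               for c in range(1 << BITS)
               for x in range(1 << BITS))
```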

554b1bce

[TargetRegisterInfo] Speed up getAllocatableSet. NFCI. · a383d325

Jay Foad authored May 11, 2021

MachineRegisterInfo caches the reserved register set that is computed by
TargetRegisterInfo::getReservedRegs, so call into MRI to get the
reserved regs to avoid recomputing them.

In particular this speeds up AMDGPU's SIFormMemoryClauses pass because
AMDGPU has a particularly complicated reserved set that is expensive to
compute.
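
The caching idea can be sketched with toy classes in plain Python. These are not the LLVM classes; `TargetRegInfo`, `MachineRegInfo`, and `compute_count` are hypothetical names, and the counter only stands in for an expensive computation.

```python
class TargetRegInfo:
    def __init__(self, all_regs, reserved):
        self.all_regs = frozenset(all_regs)
        self._reserved = frozenset(reserved)
        self.compute_count = 0

    def get_reserved_regs(self):
        # Stands in for an expensive target-specific computation.
        self.compute_count += 1
        return self._reserved

class MachineRegInfo:
    def __init__(self, tri):
        self.tri = tri
        self._reserved = None

    def get_reserved_regs(self):
        # Compute once, then serve the cached set.
        if self._reserved is None:
            self._reserved = self.tri.get_reserved_regs()
        return self._reserved

def get_allocatable_set(mri):
    # Ask the caching layer instead of recomputing the reserved set.
    return mri.tri.all_regs - mri.get_reserved_regs()
```

Repeated calls to `get_allocatable_set` on the same `MachineRegInfo` trigger the underlying computation only once.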

Differential Revision: https://reviews.llvm.org/D102318

a383d325

[mlir][linalg] Remove IndexedGenericOp support from LinalgInterchangePattern... · 06bb9cf3

Tobias Gysi authored May 12, 2021

after introducing the IndexedGenericOp to GenericOp canonicalization (https://reviews.llvm.org/D101612).

Differential Revision: https://reviews.llvm.org/D102245

06bb9cf3

[AMDGPU] Remove assert · a4db7025
Piotr Sobczak authored May 12, 2021
```
Remove assert introduced in D101177, following post-commit feedback.
```
a4db7025

[x86] try harder to lower to PCMPGT instead of not-of-PCMPEQ · f58e0513

Sanjay Patel authored May 12, 2021

This is motivated by the example in https://llvm.org/PR50055 ,
but it doesn't do anything for that bug currently because we
don't actually have a zero-extended setcc there.

Proof for the generic transform (inverse of what we would
try to do in combining):
https://alive2.llvm.org/ce/z/aBL-Mg

Differential Revision: https://reviews.llvm.org/D102275

f58e0513

[x86] add test for pcmpeq with 0; NFC · 24d06fff
Sanjay Patel authored May 11, 2021

24d06fff

[clang-tidy][NFC] Simplify a lot of bugprone-sizeof-expression matchers · 4c59ab34

Nathan James authored May 12, 2021

There should be a follow up to this for changing the traversal mode, but some of the tests don't like that.

Reviewed By: steveire

Differential Revision: https://reviews.llvm.org/D101614

4c59ab34

[mlir][linalg] Remove IndexedGenericOp support from LinalgBufferize... · c6b96ae0

Tobias Gysi authored May 12, 2021

after introducing the IndexedGenericOp to GenericOp canonicalization (https://reviews.llvm.org/D101612).

Differential Revision: https://reviews.llvm.org/D102308

c6b96ae0

Revert "[scudo] Enable arm32 arch" · 7d0a81ca

David Spickett authored May 12, 2021

This reverts commit b1a77e46,
which has a failing test on our armv7 bots:
https://lab.llvm.org/buildbot/#/builders/59/builds/1812

7d0a81ca

[clang-tidy] Enable the use of IgnoreArray flag in pro-type-member-init rule · 16332508

Hana Joo authored May 12, 2021

The `IgnoreArray` flag was previously not used when running the rule. Fixes https://bugs.llvm.org/show_bug.cgi?id=47288 (b/47288).

Reviewed By: njames93

Differential Revision: https://reviews.llvm.org/D101239

16332508

[mlir][linalg] Remove IndexedGenericOp support from LinalgToStandard... · 0fb364a9

Tobias Gysi authored May 12, 2021

after introducing the IndexedGenericOp to GenericOp canonicalization (https://reviews.llvm.org/D101612).

Differential Revision: https://reviews.llvm.org/D102236

0fb364a9

[libcxx] NFC. Correct wordings of _LIBCPP_ASSERT debug messages · 96100f15
Kristina Bessonova authored May 09, 2021
```
Differential Revision: https://reviews.llvm.org/D102195
```
96100f15

[X86][AVX] canonicalizeShuffleMaskWithHorizOp - improve support for 256/512-bit vectors · 72e242a2

Simon Pilgrim authored May 12, 2021

Extend the HOP(HOP(X,Y),HOP(Z,W)) and SHUFFLE(HOP(X,Y),HOP(Z,W)) folds to handle repeating 256/512-bit vector cases.

This allows us to drop the UNPACK(HOP(),HOP()) custom fold in combineTargetShuffle.

This required isRepeatedTargetShuffleMask to be tweaked to support target shuffle masks taking more than 2 inputs.

72e242a2

[llvm-readelf] Unhide short options to match the command guide · 81900dc4

gbreynoo authored May 12, 2021

The readelf command guide shows the short options used as aliases, but
these are not found in the help text unless --show-hidden is used. Other
tools show aliases with --help. This change fixes the help output to be
consistent with the command guide.

Differential Revision: https://reviews.llvm.org/D102173

81900dc4

[llvm-symbolizer] Place Mach-O options into the Mach-O option group. · 725bc3eb

gbreynoo authored May 12, 2021

In the help output of other tools and in the symbolizer command guide,
Mach-O specific options are in their own section. This change fixes the
symbolizer help output to be consistent.

Differential Revision: https://reviews.llvm.org/D102178

725bc3eb

[LoopVectorize] Fix scalarisation crash in widenPHIInstruction for scalable vectors · b7a11274

David Sherwood authored Apr 21, 2021

In InnerLoopVectorizer::widenPHIInstruction there are cases where we have
to scalarise a pointer induction variable after vectorisation. For scalable
vectors we already deal with the case where the pointer induction variable
is uniform, but we currently crash if not uniform. For fixed width vectors
we calculate every lane of the scalarised pointer induction variable for a
given VF, however this cannot work for scalable vectors. In this case I
have added support for caching the whole vector value for each unrolled
part so that we can always extract an arbitrary element. Additionally, we
still continue to cache the known minimum number of lanes too in order
to improve code quality by avoiding an extractelement operation.

I have adapted an existing test `pointer_iv_mixed` from the file:

Transforms/LoopVectorize/consecutive-ptr-uniforms.ll

and added it here for scalable vectors instead:

Transforms/LoopVectorize/AArch64/sve-widen-phi.ll

Differential Revision: https://reviews.llvm.org/D101294

b7a11274

[AArch64][SVE] Improve sve.convert.to.svbool lowering · 6e6f9a63

Peter Waller authored Apr 29, 2021

The sve.convert.to.svbool lowering has the effect of widening a logical
<M x i1> vector representing lanes into a physical <16 x i1> vector
representing bits in a predicate register.

In general, if converting to svbool, the contents of lanes in the
physical register might not be known. For sve.convert.to.svbool the new
lanes are specified to be zeroed, requiring 'and' instructions to mask
off the new lanes. For lanes coming from a ptrue or a comparison,
however, they are known to be zero.

CodeGen Before:
  ptrue p0.s, vl16
  ptrue p1.s
  ptrue p2.b
  and   p0.b, p2/z, p0.b, p1.b
  ret

After:
  ptrue p0.s, vl16
  ret

Differential Revision: https://reviews.llvm.org/D101544

6e6f9a63

[Process/elf-core] Read PID from FreeBSD prpsinfo · 71e66da0

Michał Górny authored May 05, 2021

Add a function to read NT_PRPSINFO note from FreeBSD core dumps.  This
is necessary to get the process ID (NT_PRSTATUS has only thread ID).
Move the lp64 check from NT_PRSTATUS parsing to the parseFreeBSDNotes()
to avoid repeating it.

Differential Revision: https://reviews.llvm.org/D101893

71e66da0

[lldb] [Process/elf-core] Fix reading FPRs from FreeBSD/i386 cores · b6c0edb9

Michał Górny authored Apr 22, 2021

The FreeBSD coredumps from i386 systems contain only FSAVE-style
NT_FPREGSET. Since we do not really support reading that kind of data
anymore, just use NT_X86_XSTATE to get FXSAVE-style data when available.

Differential Revision: https://reviews.llvm.org/D101086

b6c0edb9

Reapply "[DebugInfo] Fix updateDbgUsersToReg to support DBG_VALUE_LIST" · fdb055f4

Stephen Tozer authored May 10, 2021

Previous crashes caused by this patch were the result of machine
subregisters being incorrectly handled in updateDbgUsersToReg; this has
been fixed by using RegUnits to determine overlapping registers, instead
of using the register values directly.

Differential Revision: https://reviews.llvm.org/D101523

This reverts commit 7ca26c5f.

fdb055f4

Remove Windows editline from LLDB · 5af3a664

Neal (nealsid) authored May 12, 2021

I don't mean to undo others' work but it looks like the hand-rolled EditLine for LLDB on Windows isn't used. It'd be easier to make changes to bring the other platforms' Editline wrapper up to date (e.g. simplifying char vs wchar_t) without modifying/testing this one too.

Reviewed By: amccarth

Differential Revision: https://reviews.llvm.org/D102208

5af3a664

[AMDGPU] Skip invariant loads when avoiding WAR conflicts · 68137ef5

Piotr Sobczak authored May 12, 2021

No need to handle invariant loads when avoiding WAR conflicts, as
there cannot be a vector store to the same memory location.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D101177

68137ef5

Revert "[PowerPC] [Clang] Enable float128 feature on VSX targets" · cbd93cee

Qiu Chaofan authored May 12, 2021

This commit broke the build in some f128-related tests, but that's
not the root cause. There exist some differences between Clang's and
GCC's definitions of 128-bit float types on PPC, so macros/functions in
glibc may not work well with clang -mfloat128. We need to handle this
carefully and reland it.

cbd93cee