- Jul 13, 2021
-
Qiu Chaofan authored
Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D105789
-
Stanislav Mekhanoshin authored
This is a pilot change to verify the logic. The rest will be done in the same way, at least the rest of VOP1. Differential Revision: https://reviews.llvm.org/D105742
-
Qiu Chaofan authored
a22ecb45 fixed a crash on big-endian subtargets. This commit fixes a typo in that change that could cause a miscompile.
-
hyeongyu kim authored
This patch fixes a problem in SimplifyBranchOnICmpChain that occurs when the extra values are undef or poison. Suppose %mode is 51 and %Cond is poison, and consider the case below.
```
%A = icmp ne i32 %mode, 0
%B = icmp ne i32 %mode, 51
%C = select i1 %A, i1 %B, i1 false
%D = select i1 %C, i1 %Cond, i1 false
br i1 %D, label %T, label %F
=>
br i1 %Cond, label %switch.early.test, label %F
switch.early.test:
  switch i32 %mode, label %T [
    i32 51, label %F
    i32 0, label %F
  ]
```
incorrectness: https://alive2.llvm.org/ce/z/BWScX Before the transformation the code does not raise UB, because %C and %D are false and %Cond is never used. After the transformation %Cond is used immediately, and it raises UB. This problem can be solved by adding a freeze instruction. correctness: https://alive2.llvm.org/ce/z/x9x4oY Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D104569
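As an illustration only (not text from the patch), a minimal IR sketch of the fixed transform, assuming the freeze is applied to the extra condition before the early-test branch uses it:
```
%Cond.fr = freeze i1 %Cond
br i1 %Cond.fr, label %switch.early.test, label %F
switch.early.test:
  switch i32 %mode, label %T [
    i32 51, label %F
    i32 0, label %F
  ]
```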
-
David Green authored
Similar to D91921 (and D104515) this introduces two MVESEXT and MVEZEXT nodes that larger-than-legal sext and zext are lowered to. These either get optimized away or end up becoming a series of stack loads/stores, in order to perform the extending whilst keeping the order of the lanes correct. They are generated from v8i16->v8i32, v16i8->v16i16 and v16i8->v16i32 extends, potentially with an intermediate extend for the larger v16i8->v16i32 extend. A number of combines have been added for obvious cases that come up in tests, notably MVEEXT of shuffles. More may be needed in the future, but this seems to cover most of the cases that come up in the tests. Differential Revision: https://reviews.llvm.org/D105090
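For reference, a minimal sketch of the kinds of larger-than-legal extends this affects (function names are illustrative, not from the patch):
```
define <8 x i32> @sext_v8i16_v8i32(<8 x i16> %x) {
  %e = sext <8 x i16> %x to <8 x i32>
  ret <8 x i32> %e
}

define <16 x i32> @zext_v16i8_v16i32(<16 x i8> %x) {
  %e = zext <16 x i8> %x to <16 x i32>
  ret <16 x i32> %e
}
```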
-
hyeongyu kim authored
-
Vitaly Buka authored
Fails here: https://lab.llvm.org/buildbot/#/builders/37/builds/5267 This reverts commit e4aa6ad1.
-
Jessica Paquette authored
Generalize the existing eq/ne case using `extractParts`. The original code only handled narrowings for types of width 2n->n. This generalization allows for any type that can be broken down by `extractParts`. The general overview is:
- Loop over each narrow-sized part and do exactly what the 2-register case did.
- Loop over the leftover-sized parts and do the same thing.
- Widen the leftover-sized XOR results to the desired narrow size.
- OR that all together and then do the comparison against 0 (just like the old code).
This shows up a lot when building clang for AArch64 using GlobalISel, so it's worth fixing. For the sake of simplicity, this doesn't handle the non-eq/ne case yet. Also remove the code in this case that notifies the observer; we're just going to delete MI anyway so talking to the observer shouldn't be necessary. Differential Revision: https://reviews.llvm.org/D105161
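At the IR level the expansion is conceptually the following, shown here as a hand-written sketch for a wide compare already split into i64 parts (not output from the patch):
```
define i1 @wide_eq(i64 %a.lo, i64 %a.hi, i64 %b.lo, i64 %b.hi) {
  ; XOR each pair of parts, OR the results together, then compare against 0.
  %x.lo = xor i64 %a.lo, %b.lo
  %x.hi = xor i64 %a.hi, %b.hi
  %or = or i64 %x.lo, %x.hi
  %cmp = icmp eq i64 %or, 0
  ret i1 %cmp
}
```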
-
Arthur Eubanks authored
-
Arthur Eubanks authored
Reviewed By: #opaque-pointers, dblaikie Differential Revision: https://reviews.llvm.org/D105710
-
David Blaikie authored
This call would incorrectly overwrite the rnglists section with the .debug_rnglists.dwo from the executable (if there was one), instead of the correct value (the .debug_rnglists.dwo from the .dwo file) that is applied in DWARFUnit::tryExtractDIEsIfNeeded.
-
Jon Roelofs authored
-
Eli Friedman authored
Saves one instruction for signed, uses a cheaper instruction for unsigned. Differential Revision: https://reviews.llvm.org/D105770
-
- Jul 12, 2021
-
Eli Friedman authored
Not really useful on its own, but D105673 depends on it. Differential Revision: https://reviews.llvm.org/D105840
-
Amy Kwan authored
[PowerPC] Fix the splat immediate in PPCMIPeephole depending on whether we have an Altivec or a VSX splat instruction. An assertion like the following can occur because Altivec and VSX splats use a different operand number for the immediate:
```
int64_t llvm::MachineOperand::getImm() const: Assertion `isImm() && "Wrong MachineOperand accessor"' failed.
```
This patch updates PPCMIPeephole.cpp to assign the correct splat immediate. Differential Revision: https://reviews.llvm.org/D105790
-
Nikita Popov authored
Continuing from D105763, this allows placing certain properties about attributes in the TableGen definition. In particular, we store whether an attribute applies to fn/param/ret (or a combination thereof). This information is used by the Verifier, as well as the ForceFunctionAttrs pass. I also plan to use this in LLParser, which also duplicates info on which attributes are valid where. This keeps metadata about attributes in one place, and makes it more likely that it stays in sync, rather than in various functions spread across the codebase. Differential Revision: https://reviews.llvm.org/D105780
-
Wouter van Oortmerssen authored
-
Nikita Popov authored
InAlloca was listed twice, once as a normal attribute, once as a type attribute.
-
Nikita Popov authored
This is now the same as isIntAttrKind(), so use that instead, as it does not require manual maintenance. The naming is also more accurate in that both int and type attributes have an argument, but this method was only targeting int attributes. I initially wanted to tighten the AttrBuilder assertion, but we have some in-tree uses that would violate it.
-
Simon Pilgrim authored
Update (mainly) vXf32/vXf64 -> vXi8/vXi16 fptosi/fptoui costs based on the worst case costs from the script in D103695. Move to using legalized types wherever possible, which allows us to prune the cost tables.
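The costs being updated are for conversions of this shape (an illustrative example, not taken from the patch):
```
define <8 x i8> @cvt_v8f32_v8i8(<8 x float> %x) {
  %r = fptosi <8 x float> %x to <8 x i8>
  ret <8 x i8> %r
}
```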
-
Thomas Johnson authored
This change is a step towards implementing codegen for __builtin_clz(). Full support for CLZ with a regression test will follow shortly. Differential Revision: https://reviews.llvm.org/D105560
-
Nikita Popov authored
It's not necessary to explicitly sort by enum/int/type attribute, as the attribute kinds are already sorted this way. We can directly sort by kind.
-
Nikita Popov authored
Assert that enum/int/type attributes go through the constructor they are supposed to use. To make sure this can't happen via invalid bitcode, explicitly verify that the attribute kind is correct there.
-
Nikita Popov authored
Followup to D105658 to make AttrBuilder automatically work with new type attributes. TableGen is tweaked to emit First/LastTypeAttr markers, based on which we can handle type attributes programmatically. Differential Revision: https://reviews.llvm.org/D105763
-
Jinsong Ji authored
The lowering for v2i64 is currently guarded with hasDirectMove; however, the lowering can handle the pattern correctly, only lowering it when there are efficient patterns and corresponding instructions. The original guard was added in D21135 and was for the Legal action. The code has evolved since then, and this guard is no longer necessary. Reviewed By: #powerpc, nemanjai Differential Revision: https://reviews.llvm.org/D105596
-
Thomas Lively authored
Replace the clang builtin function and LLVM intrinsic for f32x4.demote_zero_f64x2 with combines from normal SDNodes. Also add missing combines for i32x4.trunc_sat_zero_f64x2_{s,u}, which share the same pattern. Differential Revision: https://reviews.llvm.org/D105755
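Assuming f32x4.demote_zero_f64x2 truncates the two double lanes and zero-fills the upper lanes, the combined pattern corresponds roughly to IR like this sketch (names are illustrative):
```
define <4 x float> @demote_zero(<2 x double> %x) {
  ; truncate the two double lanes, then pad the result with zero lanes
  %t = fptrunc <2 x double> %x to <2 x float>
  %r = shufflevector <2 x float> %t, <2 x float> zeroinitializer,
                     <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  ret <4 x float> %r
}
```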
-
Craig Topper authored
[X86] Teach X86FloatingPoint's handleCall to only erase the FP stack if there is a regmask operand that clobbers the FP stack. There are some calls to functions like `__alloca` that are missing a regmask operand. Lack of a regmask operand means that all registers that aren't mentioned by def operands are preserved. __alloca only updates EAX and ESP and has def operands for them, so this is ok. Because there is no regmask, the register allocator won't spill the FP registers across the call. Assuming we want to keep the FP stack untouched across these calls, we need to handle this in the FP stackifier. We might want to add a proper regmask operand to the code that creates these calls to indicate all registers are preserved, but we'd still need this change to the FP stackifier to know to preserve the FP stack for such a regmask. The test is kind of long, but bugpoint wasn't able to reduce it any further. Fixes PR50782 Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D105762
-
Jinsong Ji authored
AIX .file directive support including the compiler version string. https://www.ibm.com/docs/en/aix/7.2?topic=ops-file-pseudo-op This patch adds the support so that it will be easier to identify the build compiler in objects. Reviewed By: #powerpc, shchenz Differential Revision: https://reviews.llvm.org/D105743
-
Albion Fung authored
This patch implements trap and FP to and from double conversions. The builtins generate code that mirror what is generated from the XL compiler. Intrinsics are named conventionally with builtin_ppc, but are aliased to provide the same builtin names as the XL compiler. Differential Revision: https://reviews.llvm.org/D103668
-
Bradley Smith authored
Let other parts of legalization handle the rest of the node, this allows re-use of existing optimizations elsewhere. Differential Revision: https://reviews.llvm.org/D105624
-
Simon Tatham authored
No implementation uses the `LocCookie` parameter at all. Errors are reported from inside that function by `llvm::SourceMgr`, and the instance of that at the clang call site arranges to pass the error messages back to a `ClangAsmParserCallback`, which is where the clang SourceLocation for the error is computed. (This is part of a patch series working towards the ability to make SourceLocation into a 64-bit type to handle larger translation units. But this particular change seems beneficial in its own right.) Reviewed By: miyuki Differential Revision: https://reviews.llvm.org/D105490
-
Benjamin Kramer authored
AArch64ISelLowering.cpp:15167:8: warning: unused variable 'OpCode' [-Wunused-variable]
  auto OpCode = N->getOpcode();
       ^
-
Cullen Rhodes authored
First patch in a series adding MC layer support for the Arm Scalable Matrix Extension. This patch adds the following features: sme, sme-i64, sme-f64. The sme-i64 and sme-f64 flags are for the optional I16I64 and F64F64 features. If a target supports I16I64 then the following instructions are implemented:
* 64-bit integer ADDHA and ADDVA variants (D105570).
* SMOPA, SMOPS, SUMOPA, SUMOPS, UMOPA, UMOPS, USMOPA, and USMOPS instructions that accumulate 16-bit integer outer products into 64-bit integer tiles.
If a target supports F64F64 then the FMOPA and FMOPS instructions that accumulate double-precision floating-point outer products into double-precision tiles are implemented. Outer products are implemented in D105571. The reference can be found here: https://developer.arm.com/documentation/ddi0602/2021-06 Reviewed By: CarolineConcatto Differential Revision: https://reviews.llvm.org/D105569
-
Michael Liao authored
-
Jonas Paulsson authored
Don't use a local MachineOperand copy in SystemZAsmPrinter::PrintAsmOperand() and change the register as it may break the MRI tracking of register uses. Use an MCOperand instead. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D105757
-
Sanjay Patel authored
This is the pattern from the description of https://llvm.org/PR50816. There might be a way to generalize this to a smaller or more generic pattern, but I have not found it yet. https://alive2.llvm.org/ce/z/ShzJoF

define i1 @src(i8 %x) {
  %add = add i8 %x, -1
  %xor = xor i8 %x, -1
  %and = and i8 %add, %xor
  %r = icmp slt i8 %and, 0
  ret i1 %r
}

define i1 @tgt(i8 %x) {
  %r = icmp eq i8 %x, 0
  ret i1 %r
}
-
Simon Pilgrim authored
Update truncation costs based on the worst case costs from the script in D103695. Move to using legalized types wherever possible, which allows us to prune the cost tables.
-
David Green authored
This sets the latency of stores to 1 in the Cortex-A55 scheduling model, to better match the values given in the software optimization guide. The latency of a store in normal llvm scheduling does not appear to have a lot of uses. If the store has no outputs then the latency is somewhat meaningless (and pre/post increment update operands use the WriteAdr write for those operands instead). The one place it does alter things is the latency between a store and the end of the scheduling region, which can in turn have an effect on the critical path length. As a result a latency of 1 is more correct and offers ever-so-slightly better scheduling of instructions near the end of the block. They are marked as RetireOOO to keep llvm-mca from introducing stalls where none would exist. Differential Revision: https://reviews.llvm.org/D105541
-
Yevgeny Rouban authored
This new test demonstrates a case where a base ptr is generated twice for the same value: the first one is generated while the gc.get.pointer.base() is inlined, the second is generated for the statepoint. This happens because the methods inlineGetBaseAndOffset() and insertParsePoints() do not share their defining value cache used by the findBasePointer() method. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D103240
-
David Truby authored
This adds custom lowering for truncating stores when operating on fixed length vectors in SVE. It also includes a DAG combine to fold extends followed by truncating stores into non-truncating stores, in order to prevent this pattern from appearing once truncating stores are supported. Currently truncating stores are not used in certain cases where the size of the vector is larger than the target vector width. Differential Revision: https://reviews.llvm.org/D104471
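A rough sketch of the pattern that can now be lowered to a truncating store (illustrative only; the interesting cases are fixed-length vectors wider than the target vector width):
```
define void @store_trunc(<8 x i32> %v, <8 x i16>* %p) {
  ; the trunc plus store can become a single truncating store
  %t = trunc <8 x i32> %v to <8 x i16>
  store <8 x i16> %t, <8 x i16>* %p
  ret void
}
```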
-