Commits · 743e263e085042d9c66543f942e68d5ed608dd22 · Lorenzo Albano / LLVM bpEVL

Oct 14, 2021

[hexagon] Add system register, transfer support · 743e263e

Brian Cain authored Feb 22, 2021

This commit adds the system reg/regpair definitions and the corresponding
register transfer instructions.

743e263e

[NVPTX] Add VRFrame and VRFrameLocal to integer register classes · 51eefa81

Andrew Savonichev authored Oct 14, 2021

These registers are used as operands for instructions that expect an
integer register, so they should be added to Int32Regs or Int64Regs
register classes. Otherwise the machine verifier emits an error for
the following LIT tests when LLVM_ENABLE_MACHINE_VERIFIER=1
environment variable is set:

*** Bad machine code: Illegal physical register for instruction ***
- function:    kernel_func
- basic block: %bb.0 entry (0x55c8903d5438)
- instruction: %3:int64regs = LEA_ADDRi64 $vrframelocal, 0
- operand 1:   $vrframelocal
$vrframelocal is not a Int64Regs register.

    CodeGen/NVPTX/call-with-alloca-buffer.ll
    CodeGen/NVPTX/disable-opt.ll
    CodeGen/NVPTX/lower-alloca.ll
    CodeGen/NVPTX/lower-args.ll
    CodeGen/NVPTX/param-align.ll
    CodeGen/NVPTX/reg-types.ll
    DebugInfo/NVPTX/dbg-declare-alloca.ll
    DebugInfo/NVPTX/dbg-value-const-byref.ll

Differential Revision: https://reviews.llvm.org/D110164

51eefa81

[SystemZ] Remove some now unused ISD XXX_LOOP opcodes. · c0d88613
Jonas Paulsson authored Oct 14, 2021

c0d88613

[ARM] Simplify address calculation for NEON load/store · dc8a41de

Andrew Savonichev authored Sep 08, 2021

The patch attempts to optimize a sequence of SIMD loads from the same
base pointer:

    %0 = gep float*, float* base, i32 4
    %1 = bitcast float* %0 to <4 x float>*
    %2 = load <4 x float>, <4 x float>* %1
    ...
    %n1 = gep float*, float* base, i32 N
    %n2 = bitcast float* %n1 to <4 x float>*
    %n3 = load <4 x float>, <4 x float>* %n2

For AArch64 the compiler generates a sequence of LDR Qt, [Xn, #16].
However, 32-bit NEON VLD1/VST1 lack the [Wn, #imm] addressing mode, so
the address is computed before every ld/st instruction:

    add r2, r0, #32
    add r0, r0, #16
    vld1.32 {d18, d19}, [r2]
    vld1.32 {d22, d23}, [r0]

This can be improved by computing address for the first load, and then
using a post-indexed form of VLD1/VST1 to load the rest:

    add r0, r0, #16
    vld1.32 {d18, d19}, [r0]!
    vld1.32 {d22, d23}, [r0]

In order to do that, the patch adds more patterns to DAGCombine:

  - (load (add ptr inc1)) and (add ptr inc2) are now folded if inc1
    and inc2 are constants.

  - (or ptr inc) is now recognized as a pointer increment if ptr is
    sufficiently aligned.
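
The second pattern rests on a simple bit-level fact, which the following sketch illustrates. It is not the DAGCombine code: `orActsAsAdd` and `orAddress` are hypothetical helpers, and the alignment test is a simplified stand-in for whatever known-bits reasoning the patch actually uses. When `ptr` is a multiple of a power-of-two alignment and `inc` is smaller than that alignment, the two values share no set bits, so `ptr | inc` equals `ptr + inc`.

```cpp
#include <cstdint>

// Returns true when `ptr | inc` and `ptr + inc` are guaranteed to be equal:
// `ptr` is a multiple of `align` (a power of two) and `inc` < `align`, so the
// operands have no set bits in common and the addition cannot carry.
constexpr bool orActsAsAdd(std::uint64_t ptr, std::uint64_t inc,
                           std::uint64_t align) {
  const bool isPow2 = align != 0 && (align & (align - 1)) == 0;
  return isPow2 && (ptr % align == 0) && inc < align;
}

// Hypothetical helper: the address that an `(or ptr inc)` node computes.
constexpr std::uint64_t orAddress(std::uint64_t ptr, std::uint64_t inc) {
  return ptr | inc;
}
```

For a 32-byte-aligned base such as `0x1000`, `orAddress(0x1000, 16)` matches `0x1000 + 16`; for `0x1010` the OR and the ADD disagree, which is why the alignment condition matters.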

In addition to that, we now search for all possible base updates and
then pick the best one.

Differential Revision: https://reviews.llvm.org/D108988

dc8a41de

[Codegen] TargetLowering::getCanonicalIndexType - early out scaled MVT::i8 indices. NFCI. · 88487662
Simon Pilgrim authored Oct 14, 2021

Avoids unused assignment scan-build warning.

88487662

[CostModel][X86] Pre-SSE41 targets can use PMADDWD for sext sub-i16 -> i32 · 77dcdc2f

Simon Pilgrim authored Oct 14, 2021

Without SSE41 sext/zext instructions, the extensions will be split, meaning that the MUL->PMADDWD fold will split the sext_i32(x) into zext_i32(sext_i16(x)).

77dcdc2f

[Orc] ELFNixPlatform::setupJITDylib - remove dead return. NFCI. · 16729d0f
Simon Pilgrim authored Oct 14, 2021

2 returns, one after the other - reported by coverity

16729d0f

Follow up to , correctly select LiveDebugValues implementation · e3e1da20

Jeremy Morse authored Oct 14, 2021

Some functions get opted out of instruction referencing if they're being
compiled with no optimisations; however, the LiveDebugValues pass picks one
implementation and then sticks with it through the rest of compilation.
This leads to a segfault if we encounter a function that doesn't use
instr-ref (because it's optnone, for example), but we've already decided
to use InstrRefBasedLDV which expects to be passed a DomTree.

Solution: keep both implementations around in the pass, and pick whichever
one is appropriate to the current function.

e3e1da20

[SystemZ] Reapply memcmp and memcpy patches. · a33e4c8a

Jonas Paulsson authored Oct 12, 2021

This reverts 3562076d and includes some refactoring as well.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D111733

a33e4c8a

[SystemZ] Bugfix and refactorization of mem-mem operations · 00baad35

Jonas Paulsson authored Oct 09, 2021

This patch fixes a bug caused by treating variable-length and immediate-length
mem operations (such as memcpy, memset, ...) differently. The variable-length
case needs to have the length minus 1 passed due to the use of EXRL
target instructions. However, the DAGCombiner can convert a register length
argument into a constant one, and whenever that happened the operation would
end up covering one byte too few.

This patch also refactors the code by reducing the number of opcodes and variants
involved. For any opcode (variable or constant length), only the length minus
one is passed on to the ISD node. The rest of the logic is now
handled during isel pseudo expansion.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D111729

00baad35

[SCEV][NFC] Simplify check with CI->isZero() exit condition · 6e1308bc

Max Kazantsev authored Oct 14, 2021

Replace check with
    if ((ExitIfTrue && CI->isZero()) || (!ExitIfTrue && CI->isOne()))
with equivalent and simpler version
    if (ExitIfTrue == CI->isZero())
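
The equivalence can be checked exhaustively. The sketch below assumes that `CI` is an `i1` constant, so that `isOne()` is simply the negation of `isZero()`; the booleans `ciIsZero` and `ciIsOne` are stand-ins for the real `CI->isZero()` and `CI->isOne()` calls.

```cpp
// Original form of the check.
constexpr bool oldCheck(bool exitIfTrue, bool ciIsZero, bool ciIsOne) {
  return (exitIfTrue && ciIsZero) || (!exitIfTrue && ciIsOne);
}

// Simplified form of the check.
constexpr bool newCheck(bool exitIfTrue, bool ciIsZero) {
  return exitIfTrue == ciIsZero;
}

// Assumption: CI is an i1 constant, hence isOne() == !isZero().
constexpr bool agree(bool exitIfTrue, bool ciIsZero) {
  return oldCheck(exitIfTrue, ciIsZero, !ciIsZero) ==
         newCheck(exitIfTrue, ciIsZero);
}

// All four input combinations give the same answer in both forms.
static_assert(agree(false, false) && agree(false, true) &&
                  agree(true, false) && agree(true, true),
              "old and new checks must match for i1 constants");
```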

6e1308bc

[SCEV][NFC] Reorder checks to delay call of all_of · 46a1dd47
Max Kazantsev authored Oct 14, 2021

Check lightweight getter condition before calling all_of.

46a1dd47

[RISCV] Optimize immediate materialisation with BSETI/BCLRI · 7e815261

Ben Shi authored Oct 14, 2021

Optimize immediate materialisation in the following way if profitable:
1. Use BCLRI for upper 32 bits if the lower 32 bits are negative int32.
2. Use BSETI for upper 32 bits if the lower 32 bits are positive int32.
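
One way to picture the two cases, as a rough sketch rather than the backend's actual logic: materialise the lower 32 bits as a sign-extended int32, then count how many upper bits differ from that sign extension. Each differing bit needs one BCLRI (when the sign extension filled the upper half with ones) or one BSETI (when it filled it with zeros). The names `UpperFixups`, `signExtend32`, `popcount32` and `upperFixups` are hypothetical, and the profitability threshold is left out.

```cpp
#include <cstdint>

struct UpperFixups {
  bool useBclri;  // true: clear bits with BCLRI; false: set bits with BSETI
  int count;      // number of single-bit instructions needed
};

// Sign-extend the low 32 bits of `val` to 64 bits.
constexpr std::int64_t signExtend32(std::uint64_t val) {
  return static_cast<std::int64_t>(
      static_cast<std::int32_t>(static_cast<std::uint32_t>(val)));
}

// Count set bits in a 32-bit value.
constexpr int popcount32(std::uint32_t x) {
  int n = 0;
  while (x != 0) {
    x &= x - 1;  // clear the lowest set bit
    ++n;
  }
  return n;
}

// Upper bits that differ from the sign extension of the lower half are the
// ones that would have to be fixed up one at a time.
constexpr UpperFixups upperFixups(std::uint64_t val) {
  const std::int64_t base = signExtend32(val);
  const std::uint32_t diff = static_cast<std::uint32_t>(
      (val ^ static_cast<std::uint64_t>(base)) >> 32);
  return UpperFixups{base < 0, popcount32(diff)};
}
```

For example, `0x0000000100000001` has a positive lower half and one upper bit to set, while `0xFFFFFFFEFFFFFFFF` has a negative lower half and one upper bit to clear.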

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D111508

7e815261

[AMDGPU] Fix 24-bit mul intrinsic generation for > 32-bit result. · b3c9d84e

Abinav Puthan Purayil authored Oct 10, 2021

The 24-bit mul intrinsics yield the low-order 32 bits. We should only
do the transformation if the operands are known to be not wider than 24
bits and the result is known to be not wider than 32 bits.
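
A small numeric model shows why the result width matters. The function `mulU24` below is an assumed model of the unsigned intrinsic (multiply the low 24 bits of each operand, keep the low-order 32 bits), and `mulFull` is a hypothetical reference that keeps the whole product; neither is taken from the patch.

```cpp
#include <cstdint>

// Assumed model of a 24-bit unsigned multiply: use the low 24 bits of each
// operand and return only the low-order 32 bits of the product.
constexpr std::uint32_t mulU24(std::uint32_t a, std::uint32_t b) {
  const std::uint64_t full =
      static_cast<std::uint64_t>(a & 0xFFFFFFu) * (b & 0xFFFFFFu);
  return static_cast<std::uint32_t>(full);
}

// Reference: the full product, which may need up to 48 bits.
constexpr std::uint64_t mulFull(std::uint32_t a, std::uint32_t b) {
  return static_cast<std::uint64_t>(a) * b;
}
```

With both operands equal to `0xFFFFFF`, the full product is `0xFFFFFE000001` but the modelled intrinsic returns `0xFE000001`, so a result wider than 32 bits would be silently truncated. For small operands such as `1000 * 1000` the two agree.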

Differential Revision: https://reviews.llvm.org/D111523

b3c9d84e

[RISCV] Optimize immediate materialisation with SLLI.UW · 481db13f

Ben Shi authored Oct 13, 2021

Use LUI+SLLI.UW to compose the upper bits instead of LUI+SLLI.
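
The difference between the two sequences can be modelled in a few lines. This is a sketch of the RV64 instruction semantics only, with hypothetical function names: LUI sign-extends its 32-bit result, SLLI shifts the full 64-bit register, and SLLI.UW zero-extends the low 32 bits before shifting. Shift amounts are assumed to be below 64.

```cpp
#include <cstdint>

// RV64 LUI: place the 20-bit immediate in bits [31:12], then sign-extend
// the 32-bit result to 64 bits.
constexpr std::uint64_t lui(std::uint32_t imm20) {
  const std::uint32_t low = (imm20 & 0xFFFFFu) << 12;
  return static_cast<std::uint64_t>(
      static_cast<std::int64_t>(static_cast<std::int32_t>(low)));
}

// SLLI: plain 64-bit logical shift left (shamt assumed < 64).
constexpr std::uint64_t slli(std::uint64_t rs1, unsigned shamt) {
  return rs1 << shamt;
}

// SLLI.UW: zero-extend the low 32 bits of rs1, then shift left.
constexpr std::uint64_t slliUw(std::uint64_t rs1, unsigned shamt) {
  return (rs1 & 0xFFFFFFFFULL) << shamt;
}
```

With `imm20 = 0x80000`, LUI produces `0xFFFFFFFF80000000`. Shifting that by 4 with SLLI keeps the sign-extension bits (`0xFFFFFFF800000000`), whereas SLLI.UW discards them and yields `0x0000000800000000`.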

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D111705

481db13f

[ORC] Use a Setup object for SimpleRemoteEPC construction. · 4fcc0ac1

Lang Hames authored Oct 13, 2021

SimpleRemoteEPC notionally allowed subclasses to override the
createMemoryManager and createMemoryAccess methods to use custom objects, but
could not actually be subclassed in practice (the construction process in
SimpleRemoteEPC::Create could not be reused).

Instead of subclassing, this commit adds a SimpleRemoteEPC::Setup class that
can be used by clients to set up the memory manager and memory access members.
A default-constructed Setup object results in no change from previous behavior
(EPCGeneric* memory manager and memory access objects used by default).

4fcc0ac1

[InstCombine] Remove attributes after hoisting free above null check · 6404f4b5

Shoaib Meenai authored Oct 10, 2021

If the parameter had been annotated as nonnull because of the null
check, we want to remove the attribute, since it may no longer apply and
could result in miscompiles if left. Similarly, we also want to remove
undef-implying attributes, since they may not apply anymore either.

Fixes PR52110.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D111515

6404f4b5

Oct 13, 2021

[mlgo][aot] require the model is autogenerated for test determinism · 6c76d010

Mircea Trofin authored Oct 13, 2021

The tests that exercise the 'release' mode, where the model is AOT-ed,
check the output has certain properties, to validate that, indeed, a
different policy from the default one was exercised. For determinism, we
can't reliably check that output for an arbitrary learned policy, since
it could be that policy happens to mimic the default one in that
particular case.

This patch adds a requirement that those tests run only when the model
is autogenerated (e.g. on build bots).

Differential Revision: https://reviews.llvm.org/D111747

6c76d010

[instcombine] PRE freeze to only potentially poison/undef operand of phi · 47d10b25

Philip Reames authored Oct 13, 2021

This extends the foldOpIntoPhi code used when visiting a freeze user of a phi to allow any non-undef/poison operand as opposed to only non-undef/poison constants.  This lets us hoist a freeze in the increment of an IV into the preheader in many cases.

Differential Revision: https://reviews.llvm.org/D111744

47d10b25

[Support] [Path] Use std::replace instead of an explicit comparison loop. NFC. · 2a4b1539

Martin Storsjö authored Oct 04, 2021

After 8fc7a907, this loop does the same as a plain `std::replace`.

Also clarify the comment about what this function does.
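
As an illustration of the kind of change described, and not the actual code in the Path support library, the sketch below puts an explicit comparison loop next to the equivalent `std::replace` call. The function names and the choice of replacing backslashes with forward slashes are assumptions made for the example.

```cpp
#include <algorithm>
#include <string>

// Hypothetical stand-in for the explicit comparison loop.
inline void replaceWithLoop(std::string &path, char from, char to) {
  for (char &c : path)
    if (c == from)
      c = to;
}

// The same effect expressed with the standard algorithm.
inline void replaceWithStd(std::string &path, char from, char to) {
  std::replace(path.begin(), path.end(), from, to);
}
```

Both functions turn `a\b\c` into `a/b/c` when called with `'\\'` and `'/'`.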

Differential Revision: https://reviews.llvm.org/D111730

2a4b1539

[X86][Costmodel] Fix `X86TTIImpl::getGSScalarCost()` · 18eef13d

Roman Lebedev authored Oct 13, 2021

`X86TTIImpl::getGSScalarCost()` has (at least) two issues:
* it naively computes the cost of a sequence of `insertelement`/`extractelement`.
  If we are operating on YMM/ZMM rather than XMM,
  this wildly overestimates the cost of subvector insertions/extractions.
* Gather/scatter takes a vector of pointers, and scalarization results in us performing
  scalar memory operation for each of these pointers, but we never account for the cost
  of extracting these pointers out of the vector of pointers.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111222

18eef13d

[FuncSpec] Don't run the solver if there's nothing to do · 67a58fa3

Sjoerd Meijer authored Oct 12, 2021

Even if there are no interesting functions, the SCCP solver would still run
before bailing. Now bail earlier and avoid running the solver for nothing.

Differential Revision: https://reviews.llvm.org/D111645

67a58fa3

Make various assume bundle data structures use uint64_t · 3628bb74
Arthur Eubanks authored Oct 13, 2021

Following D110451, we need to make sure to support 64 bit values.

3628bb74

[AMDGPU] Remove unneeded emit literal check · b44eac1b

Joe Nash authored Oct 13, 2021

NFC. This check does not verify any functional property since size 8
was added. Remove it for simplicity.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D111737

Change-Id: Ifd7cbd324a137f939d8dc04acb8fbd54c9527a42

b44eac1b

[SystemZ/z/OS] Implement save of non-volatile registers on z/OS XPLINK · 0a950a2e

Kai Nacke authored Oct 13, 2021

This PR implements the save of the XPLINK callee-saved registers
on z/OS.

Reviewed By: uweigand, Kai

Differential Revision: https://reviews.llvm.org/D111653

0a950a2e

[instcombine] propagate single use freeze(gep inbounds X) · 24c90165

Philip Reames authored Oct 13, 2021

This is a follow-on to D111675 that implements the gep case. I'd originally left it out because I was hoping to actually implement the inrange todo, but after a bit of staring at the code, I decided to leave it as is since it doesn't affect this use case (i.e. instcombine requires the op to freeze to be an instruction).

Differential Revision: https://reviews.llvm.org/D111691

24c90165

[AMDGPU] Enable load clustering in the post-RA scheduler · c885857e

Jay Foad authored Oct 12, 2021

This has a couple of benefits:
1. It can sometimes fix clusters that got broken apart when the register
   allocator inserted a copy.
2. Post-RA scheduling does not have to worry about increasing register
   pressure, which in some cases gives it more freedom to reorder
   instructions.

Testing on a collection of 10,000 graphics shaders compiled for gfx1010
showed:
- The average length of each run of one or more load instructions
  increased by about 1%.
- The number of runs of two or more load instructions increased by
  about 4%.

Differential Revision: https://reviews.llvm.org/D111646

c885857e

[DebugInfo][InstrRef] Only calculate IDF for reg units · fbf269c7

Jeremy Morse authored Oct 13, 2021

In D110173 we start using the existing LLVM IDF calculator to place PHIs as
we reconstruct an SSA form of the machine-code program. Sadly, that's slower
than the old (but broken) way; this patch attempts to recover some of that
performance.

The key observation: every time we def a register, we also have to def its
register units. If we def'd $rax, in the current implementation we
independently calculate PHI locations for {al, ah, ax, eax, hax, rax}, and
they will all have the same PHI positions. Instead of doing that, we can
calculate the PHI positions for {al, ah} and place PHIs for any aliasing
registers in the same positions. Any def of a super-register has to def
the unit, and vice versa, so this is sound. It cuts down the SSA placement
we need to do significantly.

This doesn't work for stack slots, or registers we only ever read, so place
PHIs normally for those. LiveDebugValues chooses to ignore writes to SP at
calls, and now has to ignore writes to SP register units too.

Differential Revision: https://reviews.llvm.org/D111627

fbf269c7

[InstCombine] improve code comments; NFC · 02928fcb
Sanjay Patel authored Oct 13, 2021

02928fcb

[InstCombine] allow matching vector splat constants in foldLogOpOfMaskedICmps() · 905d1708

Sanjay Patel authored Oct 13, 2021

This is NFC-intended for scalar code. There are still unnecessary
m_ConstantInt restrictions in surrounding code, so this is not a
complete fix.

This prevents regressions seen with a planned follow-on to D111410.

905d1708

Follow up to work around an old compiler bug · e845ca2f

Jeremy Morse authored Oct 13, 2021

Old versions of gcc want template specialisations to happen within the
namespace where the template lives; this is still present in gcc 5.1, which
we officially support, so it has to be worked around.

e845ca2f

[DebugInfo][InstrRef] Use PHI placement utilities for machine locations · a3936a6c

Jeremy Morse authored Oct 13, 2021

InstrRefBasedLDV used to try and determine which values are in which
registers using a lattice approach; however this is hard to understand, and
broken in various ways. This patch replaces that approach with a standard
SSA approach using existing LLVM utilities. PHIs are placed at dominance
frontiers; value propagation then eliminates unnecessary PHIs.

This patch also adds a bunch of unit tests that should cover many of the
weirder forms of control flow.

Differential Revision: https://reviews.llvm.org/D110173

a3936a6c

[SVE][CodeGen] Add patterns for ADD/SUB + element count · 1a2e9019

Kerry McLaughlin authored Oct 13, 2021

This patch adds patterns to match the following with INC/DEC:
 - @llvm.aarch64.sve.cnt[b|h|w|d] intrinsics + ADD/SUB
 - vscale + ADD/SUB

For some implementations of SVE, INC/DEC VL is not as cheap as ADD/SUB and
so this behaviour is guarded by the "use-scalar-inc-vl" feature flag, which for SVE
is off by default. There are no known issues with SVE2, so this feature is
enabled by default when targeting SVE2.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D111441

1a2e9019

[X86][SSE] Add X86ISD::AVG to isCommutativeBinOp to support folding shuffles through the binop · fb2539b9
Simon Pilgrim authored Oct 13, 2021

fb2539b9

[WebAssembly] Make EH work with dynamic linking · 9261ee32

Heejin Ahn authored Sep 29, 2021

This makes Wasm EH work with dynamic linking. So far we were only able
to handle destructors, which do not use any tags or LSDA info.

1. This uses `TargetExternalSymbol` for `GCC_except_tableN` symbols,
   which points to the address of per-function LSDA info. It is more
   convenient to use than `MCSymbol` because it can take additional
   target flags.

2. When lowering `wasm_lsda` intrinsic, if PIC is enabled, make the
   symbol relative to `__memory_base` and generate the `add` node. If
   PIC is disabled, continue to use the absolute address.

3. Make tag symbols (`__cpp_exception` and `__c_longjmp`) undefined in
   the backend, because it is hard to make them work with dynamic
   linking's loading order. Instead, we make all tag symbols undefined
   in the LLVM backend and import them from JS.

4. Add support for undefined tags to the linker.

Companion patches:
- https://github.com/WebAssembly/binaryen/pull/4223
- https://github.com/emscripten-core/emscripten/pull/15266

Reviewed By: sbc100

Differential Revision: https://reviews.llvm.org/D111388

9261ee32

[JITLink][MachO][arm64] Mask high bits out of immediate for LDRLiteral19. · 447d3017

Lang Hames authored Oct 12, 2021

Negative deltas for LDRLiteral19 have their high bits set. If these bits aren't
masked out then they will overwrite other instruction bits, leading to a bogus
encoding.
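
The effect of the missing mask can be reproduced with plain integer arithmetic. The sketch below is not the JITLink fixup code; `encodeMasked` and `encodeUnmasked` are hypothetical helpers. It assumes the AArch64 LDR (literal) layout, where a 19-bit word offset sits in bits [23:5] of the instruction, and a delta that is a multiple of 4 and in range.

```cpp
#include <cstdint>

constexpr std::uint32_t kImm19Mask = 0x7FFFF;  // 19 bits

// Correct: keep only the low 19 bits of the word offset before shifting it
// into bits [23:5] of the instruction.
constexpr std::uint32_t encodeMasked(std::uint32_t instr, std::int64_t delta) {
  const std::uint32_t imm19 =
      static_cast<std::uint32_t>(delta / 4) & kImm19Mask;
  return instr | (imm19 << 5);
}

// Buggy: a negative delta has its high bits set, and without the mask those
// bits are OR-ed into the opcode field.
constexpr std::uint32_t encodeUnmasked(std::uint32_t instr,
                                       std::int64_t delta) {
  const std::uint32_t imm = static_cast<std::uint32_t>(delta / 4);
  return instr | (imm << 5);
}
```

Taking `0x58000000` (a 64-bit LDR literal with a zero offset and register 0) and a delta of -4, the masked version gives `0x58FFFFE0`, while the unmasked version gives `0xFFFFFFE0` and destroys the opcode bits. For a positive delta the two agree, which fits the commit's explanation of why the bug stayed hidden.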

This long-standing relocation bug was exposed by e50aea58, "[JITLink][ORC]
Major JITLinkMemoryManager refactor.", which caused memory layouts to be
reordered, which in turn led to a previously unseen negative delta. (Unseen
because LDRLiteral19s were only created in JITLink passes where they always
pointed at segments that were laid out afterwards in the old layout.)

No testcase yet: Our existing regression test infrastructure is good at checking
that operand bits are correct, but provides no easy way to test for bad opcode
bits. I'll have a think about the right way to approach this.

https://llvm.org/PR52153

447d3017

[Support][mips] Remove unnecessary includes from Memory.inc · a5de04d2

Visa Hankala authored Oct 12, 2021

The mips-specific includes have been unnecessary ever since the
__clear_cache() builtin replaced cacheflush().

Differential Revision: https://reviews.llvm.org/D111486

a5de04d2

Fix bug introduced with (poison flags on floating point ops) · 4c5702cb

Philip Reames authored Oct 12, 2021

The newly introduced API for checking whether poison comes solely from flags which can be dropped was out of sync.  This was noticed by a reviewer post commit.

For the moment, disable the floating point flags.  In a follow up change, I plan to add support in dropPoisonGeneratingFlags, but that deserves to be a change of its own.

4c5702cb

[RISCV] Optimize immediate materialisation with BCLRI · 787eeb85

Ben Shi authored Oct 11, 2021

Do the following optimization for immediate materialisation:

1. For values in range 0xffffffff 7fffffff ~ 0xffffffff 00000000, first
   generate the lower 32-bit with Val|0x80000000 (which is expected to be an
   int32), then emit (BCLRI r, 31).

2. For values in range 0x80000000 ~ 0xffffffff, first generate the lower
   32-bit with Val&~0x80000000 (which is expected to be an int32), then
   emit (BSETI r, 31).
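
The arithmetic behind both cases can be checked with a short model. This is a sketch of the bit manipulation only, not the RISCV backend code; the helper names are hypothetical, and `bclri`/`bseti` model the instructions as "clear one bit" and "set one bit".

```cpp
#include <cstdint>
#include <limits>

constexpr std::uint64_t kBit31 = 0x80000000ULL;

// True when `v` is representable as a signed 32-bit integer.
constexpr bool isInt32(std::int64_t v) {
  return v >= std::numeric_limits<std::int32_t>::min() &&
         v <= std::numeric_limits<std::int32_t>::max();
}

// Case 1: value to materialise first, before clearing bit 31 with BCLRI.
constexpr std::int64_t seedForBclri(std::uint64_t val) {
  return static_cast<std::int64_t>(val | kBit31);
}

// Case 2: value to materialise first, before setting bit 31 with BSETI.
constexpr std::int64_t seedForBseti(std::uint64_t val) {
  return static_cast<std::int64_t>(val & ~kBit31);
}

// Models of the single-bit instructions (bit assumed < 64).
constexpr std::uint64_t bclri(std::uint64_t r, unsigned bit) {
  return r & ~(1ULL << bit);
}
constexpr std::uint64_t bseti(std::uint64_t r, unsigned bit) {
  return r | (1ULL << bit);
}
```

For `0xFFFFFFFF12345678`, the seed `0xFFFFFFFF92345678` is a negative int32 and clearing bit 31 restores the value. For `0x0000000092345678`, the seed `0x12345678` is a positive int32 and setting bit 31 restores the value.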

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D111532

787eeb85

[X86] Remove little support we had for MPX · c2d4fe51

Fangrui Song authored Oct 12, 2021

GCC 9.1 removed Intel MPX support. Linux kernel removed MPX in 2019.
glibc 2.35 will remove MPX.

Our support is limited: we support assembling of bndmov but not bnd.
Just remove it.

Reviewed By: pengfei, skan

Differential Revision: https://reviews.llvm.org/D111517

c2d4fe51