Commits · 09266e4af04ec2dc3a3afc19a3f9d5658d482a44 · Lorenzo Albano / LLVM bpEVL

Nov 13, 2020

[ObjC][ARC] Clear the lists of basic blocks and instructions before · 09266e4a
Akira Hatanaka authored Nov 12, 2020
```
continuing the loop

This fixes a bug introduced in c6f1713c.
```
09266e4a
[ORC] Remove designated initializer. · bdf26d8d
Lang Hames authored Nov 13, 2020

bdf26d8d

[ORC] Break up OrcJIT library, add Orc-RPC based remote TargetProcessControl · 1d0676b5

Lang Hames authored Nov 11, 2020

implementation.

This patch aims to improve support for out-of-process JITing using OrcV2. It
introduces two new class templates, OrcRPCTargetProcessControlBase and
OrcRPCTPCServer, which together implement the TargetProcessControl API by
forwarding operations to an execution process via an Orc-RPC Endpoint. These
utilities are used to implement out-of-process JITing from llvm-jitlink to
a new llvm-jitlink-executor tool.

This patch also breaks the OrcJIT library into three parts:
  -- OrcTargetProcess: Contains code needed by the JIT execution process.
  -- OrcShared: Contains code needed by the JIT execution and compiler
     processes
  -- OrcJIT: Everything else.

This break-up allows JIT executor processes to link against OrcTargetProcess
and OrcShared only, without having to link in all of OrcJIT. Clients executing
JIT'd code in-process should start linking against OrcTargetProcess as well as
OrcJIT.

In the near future these changes will enable:
  -- Removal of the OrcRemoteTargetClient/OrcRemoteTargetServer class templates
     which provided similar functionality in OrcV1.
  -- Restoration of Chapter 5 of the Building-A-JIT tutorial series, which will
     serve as a simple usage example for these APIs.
  -- Implementation of lazy, cross-target compilation in lli's -jit-kind=orc-lazy
     mode.

1d0676b5

[AsmPrinter] fix -disable-debug-info option · 9606ef03

Jameson Nash authored Nov 12, 2020

This option was in a rather convoluted place, causing global parameters
to be set in awkward and undesirable ways to try to account for it
indirectly. Add tests for the -disable-debug-info option and ensure we
don't print unintended markers from unintended places.

Reviewed By: dstenb

Differential Revision: https://reviews.llvm.org/D91083

9606ef03

[X86] Use EVT::getIntegerVT instead of MVT::getIntegerVT where the type can be i2 or i4. · 114f0446

Craig Topper authored Nov 12, 2020

This was a mistake introduced in D91294. I'm not sure how to
exercise this with the existing code, but I hit it while trying
some follow up experiments.

114f0446

[X86] When storing v1i1/v2i1/v4i1 to memory, make sure we store zeros in the rest of the byte · a4124e45

Craig Topper authored Nov 12, 2020

We can't store garbage in the unused bits. It possible that something like zextload from i1/i2/i4 is created to read the memory. Those zextloads would be legalized assuming the extra bits are 0.

I'm not sure that the code in lowerStore is executed for the v1i1/v2i1/v4i1 case. It looks like the DAG combine in combineStore may have converted them to v8i1 first. And I think we're missing some cases to avoid going to the stack in the first place. But I don't have time to investigate those things at the moment so I wanted to focus on the correctness issue.

Should fix PR48147.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D91294

a4124e45

[IndVars] Replace checks with invariants if we cannot remove them · 77efb73c

Max Kazantsev authored Nov 13, 2020

If we cannot prove that the check is trivially true, but can prove that it either
fails on the 1st iteration or never fails, we can replace it with first iteration check.

Differential Revision: https://reviews.llvm.org/D88527
Reviewed By: skatkov

77efb73c

[InstCombine] fold sub of low-bit masked value from offset of same value · 0abde4bc

Sanjay Patel authored Nov 12, 2020

There might be some demanded/known bits way to generalize this,
but I'm not seeing it right now.

This came up as a regression when I was looking at a different
demanded bits improvement.

https://rise4fun.com/Alive/5fl

  Name: general
  Pre: ((-1 << countTrailingZeros(C1)) & C2) == 0
  %a1 = add i8 %x, C1
  %a2 = and i8 %x, C2
  %r = sub i8 %a1, %a2
  =>
  %r = and i8 %a1, ~C2

  Name: test 1
  %a1 = add i8 %x, 192
  %a2 = and i8 %x, 10
  %r = sub i8 %a1, %a2
  =>
  %r = and i8 %a1, -11

  Name: test 2
  %a1 = add i8 %x, -108
  %a2 = and i8 %x, 3
  %r = sub i8 %a1, %a2
  =>
  %r = and i8 %a1, -4

0abde4bc

[AMDGPU] Remove scratch rsrc from spill pseudos · 5ab17021
Stanislav Mekhanoshin authored Nov 09, 2020
```
Differential Revision: https://reviews.llvm.org/D91110
```
5ab17021

Nov 12, 2020

[AArch64][GlobalISel] Select CSINC and CSINV for G_SELECT with constants · d0ba6c40

Jessica Paquette authored Nov 03, 2020

Select the following:

- G_SELECT cc, 0, 1 -> CSINC zreg, zreg, cc
- G_SELECT cc 0, -1 -> CSINV zreg, zreg cc
- G_SELECT cc, 1, f -> CSINC f, zreg, inv_cc
- G_SELECT cc, -1, f -> CSINV f, zreg, inv_cc
- G_SELECT cc, t, 1 -> CSINC t, zreg, cc
- G_SELECT cc, t, -1 -> CSINC t, zreg, cc

(IR example: https://godbolt.org/z/YfPna9)

These correspond to a bunch of the AArch64csel patterns in AArch64InstrInfo.td.

Unfortunately, it doesn't seem like we can import patterns that use NZCV like
those ones do. E.g.

```
def : Pat<(AArch64csel GPR32:$tval, (i32 1), (i32 imm:$cc), NZCV),
          (CSINCWr GPR32:$tval, WZR, (i32 imm:$cc))>;
```

So we have to manually select these for now.

This replaces `selectSelectOpc` with an `emitSelect` function, which performs
these optimizations.

Differential Revision: https://reviews.llvm.org/D90701

d0ba6c40

[VE] Support vld intrinsics · 410626c9

Kazushi (Jam) Marukawa authored Nov 10, 2020

Add intrinsics for vector load instructions.  Add a regression test also.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D91332

410626c9

[AMDGPU] Enable multi-dword flat scratch load/stores · cf6565f6
Stanislav Mekhanoshin authored Nov 12, 2020
```
Differential Revision: https://reviews.llvm.org/D91384
```
cf6565f6

[AMDGPU] Fix scheduling of exp pos4 · 6881a82e

Jay Foad authored Nov 11, 2020

Also fix a similar issue in SIInsertWaitcnts, but I don't think that fix
has any effect in practice.

Differential Revision: https://reviews.llvm.org/D91290

6881a82e

[AMDGPU] Define and use names for export targets. NFC. · d7d6ac56
Jay Foad authored Nov 11, 2020
```
Differential Revision: https://reviews.llvm.org/D91289
```
d7d6ac56
[msan] Break the getShadow loop after matching an argument · 2d96859e
Jianzhou Zhao authored Nov 12, 2020
```
Reviewed-by: eugenis

Differential Revision: https://reviews.llvm.org/D91320
```
2d96859e

[BasicAA] Remove checks for GEP decomposition limit reached · c00545dc

Nikita Popov authored Nov 07, 2020

The GEP aliasing code currently checks for the GEP decomposition
limit being reached (i.e., we did not reach the "final" underlying
object). As far as I can see, these checks are not necessary. It is
perfectly fine to work with a GEP whose base can still be further
decomposed.

Looking back through the commit history, these checks were originally
introduced in 1a444489. However, I
believe that the problem this was intended to address was later
properly fixed with 1726fc69, and
the checks are no longer necessary since then (and were not the
right fix in the first place).

Differential Revision: https://reviews.llvm.org/D91010

c00545dc

[MSP430] Remove unused MVT::Glue output from MSP430ISD::SELECT_CC nodes. · 4cdf1d21

Craig Topper authored Nov 12, 2020

Follow up from a similar patch on RISCV 637f19c3

Nothing reads this Glue value that I could see. The SDNode def in
the td file does not have the SDNPOutGlue flag so I don't think
this glue would get properly propagated to MachineSDNodes if it
was used.

4cdf1d21

Reland: Introduce -dot-cfg-mssa option which creates dot-cfg style file with... · 5f672fef

Jamie Schmeiser authored Nov 12, 2020

Reland: Introduce -dot-cfg-mssa option which creates dot-cfg style file with mssa comments included in source

Summary:
Expand the print-memoryssa and print<memoryssa> passes with a new hidden
option -cfg-dot-mssa that names a file. When set, a dot-cfg style file
will be generated into the named file with the memoryssa comments retained
and those blocks containing them shown in light pink. The option does
nothing in isolation.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>

Reviewed By: asbirlea (Alina Sbirlea), dblaikie (David Blaikie)

Differential Revision: https://reviews.llvm.org/D90638

5f672fef

Fix unused variable warning in release builds · 76b6cb51
Alexander Kornienko authored Nov 12, 2020

76b6cb51

[ValueTracking] Update computeKnownBitsFromShiftOperator callbacks to take... · f72d350b

Simon Pilgrim authored Nov 12, 2020

[ValueTracking] Update computeKnownBitsFromShiftOperator callbacks to take KnownBits shift amount. NFCI.

We were creating this internally, but will need to support general KnownBits amounts as part of D90479.

f72d350b

[KnownBits] Add KnownBits::makeConstant helper. NFCI. · 89967427
Simon Pilgrim authored Nov 12, 2020
```
Helper for cases where we need to create a KnownBits from a (fully known) constant value.
```
89967427

Revert "Introduce -dot-cfg-mssa option which creates dot-cfg style file with... · a20b3620

Anh Tuyen Tran authored Nov 12, 2020

Revert "Introduce -dot-cfg-mssa option which creates dot-cfg style file with mssa comments included in source"

This reverts commit 45d459e7 due to
build issue in Poly.

a20b3620

[RISCV] Don't include CodeGen layer files in MC layer · 0add5f91

Craig Topper authored Nov 12, 2020

-Use MCRegister instead of Register in MC layer.
-Move some enums from RISCVInstrInfo.h to RISCVBaseInfo.h to be with other TSFlags bits.

Differential Revision: https://reviews.llvm.org/D91114

0add5f91

Introduce -dot-cfg-mssa option which creates dot-cfg style file with mssa... · 45d459e7

Jamie Schmeiser authored Nov 12, 2020

Introduce -dot-cfg-mssa option which creates dot-cfg style file with mssa comments included in source

Summary:
Expand the print-memoryssa and print<memoryssa> passes with a new hidden
option -cfg-dot-mssa that names a file. When set, a dot-cfg style file
will be generated into the named file with the memoryssa comments retained
and those blocks containing them shown in light pink. The option does
nothing in isolation.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>

Reviewed By: asbirlea (Alina Sbirlea), dblaikie (David Blaikie)

Differential Revision: https://reviews.llvm.org/D90638

45d459e7

[RISCV] Add an ANDI to shift amount of FSL/FSR instructions · 9ca02d6f

Craig Topper authored Nov 12, 2020

The fshl and fshr intrinsics are defined to modulo their shift amount by the bitwidth of one of their inputs. The FSR/FSL instructions read one extra bit from the shift amount. If that bit is set the inputs are swapped. In order to preserve the semantics of the llvm intrinsics we need to make sure that the extra bit isn't set. DAG combine or instcombine may have removed any mask that was originally present.

We could be smarter here and try to use computeKnownBits to check if the bit is known zero, but wanted to start with correctness.

Differential Revision: https://reviews.llvm.org/D90905

9ca02d6f

[ValueTracking] Update computeKnownBitsFromShiftOperator callbacks to use... · 11c10654
Simon Pilgrim authored Nov 12, 2020
```
[ValueTracking] Update computeKnownBitsFromShiftOperator callbacks to use KnownBits shift handling. NFCI.
```
11c10654

Introduce -print-before-changed, making -print-changed also print before passes that modify IR · 782d6a69

Jamie Schmeiser authored Nov 12, 2020

Summary:
Add an option -print-before-changed that modifies the print-changed
behaviour so that it prints the IR before a pass that changed it in
addition to printing the IR after the pass. Note that the option
does nothing in isolation. The filtering options work as expected.
Lit tests are included.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>

Reviewed By: aeubanks (Arthur Eubanks)

Differential Revision: https://reviews.llvm.org/D88757

782d6a69

[NFC intended] Refactor SinkAndHoistLICMFlags to allow others to construct... · f79b4833

Jamie Schmeiser authored Nov 12, 2020

[NFC intended] Refactor SinkAndHoistLICMFlags to allow others to construct without exposing internals

Summary:
Refactor SinkAdHoistLICMFlags from a struct to a class with accessors and constructors to allow other
classes to construct flags with meaningful defaults while not exposing LICM internal details.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>

Reviewed By: asbirlea (Alina Sbirlea)

Differential Revision: https://reviews.llvm.org/D90482

f79b4833

[ARM] Ensure CountReg definition dominates InsertPt when creating t2DoLoopStartTP · 11dee2ea

David Green authored Nov 12, 2020

Of course there was something missing, in this case a check that the def
of the count register we are adding to a t2DoLoopStartTP would dominate
the insertion point.

In the future, when we remove some of these COPY's in between, the
t2DoLoopStartTP will always become the last instruction in the block,
preventing this from happening. In the meantime we need to check they
are created in a sensible order.

Differential Revision: https://reviews.llvm.org/D91287

11dee2ea

[LLD] Fix include following 45b8a741 · ec63dfe3
Alexandre Ganea authored Nov 12, 2020

ec63dfe3

[LLD][COFF] When using LLD-as-a-library, always prevent re-entrance on failures · 45b8a741

Alexandre Ganea authored Nov 12, 2020

This is a follow-up for D70378 (Cover usage of LLD as a library).

While debugging an intermittent failure on a bot, I recalled this scenario which
causes the issue:

1.When executing lld/test/ELF/invalid/symtab-sh-info.s L45, we reach
lld::elf::Obj-File::ObjFile() which goes straight into its base ELFFileBase(),
then ELFFileBase::init().
2.At that point fatal() is thrown in lld/ELF/InputFiles.cpp L381, leaving a
half-initialized ObjFile instance.
3.We then end up in lld::exitLld() and since we are running with LLD_IN_TEST, we
hapily restore the control flow to CrashRecoveryContext::RunSafely() then back
in lld::safeLldMain().
4.Before this patch, we called errorHandler().reset() just after, and this
attempted to reset the associated SpecificAlloc<ObjFile<ELF64LE>>. That tried
to free the half-initialized ObjFile instance, and more precisely its
ObjFile::dwarf member.

Sometimes that worked, sometimes it failed and was catched by the
CrashRecoveryContext. This scenario was the reason we called
errorHandler().reset() through a CrashRecoveryContext.

But in some rare cases, the above repro somehow corrupted the heap, creating a
stack overflow. When the CrashRecoveryContext's filter (that is,
__except (ExceptionFilter(GetExceptionInformation()))) tried to handle the
exception, it crashed again since the stack was exhausted -- and that took the
whole application down. That is the issue seen on the bot. Locally it happens
about 1 times out of 15.

Now this situation can happen anywhere in LLD. Since catching stack overflows is
not a reliable scenario ATM when using CrashRecoveryContext, we're now
preventing further re-entrance when such failures occur, by signaling
lld::SafeReturn::canRunAgain=false. When running with LLD_IN_TEST=2 (or above),
only one iteration will be executed, instead of two.

Differential Revision: https://reviews.llvm.org/D88348

45b8a741

[VE] Change the default type of v64 register class · a72d3842

Kazushi (Jam) Marukawa authored Nov 10, 2020

Change the default type of v64 register class from v512i32 to v256f64.
Add a regression test also.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D91301

a72d3842

[SVE] Deal with SVE tuple call arguments correctly when running out of registers · 3225fcf1

David Sherwood authored Oct 22, 2020

When passing SVE types as arguments to function calls we can run
out of hardware SVE registers. This is normally fine, since we
switch to an indirect mode where we pass a pointer to a SVE stack
object in a GPR. However, if we switch over part-way through
processing a SVE tuple then part of it will be in registers and
the other part will be on the stack.

I've fixed this by ensuring that:

1. When we don't have enough registers to allocate the whole block
   we mark any remaining SVE registers temporarily as allocated.
2. We temporarily remove the InConsecutiveRegs flags from the last
   tuple part argument and reinvoke the autogenerated calling
   convention handler. Doing this prevents the code from entering
   an infinite recursion and, in combination with 1), ensures we
   switch over to the Indirect mode.
3. After allocating a GPR register for the pointer to the tuple we
   then deallocate any SVE registers we marked as allocated in 1).
   We also set the InConsecutiveRegs flags back how they were before.
4. I've changed the AArch64ISelLowering LowerCALL and
   LowerFormalArguments functions to detect the start of a tuple,
   which involves allocating a single stack object and doing the
   correct numbers of legal loads and stores.

Differential Revision: https://reviews.llvm.org/D90219

3225fcf1

[AArch64][GlobalISel] Optimize G_PTR_ADD with a negated offset to be a G_SUB. · ad376657
Amara Emerson authored Nov 11, 2020

ad376657

[NFC][SCEV] Generalize monotonicity check for full and limited iteration space · 2734a9eb

Max Kazantsev authored Nov 12, 2020

A piece of logic of `isLoopInvariantExitCondDuringFirstIterations` is actually
a generalized predicate monotonicity check. This patch moves it into the
corresponding method and generalizes it a bit.

Differential Revision: https://reviews.llvm.org/D90395
Reviewed By: apilipenko

2734a9eb

Revert "[Coroutine] Allocas used by StoreInst does not always escape" · 94a45a80
Xun Li authored Nov 11, 2020
```
This reverts commit 8bc7b927, which landed by accident.
```
94a45a80

[IndVars] IV user should not prevent use widening · d6dd9385

Max Kazantsev authored Nov 12, 2020

Sometimes the an instruction we are trying to widen is used by the IV
(which means the instruction is the IV increment). Currently this may
prevent its widening. We should ignore such user because it will be
dead once the transform is done anyways.

Differential Revision: https://reviews.llvm.org/D90920
Reviewed By: fhahn

d6dd9385

[Coroutine] Allocas used by StoreInst does not always escape · 8bc7b927

Xun Li authored Nov 11, 2020

In the existing logic, for a given alloca, as long as its pointer value is stored into another location, it's considered as escaped.
This is a bit too conservative. Specifically, in non-optimized build mode, it's often to have patterns of code that first store an alloca somewhere and then load it right away.
These used should be handled without conservatively marking them escaped.

This patch tracks how the memory location where an alloca pointer is stored into is being used. As long as we only try to load from that location and nothing else, we can still
consider the original alloca not escaping and keep it on the stack instead of putting it on the frame.

Differential Revision: https://reviews.llvm.org/D91305

8bc7b927

[IndVars] Recognize 'sub nuw' expressed as 'add' for widening · 2e01ceaf

Max Kazantsev authored Nov 12, 2020

InstCombine canonicalizes 'sub nuw' instructions to 'add' without the
`nuw` flag. The typical case where we see it is decrementing induction
variables. For them, IndVars fails to prove that it's legal to widen them,
and inserts unprofitable `zext`'s.

This patch adds recognition of such pattern using SCEV.

Differential Revision: https://reviews.llvm.org/D89550
Reviewed By: fhahn, skatkov

2e01ceaf

[coro] Async coroutines: Allow more than 3 arguments in the dispatch function · 43133766

Arnold Schwaighofer authored Nov 09, 2020

We need to be able to call function pointers. Inline the dispatch
function.

Also inline the context projection function.

Transfer debug locations from the suspend point to the inlined functions.

Use the function argument index instead of the function argument in
coro.id.async. This solves any spurious use issues.

Coerce the arguments of the tail call function at a suspend point. The LLVM
optimizer seems to drop casts leading to a vararg intrinsic.

rdar://70097093

Differential Revision: https://reviews.llvm.org/D91098

43133766