Commits · bd0cb787d05b555791a88640dcf9800bab54d4fa · Lorenzo Albano / LLVM bpEVL

May 30, 2018

[ORC] Update JITCompileCallbackManager to support multi-threaded code. · bd0cb787

Lang Hames authored May 30, 2018

Previously JITCompileCallbackManager only supported single-threaded code. This
patch embeds a VSO (see include/llvm/ExecutionEngine/Orc/Core.h) in the callback
manager. The VSO ensures that the compile callback is only executed once and that
the resulting address is cached for use by subsequent re-entries.
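
The "execute once, cache the address" behavior can be illustrated with a small standalone sketch. This is not the ORC API: `CallbackCache` and `getOrCompile` are hypothetical names, and a mutex-protected map stands in for the VSO.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the callback manager; NOT the ORC API.
class CallbackCache {
public:
  using Address = std::uint64_t;
  using CompileFn = std::function<Address()>;

  // Runs Compile at most once per Name. Later callers, from any thread,
  // get the cached address instead of compiling again.
  Address getOrCompile(const std::string &Name, const CompileFn &Compile) {
    assert(Compile && "compile callback must be callable");
    std::lock_guard<std::mutex> Lock(M);
    auto It = Resolved.find(Name);
    if (It != Resolved.end())
      return It->second;
    Address Addr = Compile();
    Resolved.emplace(Name, Addr);
    return Addr;
  }

private:
  std::mutex M;
  std::unordered_map<std::string, Address> Resolved;
};
```

Holding the lock while compiling keeps the sketch short but serializes all compiles; a real implementation would likely use finer-grained synchronization.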

llvm-svn: 333490

bd0cb787

[RISCV] Support resolving fixup_riscv_call and add to MCFixupKindInfo table · c3d0e892

Shiva Chen authored May 30, 2018

Resolve fixup_riscv_call in the assembler when linker relaxation is disabled
and the function and call site are within the same compile unit.

Also add a static_assert after the Infos array declaration
to avoid missing any new fixup in MCFixupKindInfo in the future.
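
A minimal sketch of the static_assert idea, assuming a made-up set of fixup kinds and field values (the real RISC-V table differs): the assertion ties the table length to the enum, so adding a kind without a table entry breaks the build.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixup kinds; the names and values are illustrative only.
enum FixupKind { fixup_hi20, fixup_lo12, fixup_call, NumFixupKinds };

struct FixupInfo {
  const char *Name;
  unsigned Offset;
  unsigned Bits;
};

constexpr FixupInfo Infos[] = {
    {"fixup_hi20", 12, 20},
    {"fixup_lo12", 20, 12},
    {"fixup_call", 0, 64},
};

// Fails to compile if a kind is added to the enum but not to the table.
static_assert(sizeof(Infos) / sizeof(Infos[0]) == NumFixupKinds,
              "Not all fixup kinds added to Infos array");

inline const FixupInfo &getFixupInfo(FixupKind Kind) {
  assert(Kind < NumFixupKinds && "invalid fixup kind");
  return Infos[Kind];
}
```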

Differential Revision: https://reviews.llvm.org/D47126

llvm-svn: 333487

c3d0e892

[VPlan] Replace LLVM_ATTRIBUTE_USED with ifndef NDEBUG · b94b21d4

Diego Caballero authored May 29, 2018

Minor replacement. LLVM_ATTRIBUTE_USED was introduced to silence
a warning but using #ifndef NDEBUG makes more sense in this case.
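
As an illustration of the pattern (not the VPlan code itself), a helper that is only referenced from an assert can be compiled only in assert-enabled builds. The function names below are hypothetical.

```cpp
#include <cassert>

#ifndef NDEBUG
// Compiled only when asserts are enabled, so release builds never see an
// unused static function and no attribute is needed to silence a warning.
static bool isSortedAscending(const int *Data, int N) {
  for (int I = 1; I < N; ++I)
    if (Data[I - 1] > Data[I])
      return false;
  return true;
}
#endif

// Returns the smallest element of a sorted, non-empty array.
int smallest(const int *Data, int N) {
  // With NDEBUG defined the assert expands to nothing, so the helper above
  // is not referenced at all.
  assert(N > 0 && isSortedAscending(Data, N) && "expected sorted input");
  return Data[0];
}
```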

Reviewers: dblaikie, fhahn, hsaito

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D47498

llvm-svn: 333476

b94b21d4

[X86] Remove some of the extractelts from the new MOVSS+FMA patterns. · 5989db0f

Craig Topper authored May 29, 2018

We only need the extractelt that corresponds to the register we're trying to insert back into. We can't guarantee the others haven't been optimized out depending on how those operands were produced.

So instead just look for an FR32/FR64 input and emit a COPY_TO_REGCLASS to VR128 in the output pattern. This matches what we do for ADD/SUB/MUL/DIV.

llvm-svn: 333473

5989db0f

May 29, 2018

[X86] Use VR128X instead of VR128 in EVEX instruction patterns. · dbd371e9
Craig Topper authored May 29, 2018
```
llvm-svn: 333464
```
dbd371e9

[X86] Rename the operands in the recently introduced MOVSS+FMA patterns so... · aba57bfe

Craig Topper authored May 29, 2018

[X86] Rename the operands in the recently introduced MOVSS+FMA patterns so that the operand names in the output pattern are always in 1, 2, 3 order since those are the operand names in the instruction.

The order should be controlled in the input pattern.

llvm-svn: 333463

aba57bfe

Fix build error introduced in rL333459 · f4f37509
Sam Clegg authored May 29, 2018
```
The DEBUG macro was renamed LLVM_DEBUG.

llvm-svn: 333462
```
f4f37509

[LoopInstSimplify] Re-implement the core logic of loop-instsimplify to · 4cbcbb07

Chandler Carruth authored May 29, 2018

be both simpler and substantially more efficient.

Rather than use a hand-rolled iteration technique that isn't quite the
same as RPO, use the pre-built RPO loop body traversal utility.

Once visiting the loop body in RPO, we can assert that we visit defs
before uses reliably. When this is the case, the only need to iterate is
when simplifying a def that is used by a PHI node along a back-edge.
With this patch, the first pass over the loop body is just a complete
simplification of every instruction across the loop body. When we
encounter a use of a simplified instruction that stems from a PHI node
in the loop body that has already been visited (due to some cyclic CFG,
potentially the loop itself, or a nested loop, or unstructured control
flow), we recall that specific PHI node for the second iteration.
Nothing else needs to be preserved from iteration to iteration.

On the second and later iterations, only instructions known to have
simplified inputs are considered, each time starting from a set of PHIs
that had simplified inputs along the backedges.

Dead instructions are collected along the way, but deleted in a batch at
the end of each iteration making the iterations themselves substantially
simpler. This uses a new batch API for recursively deleting dead
instructions.

This also changes the routine to visit subloops. Because simplification
is fundamentally transitive, we may need to visit the entire loop body,
including subloops, to handle knock-on simplification.
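
The iteration scheme described above can be sketched on a toy instruction list. This is an illustration under strong assumptions, not the pass itself: `Inst` is a made-up three-kind IR stored in RPO, "simplification" is only constant folding, and dead-instruction deletion is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction, NOT LLVM IR. Operands are indices into the loop body,
// which is assumed to be stored in reverse post-order (RPO).
struct Inst {
  enum Kind { Const, Add, Phi } K;
  int Value;                    // meaningful only when K == Const
  std::vector<std::size_t> Ops; // operand indices
};

// Folds Add(c1, c2) and Phi(c, c, ...) to a constant. Returns true on change.
inline bool trySimplify(std::vector<Inst> &Body, std::size_t I) {
  Inst &In = Body[I];
  if (In.K == Inst::Const || In.Ops.empty())
    return false;
  for (std::size_t Op : In.Ops)
    if (Body[Op].K != Inst::Const)
      return false;
  int Result = Body[In.Ops[0]].Value;
  for (std::size_t J = 1; J < In.Ops.size(); ++J) {
    int V = Body[In.Ops[J]].Value;
    if (In.K == Inst::Add)
      Result += V;
    else if (V != Result)
      return false; // Phi with differing inputs
  }
  In = Inst{Inst::Const, Result, {}};
  return true;
}

// Simplifies the body and returns the number of passes that were needed.
inline int simplifyLoopBody(std::vector<Inst> &Body) {
  const std::size_t N = Body.size();
  std::vector<bool> ToVisit(N, true); // first pass: every instruction
  int Passes = 0;
  for (bool Work = N != 0; Work;) {
    ++Passes;
    Work = false;
    std::vector<bool> Next(N, false);
    for (std::size_t I = 0; I < N; ++I) {
      if (!ToVisit[I] || !trySimplify(Body, I))
        continue;
      for (std::size_t U = 0; U < N; ++U) { // find users of the new constant
        bool Uses = false;
        for (std::size_t Op : Body[U].Ops)
          Uses |= (Op == I);
        if (!Uses)
          continue;
        if (U > I) {
          ToVisit[U] = true; // still ahead of us in this RPO pass
        } else if (Body[U].K == Inst::Phi) {
          Next[U] = true; // already-visited PHI: back-edge use, next pass
          Work = true;
        }
      }
    }
    ToVisit = Next;
  }
  return Passes;
}
```

For a body such as `c2, c1, phi(c2, add2), add1(phi, c1), add2(c1, c1)`, the first pass folds `add2`, and the second pass starts from the PHI and then folds `add1`, so only the PHI is carried between passes.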

I've added a basic test file that helps demonstrate that all of these
changes work. It includes both straight-forward loops with
simplifications as well as interesting PHI-structures, CFG-structures,
and a nested loop case.

Differential Revision: https://reviews.llvm.org/D47407

llvm-svn: 333461

4cbcbb07

[X86] Fix a potential crash that occurs after r333419. · 5439b3d1

Craig Topper authored May 29, 2018

The code could issue a truncate from a smaller type to a larger type. We need to extend in that case instead.

llvm-svn: 333460

5439b3d1

[WebAssembly] Add more error checking to object file parsing · b7c62394

Sam Clegg authored May 29, 2018

This should address some of the assert failures the fuzzer has been
finding such as:
  https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=6719

Differential Revision: https://reviews.llvm.org/D47086

llvm-svn: 333459

b7c62394

AMDGPU: Fix broken check lines · 4b3829d8
Matt Arsenault authored May 29, 2018
```
llvm-svn: 333458
```
4b3829d8
AMDGPU: Fix typo in option description · 2e4d338d
Matt Arsenault authored May 29, 2018
```
llvm-svn: 333457
```
2e4d338d

AMDGPU: Round up kernel argument allocation size · 1ea0402e

Matt Arsenault authored May 29, 2018

AFAIK the driver's allocation will actually have to round this
up anyway. It is useful to track the rounded-up size so that
the end of the kernel segment is known to be dereferenceable, which
allows a wider s_load_dword to be used for a short argument at the end
of the segment.
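
A small arithmetic sketch of why the rounded-up size helps. The 4-byte rounding granule and the helper names are assumptions for illustration; the commit does not state the exact alignment used.

```cpp
#include <cassert>
#include <cstdint>

// Round Size up to the next multiple of Align (a power of two).
inline std::uint64_t alignUp(std::uint64_t Size, std::uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 &&
         "Align must be a power of two");
  return (Size + Align - 1) & ~(Align - 1);
}

// Hypothetical check: does a load of LoadBytes at Offset stay inside the
// kernel argument segment once its raw size is rounded up to 4 bytes?
inline bool loadFitsInSegment(std::uint64_t Offset, std::uint64_t LoadBytes,
                              std::uint64_t RawSize) {
  return Offset + LoadBytes <= alignUp(RawSize, 4);
}
```

With 6 bytes of arguments and a 2-byte argument at offset 4, the rounded size is 8, so a 4-byte load at offset 4 stays in bounds.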

llvm-svn: 333456

1ea0402e

[RISCV] Add peepholes for Global Address lowering patterns · 97684419

Sameer AbuAsal authored May 29, 2018

Summary:
  Base and offset are always separated when a GlobalAddress node is lowered
  (rL332641) as an optimization to reduce instruction count. However, this
  optimization is not profitable if the Global Address ends up being used in only
  one instruction.

  This patch adds peephole optimizations that merge an offset of
  an address calculation into the LUI %hi and ADDI %lo of the lowering sequence.

  The peephole handles three patterns:

 1) ADDI (ADDI (LUI %hi(global)) %lo(global)), offset
     --->
      ADDI (LUI %hi(global + offset)) %lo(global + offset).

   This generates:
   lui a0, hi (global + offset)
   addi a0, a0, lo (global + offset)

   Instead of

   lui a0, hi (global)
   addi a0, lo (global)
   addi a0, offset

   This pattern is for cases when the offset is small enough to fit in the
   immediate field of ADDI (a signed 12-bit immediate).

 2) ADD ((ADDI (LUI %hi(global)) %lo(global)), (LUI hi_offset))
     --->
      offset = hi_offset << 12
      ADDI (LUI %hi(global + offset)) %lo(global + offset)

   Which generates the ASM:

   lui  a0, hi(global + offset)
   addi a0, lo(global + offset)

   Instead of:

   lui  a0, hi(global)
   addi a0, lo(global)
   lui a1, (offset)
   add a0, a0, a1

   This pattern is for cases when the offset doesn't fit in an immediate field
   of ADDI but the lower 12 bits are all zeros.

 3) ADD ((ADDI (LUI %hi(global)) %lo(global)), (ADDI lo_offset, (LUI hi_offset)))
     --->
        offset = (offhi20 << 12) + offlo12
        ADDI (LUI %hi(global + offset)) %lo(global + offset)

   Which generates the ASM:

   lui  a0, %hi(global + offset)
   addi a0, %lo(global + offset)

   Instead of:

   lui  a0, hi(global)
   addi a0, lo(global)
   lui a1, (offhi20)
   addi a1, (offlo12)
   add a0, a0, a1

   This pattern is for cases when the offset doesn't fit in an immediate field
   of ADDI and both the lower 12 bits and high 20 bits are non zero.
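
All three patterns rely on %hi/%lo of (global + offset) reconstructing the full address with a single LUI + ADDI pair. The following sketch works through that arithmetic for RV32; the helper names are hypothetical and this is not code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// %hi: upper 20 bits, biased by 0x800 so that adding the sign-extended %lo
// lands back on the original address.
inline std::uint32_t hi20(std::uint32_t Addr) { return (Addr + 0x800u) >> 12; }

// %lo: low 12 bits, sign-extended the way ADDI treats its immediate.
inline std::int32_t lo12(std::uint32_t Addr) {
  std::int32_t Low = static_cast<std::int32_t>(Addr & 0xFFFu);
  return Low >= 0x800 ? Low - 0x1000 : Low;
}

// Value produced by "lui rd, %hi(Addr); addi rd, rd, %lo(Addr)" on RV32.
inline std::uint32_t materialize(std::uint32_t Addr) {
  return (hi20(Addr) << 12) + static_cast<std::uint32_t>(lo12(Addr));
}

// Pattern 1 applies when the offset fits ADDI's signed 12-bit immediate.
inline bool fitsSImm12(std::int64_t V) { return V >= -2048 && V <= 2047; }
```

For example, with a global at 0x12345FFC and an offset of 8, `materialize(0x12345FFC + 8)` yields 0x12346004 directly, so no separate add of the offset is needed.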

    Reviewers: asb

    Reviewed By: asb

    Subscribers: rbar, johnrusso, simoncook, jordy.potman.lists, apazos,
  niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang

llvm-svn: 333455

97684419

[BasicAA] Teach the analysis about atomic memcpy · 3a6c50f4

Daniel Neilson authored May 29, 2018

Summary:
A simple change to derive mod/ref info from the atomic memcpy
intrinsic in the same way as from the regular memcpy intrinsic.

llvm-svn: 333454

3a6c50f4

Update CodeView register names in a test that was missed in r333421. · 99feb567
Douglas Yung authored May 29, 2018
```
llvm-svn: 333453
```
99feb567
AMDGPU: Always set COMPUTE_PGM_RSRC2.ENABLE_TRAP_HANDLER to zero for AMDHSA as · 2ca6b1f2
Konstantin Zhuravlyov authored May 29, 2018
```
it is set by CP

Differential Revision: https://reviews.llvm.org/D47392

llvm-svn: 333451
```
2ca6b1f2
[TableGen] Use explicit constructor for InstMemo · 33b6f9ac
Florian Hahn authored May 29, 2018
```
This should fix a few buildbot failures with old
GCC versions.

llvm-svn: 333448
```
33b6f9ac

[ARM] Enable SETCCCARRY lowering for Thumb1. · 63fead0f

Eli Friedman authored May 29, 2018

We've had Thumb1 support for ARMISD::SUBE for a while now, so this just
works.  Reduces codesize a bit for 64-bit integer comparisons.

Differential Revision: https://reviews.llvm.org/D47387

llvm-svn: 333445

63fead0f

IRBuilder: Add overload for intrinsics without args · 64c6ab44
Matt Arsenault authored May 29, 2018
```
llvm-svn: 333443
```
64c6ab44
AMDGPU: Pass function directly instead of MachineFunction · ceafc55e
Matt Arsenault authored May 29, 2018
```
These functions just query the underlying IR function,
so pass it directly.

llvm-svn: 333442
```
ceafc55e
AMDGPU: Add nuw to add off of kernarg ptr · 2fb9ccf7
Matt Arsenault authored May 29, 2018
```
llvm-svn: 333441
```
2fb9ccf7

DAG: Remove redundant version of getRegisterTypeForCallingConv · ab2b79cb

Matt Arsenault authored May 29, 2018

There seems to be no real reason to have these separate copies.
The existing implementations just copy each other for x86.
For Mips there is a subtle difference, which is just a bug
since the result changes depending on which version is called in a given context.
Dropping this version, all tests pass. If I try to merge them
to match the removed version, a test fails.

llvm-svn: 333440

ab2b79cb

AMDGPU: Split R600 MCInst lowering into its own class · 57b9342c

Tom Stellard authored May 29, 2018

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D47307

llvm-svn: 333439

57b9342c

[TableGen] Fix leaking of PhysRegInputs. · 7d3f9a88

Florian Hahn authored May 29, 2018

Instead of dynamically allocating the vector for PhysRegs, we can
allocate it on the stack and move it into InstructionMemo.
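
A minimal sketch of the ownership change, using a hypothetical `InstMemo` struct rather than TableGen's actual class: the vector lives in automatic storage and is moved into the memo, so there is no `new` to forget to `delete`.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for TableGen's InstructionMemo.
struct InstMemo {
  std::string Name;
  std::vector<std::string> PhysRegInputs; // held by value, freed with the memo

  InstMemo(std::string Name, std::vector<std::string> PhysRegs)
      : Name(std::move(Name)), PhysRegInputs(std::move(PhysRegs)) {}
};

// Register and instruction names are placeholders for illustration.
inline InstMemo makeMemo() {
  std::vector<std::string> PhysRegs; // automatic storage, no new/delete
  PhysRegs.push_back("EAX");
  PhysRegs.push_back("EDX");
  return InstMemo("MUL32r", std::move(PhysRegs));
}
```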

Reviewers: mcrosier, craig.topper, RKSimon, dsanders

Reviewed By: dsanders

Differential Revision: https://reviews.llvm.org/D47461

llvm-svn: 333438

7d3f9a88

TableGen: add some more helpful error messages · e7ae0f48

Nicolai Haehnle authored May 29, 2018

Summary: Change-Id: I6f3dacf675a4126134577616e259696bebdade3a

Reviewers: tra, simon_tatham, craig.topper, MartinO, arsenm

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D47429

Change-Id: I614de12a4c154c6d53c090f2f3e53ad2d09942c5
llvm-svn: 333436

e7ae0f48

[TableGen] Fix leaking synthesized registers. · 6c21b3b5

Florian Hahn authored May 29, 2018

By keeping track of unique_ptrs to the synthesized definitions in
CodeGenRegBank we avoid leaking them.
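
The ownership idea can be sketched as follows. `Record` and `RegBank` here are simplified, hypothetical stand-ins, not the TableGen classes: the bank owns each synthesized definition through a `unique_ptr` and hands out non-owning pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified definition record.
struct Record {
  std::string Name;
};

class RegBank {
public:
  // Returns a non-owning pointer; the bank keeps ownership, so the record
  // is destroyed together with the bank instead of leaking.
  Record *synthesize(const std::string &Name) {
    SynthDefs.push_back(std::unique_ptr<Record>(new Record{Name}));
    return SynthDefs.back().get();
  }

  std::size_t numSynthesized() const { return SynthDefs.size(); }

private:
  std::vector<std::unique_ptr<Record>> SynthDefs;
};
```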

Reviewers: dsanders, kparzysz, stoklund

Reviewed By: dsanders

Differential Revision: https://reviews.llvm.org/D47462

llvm-svn: 333434

6c21b3b5

[StrictFP] Make getStrictFPOpcodeAction(...) more accessible · b1bb60ae

Cameron McInally authored May 29, 2018

NFCI. This function will be reused in upcoming patches.

Differential Revision: https://reviews.llvm.org/D47380

llvm-svn: 333433

b1bb60ae

[X86][SSE] Regenerate sdiv combine tests · db9dbac5
Simon Pilgrim authored May 29, 2018
```
llvm-svn: 333431
```
db9dbac5
[X86][AVX] Regenerate vzeroall/vzeroupper cleanup tests · 77149a80
Simon Pilgrim authored May 29, 2018
```
llvm-svn: 333430
```
77149a80

[AArch64] Fix PR32384: bump up the number of stores per memset and memcpy · f8425340

Evandro Menezes authored May 29, 2018

As suggested in https://bugs.llvm.org/show_bug.cgi?id=32384#c1, this change
makes the inlining of `memset()` and `memcpy()` more aggressive when
compiling for speed.  The tuning remains the same when optimizing for size.

Patch by: Sebastian Pop <s.pop@samsung.com>
          Evandro Menezes <e.menezes@samsung.com>

Differential revision: https://reviews.llvm.org/D45098

llvm-svn: 333429

f8425340

[mips] Process numeric register name in the .set assignment directive · 69301c9e

Simon Atanasyan authored May 29, 2018

Currently the LLVM assembler cannot process the following code and generates
an error, whereas GNU tools support the .set assignment directive with a
numeric register name.

```
.set r4, $4

test.s:1:11: error: invalid token in expression
  .set r4, $4
           ^
```

This patch teaches the assembler to handle such directives correctly.
Unfortunately a numeric register name cannot be represented as an
expression. That's why we have to maintain a separate `StringMap`
in the `MipsAsmParser` to keep the mapping between alias names and
register numbers.
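
A rough sketch of such an alias table, assuming a hypothetical `RegisterAliases` class and using `std::unordered_map` in place of `StringMap` so the example stands alone. It is not the `MipsAsmParser` code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical alias table; std::unordered_map is swapped in for StringMap.
class RegisterAliases {
public:
  // Handles ".set <Alias>, $<N>": records the alias when the operand is a
  // numeric register name in the range $0..$31. Returns false otherwise.
  bool define(const std::string &Alias, const std::string &Operand) {
    if (Operand.size() < 2 || Operand[0] != '$')
      return false;
    unsigned Num = 0;
    for (std::size_t I = 1; I < Operand.size(); ++I) {
      if (Operand[I] < '0' || Operand[I] > '9')
        return false;
      Num = Num * 10 + static_cast<unsigned>(Operand[I] - '0');
      if (Num > 31)
        return false;
    }
    Aliases[Alias] = Num;
    return true;
  }

  // Returns the register number for Alias, or -1 if it is not defined.
  int lookup(const std::string &Alias) const {
    auto It = Aliases.find(Alias);
    return It == Aliases.end() ? -1 : static_cast<int>(It->second);
  }

private:
  std::unordered_map<std::string, unsigned> Aliases;
};
```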

Differential revision: https://reviews.llvm.org/D47464

llvm-svn: 333428

69301c9e

Revert "[AArch64] added FP16 vcvth intrinsic support" · d5a9e7bb
Amara Emerson authored May 29, 2018
```
This reverts commit r333410 due to bot failures.

llvm-svn: 333427
```
d5a9e7bb

[llvm-readobj] Support GNU_PROPERTY_X86_FEATURE_1_AND notes in .note.gnu.property · 65724254

Alexander Ivchenko authored May 29, 2018

This patch allows parsing GNU_PROPERTY_X86_FEATURE_1_AND
notes in .note.gnu.property sections. These notes
indicate that the object file is built to support Intel CET.
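
For orientation, the feature word of such a note can be decoded roughly as below. The numeric values follow the x86-64 psABI; the constant names, the function, and the output format are local to this sketch and do not reflect what llvm-readobj prints.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Values as defined by the x86-64 psABI; names are local to this sketch.
constexpr std::uint32_t PropX86Feature1And = 0xc0000002; // property type
constexpr std::uint32_t Feature1IBT = 1u << 0;   // indirect branch tracking
constexpr std::uint32_t Feature1SHSTK = 1u << 1; // shadow stack

// Turns a (type, data word) pair into a short human-readable string.
inline std::string describeFeature1And(std::uint32_t Type, std::uint32_t Bits) {
  if (Type != PropX86Feature1And)
    return "<unhandled property>";
  std::string Out;
  auto Append = [&Out](const char *Name) {
    if (!Out.empty())
      Out += ", ";
    Out += Name;
  };
  if (Bits & Feature1IBT)
    Append("IBT");
  if (Bits & Feature1SHSTK)
    Append("SHSTK");
  if (Bits & ~(Feature1IBT | Feature1SHSTK))
    Append("<unknown flags>");
  return Out.empty() ? "<none>" : Out;
}
```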

patch by mike.dvoretsky

Differential Revision: https://reviews.llvm.org/D47473

llvm-svn: 333424

65724254

[AArch64][SVE] Asm: Support for predicated LSL/LSR (vectors) · 8704b03c

Sander de Smalen authored May 29, 2018

Reviewers: rengolin, huntergr, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D47365

llvm-svn: 333422

8704b03c

[CodeView] Add prefix to CodeView registers. · 43dce3ed

Jonas Devlieghere authored May 29, 2018

Adds CVReg to CodeView register names to prevent a duplicate symbol with
CR3 defined in termios.h, as suggested by Zachary on the mailing list.

http://lists.llvm.org/pipermail/llvm-dev/2018-May/123372.html

Differential revision: https://reviews.llvm.org/D47478

rdar://39863705

llvm-svn: 333421

43dce3ed

[X86] Scalar mask and scalar move optimizations · 96062eaa

Alexander Ivchenko authored May 29, 2018

1. Introduction of mask scalar TableGen patterns.
2. Introduction of new scalar move TableGen patterns
   and refactoring of existing ones.
3. Folding of pattern created by introducing scalar
   masking in Clang header files.

Patch by tkrupa

Differential Revision: https://reviews.llvm.org/D47012

llvm-svn: 333419

96062eaa

StackColoring: better handling of statically unreachable code · 48bf43df

Than McIntosh authored May 29, 2018

Summary:
Avoid assert/crash during liveness calculation in situations
where the incoming machine function has statically unreachable BBs.
Second attempt at submitting; this version of the change includes
a revised testcase.

Fixes PR37130.

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47372

llvm-svn: 333416

48bf43df

[PowerPC] Fix the incorrect iterator inside peephole · 716103f1

Lei Huang authored May 29, 2018

Instruction selection can insert nodes into the underlying list after the root
node, so an iteration that starts at the root will miss them. We should NOT
assume that the root node is the last element in the DAG node list.
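
The hazard can be reproduced with a plain `std::list` standing in for the DAG node list. This is an illustration of the iteration pattern only, with hypothetical function names; it is not the PowerPC peephole code.

```cpp
#include <cassert>
#include <iterator>
#include <list>

using NodeList = std::list<int>;

// Problematic pattern: walk backwards starting at Root, assuming that
// nothing follows it in the list.
inline int countFromRoot(const NodeList &Nodes, NodeList::const_iterator Root) {
  int Visited = 0;
  for (auto It = std::next(Root); It != Nodes.begin();) {
    --It;
    ++Visited;
  }
  return Visited;
}

// Safer pattern: walk backwards from the real end of the list.
inline int countFromEnd(const NodeList &Nodes) {
  int Visited = 0;
  for (auto It = Nodes.end(); It != Nodes.begin();) {
    --It;
    ++Visited;
  }
  return Visited;
}
```

If a node is appended after the root has been recorded, `countFromRoot` skips it while `countFromEnd` still visits every node.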

Patch by: steven.zhang (Qing Shan Zhang)

Differential Revision: https://reviews.llvm.org/D47437

llvm-svn: 333415

716103f1

[AArch64][SVE] Asm: Support for AND, ORR, EOR and BIC instructions. · 26b9b2a8

Sander de Smalen authored May 29, 2018

This patch addresses the following variants:
  - bitmask immediate,         e.g. 'and z0.d, z0.d, #0x6'.
  - unpredicated data vectors, e.g. 'and z0.d, z1.d, z2.d'.
  - predicated data vectors,   e.g. 'and z0.d, p0/m, z0.d, z1.d'.

It also adds several aliases, such as:
  - ORN, alias of ORR.
  - EON, alias of EOR.
  - BIC, alias of AND (immediate variant)
  - MOV, alias of ORR (if unpredicated and source register operands are the same)

Reviewers: rengolin, huntergr, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D47363

llvm-svn: 333414

26b9b2a8