Commits · a4451d88ee456304c26d552749aea6a7f5154bde · Lorenzo Albano / LLVM bpEVL

Jan 18, 2020

Consolidate internal denormal flushing controls · a4451d88

Matt Arsenault authored Nov 01, 2019

Currently there are 4 different mechanisms for controlling denormal
flushing behavior, and about as many equivalent frontend controls.

- AMDGPU uses the fp32-denormals and fp64-f16-denormals subtarget features
- NVPTX uses the nvptx-f32ftz attribute
- ARM directly uses the denormal-fp-math attribute
- Other targets indirectly use denormal-fp-math in one DAGCombine
- cl-denorms-are-zero has a corresponding denorms-are-zero attribute

AMDGPU wants a distinct control for f32 flushing from f16/f64, and as
far as I can tell the same is true for NVPTX (based on the attribute
name).

Work on consolidating these into the denormal-fp-math attribute, and a
new type specific denormal-fp-math-f32 variant. Only ARM seems to
support the two different flush modes, so this is overkill for the
other use cases. Ideally we would error on the unsupported
positive-zero mode on other targets from somewhere.
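The following minimal C++ sketch shows what each `denormal-fp-math` setting means for a single f32 result. The `DenormalMode` enum and `flushResult` function are hypothetical names used only for illustration. Real targets implement flushing in hardware or through FP mode registers, not in software like this.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical mirror of the denormal-fp-math attribute values.
enum class DenormalMode { IEEE, PreserveSign, PositiveZero };

// Apply the output-flushing behavior of `mode` to a computed f32 result.
float flushResult(float x, DenormalMode mode) {
  if (std::fpclassify(x) != FP_SUBNORMAL)
    return x;                      // normals, zeros, inf and NaN pass through
  switch (mode) {
  case DenormalMode::IEEE:
    return x;                      // keep the denormal as-is
  case DenormalMode::PreserveSign:
    return std::copysign(0.0f, x); // flush to +0.0 or -0.0 depending on sign
  case DenormalMode::PositiveZero:
    return 0.0f;                   // always flush to +0.0
  }
  return x;
}
```

The difference between the two flush modes only shows up for negative denormals: `preserve-sign` yields -0.0 and `positive-zero` yields +0.0.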

Move the logic for selecting the flush mode into the compiler driver,
instead of handling it in cc1. denormal-fp-math/denormal-fp-math-f32
are now both cc1 flags, but denormal-fp-math-f32 is not yet exposed as
a user flag.

-cl-denorms-are-zero, -fcuda-flush-denormals-to-zero and
-fno-cuda-flush-denormals-to-zero will be mapped to
-fdenormal-fp-math-f32=ieee or preserve-sign rather than the old
attributes.

Stop emitting the denorms-are-zero attribute for the OpenCL flag. It
has no in-tree users. The meaning would also be target dependent, such
as the AMDGPU choice to treat this as only meaning allow flushing of
f32 and not f16 or f64. The naming is also potentially confusing,
since DAZ in other contexts refers to instructions implicitly treating
input denormals as zero, not necessarily flushing output denormals to
zero.
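A small software model of a multiply shows the difference between DAZ, which acts on inputs, and flushing, which acts on outputs. This is an illustrative sketch. `zeroIfDenormal` and `mulWithModes` are hypothetical helpers, and the sketch assumes the host itself computes with IEEE denormals (i.e. it is not built with flush-to-zero enabled, for example through fast-math options).

```cpp
#include <cassert>
#include <cmath>

// Replace a subnormal value with a zero of the same sign.
float zeroIfDenormal(float x) {
  return std::fpclassify(x) == FP_SUBNORMAL ? std::copysign(0.0f, x) : x;
}

// Hypothetical model of a multiply under DAZ and/or FTZ.
float mulWithModes(float a, float b, bool daz, bool ftz) {
  if (daz) {                // DAZ: denormal inputs are read as zero
    a = zeroIfDenormal(a);
    b = zeroIfDenormal(b);
  }
  float r = a * b;
  if (ftz)                  // FTZ: denormal results are flushed to zero
    r = zeroIfDenormal(r);
  return r;
}
```

With a denormal input and a normal result (1e-40f * 1e10f), only DAZ changes the answer. With normal inputs and a denormal result (1e-20f * 1e-20f), only FTZ does.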

This also does not attempt to change the behavior for the current
attribute. The LangRef now states that the default is ieee behavior,
but this is inaccurate for the current implementation. The clang
handling is slightly hacky to avoid touching the existing
denormal-fp-math uses. Fixing this will be left for a future patch.

AMDGPU is still using the subtarget feature to control the denormal
mode, but the new attributes are now emitted. A future change will
switch this and remove the subtarget features.

AMDGPU/GlobalISel: Select llvm.amdgcn.update.dpp · 592de000

Matt Arsenault authored Jan 17, 2020

The existing test is overly reliant on -mattr=-flat-for-global and on
some missing optimizations, which makes it hard to re-use.

AMDGPU/GlobalISel: Select DS append/consume · ec962831
Matt Arsenault authored Jan 16, 2020

Remove unneeded FoldingSet.h include from Attributes.h · 423e3db6

Reid Kleckner authored Jan 17, 2020

Avoids 637 extra FoldingSet.h and Allocator.h includes. FoldingSet.h
needs Allocator.h, which is relatively expensive.

[libc] Replace the use of gtest with a new lightweight unittest framework. · c7453fad

Siva Chandra Reddy authored Jan 10, 2020

Header files that were wrongly included using <...> are now included using
their internal path names, as the new unittest framework allows us to do so.

Reviewers: phosek, abrachet

Differential Revision: https://reviews.llvm.org/D72743

Remove AllTargetsAsmPrinters · 1d568bf9

Nico Weber authored Jan 17, 2020

It's been an empty target since r360498 and friends
(`git log --grep='Move InstPrinter files to MCTargetDesc.' llvm/lib/Target`),
but due to the way these targets are structured, it was silently
an empty target without anyone noticing.

No behavior change.

Remove redundant CXXScopeSpec from TemplateIdAnnotation. · a42fd84c

Richard Smith authored Jan 17, 2020

A TemplateIdAnnotation represents only a template-id, not a
nested-name-specifier plus a template-id. Don't make a redundant copy of
the CXXScopeSpec and store it on the template-id annotation.

This slightly improves error recovery by more properly handling the case
where we would form an invalid CXXScopeSpec while parsing a typename
specifier, instead of accidentally putting the token stream into a
broken "annot_template_id with a scope specifier, but with no preceding
annot_cxxscope token" state.

[gn build] Port d3db13af · 49dc3a94
LLVM GN Syncbot authored Jan 17, 2020

[gn build] fix build after 22af2cbe · 6afa0e88
Nico Weber authored Jan 17, 2020

Merge memtag instructions with adjacent stack slots. · d081962d

Evgenii Stepanov authored Jan 08, 2020

Summary:
Detect a run of memory tagging instructions for adjacent stack frame slots,
and replace them with a shorter instruction sequence:
* replace STG + STG with ST2G
* replace STGloop + STGloop with STGloop
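The merging step can be pictured as coalescing tag stores that cover adjacent stack ranges. The sketch below is a simplified model, not the backend code. `TagStore`, `mergeAdjacent` and `lower` are hypothetical, the sketch assumes sizes are multiples of the 16-byte MTE tag granule, and it sends every size other than one or two granules to a loop, whereas a real lowering may choose a different mix of STG/ST2G instructions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of a tag store: tags `size` bytes at stack `offset`.
struct TagStore {
  int64_t offset;
  int64_t size;
};

// Coalesce stores whose stack ranges are adjacent into single ranges.
std::vector<TagStore> mergeAdjacent(std::vector<TagStore> stores) {
  std::sort(stores.begin(), stores.end(),
            [](const TagStore &a, const TagStore &b) { return a.offset < b.offset; });
  std::vector<TagStore> merged;
  for (const TagStore &s : stores) {
    if (!merged.empty() && merged.back().offset + merged.back().size == s.offset)
      merged.back().size += s.size; // extend the current run
    else
      merged.push_back(s);          // start a new run
  }
  return merged;
}

// Pick an instruction for a merged range (mnemonics from the commit text).
std::string lower(const TagStore &s) {
  if (s.size == 16) return "STG";  // one granule
  if (s.size == 32) return "ST2G"; // two granules
  return "STGloop";                // anything larger (simplified)
}
```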

This code needs to run when stack slot offsets are already known, but before
FrameIndex operands in STG instructions are eliminated; that's the
reason for the new hook in PrologueEpilogue.

This change modifies STGloop and STZGloop pseudos to take the size as an
immediate integer operand, and adds _untied variants of those pseudos
that are allowed to take the base address as an FI operand. This is needed to
simplify recognizing an STGloop instruction as operating on a stack slot
post-regalloc.

This improves memtag code size by ~0.25%, and it looks like an additional ~0.1%
is possible by rearranging the stack frame such that consecutive STG
instructions reference adjacent slots (patch pending).

Reviewers: pcc, ostannard

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70286

[MemDepAnalysis/VNCoercion] Move static method to its only use. [NFCI] · 9f6c6ee6

Alina Sbirlea authored Jan 17, 2020

Static method MemoryDependenceResults::getLoadLoadClobberFullWidthSize
does not have or use any info specific to MemoryDependenceResults.
Move it to its only user: VNCoercion.

[CMake] Prefer multi-target variables over generic target variables in runtimes build · 128e1ebd

James Nagurne authored Jan 17, 2020

Runtimes variables in a multi-target environment are defined like:

RUNTIMES_target_VARIABLE_NAME
RUNTIMES_target+multi_VARIABLE_NAME

In my case, I have a downstream runtimes cache that does the following:

set(RUNTIMES_${target}+except_LIBCXXABI_ENABLE_EXCEPTIONS ON CACHE BOOL "")
set(RUNTIMES_${target}_LIBCXXABI_ENABLE_EXCEPTIONS OFF CACHE BOOL "")

I found that I was always getting the 'target' variable value (OFF) in
my 'target+except' build, which was unexpected.  This behavior was
caused by the loop in llvm/runtimes/CMakeLists.txt that runs through all
variable names, adding '-DVARIABLE_NAME=' options to the subsequent
external project's cmake command.

The issue is that the loop does a single pass, such that if the 'target'
value appears in the cache after the 'target+except' value, the 'target'
value will take precedence. I suggest in my change here that the more
specific 'target+except' value should take precedence always, without
relying on CMake cache ordering.
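The proposed precedence rule can be modeled outside CMake. The following C++ sketch uses hypothetical names and is not the actual llvm/runtimes/CMakeLists.txt logic. It collects options in two passes so that `RUNTIMES_<target>+<multi>_` values always override `RUNTIMES_<target>_` values, regardless of the order in which the cache lists them.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: cache entries as (name, value) pairs in arbitrary order.
using Cache = std::vector<std::pair<std::string, std::string>>;

// Collect the options passed to one runtimes sub-build.
std::map<std::string, std::string> collectOptions(const Cache &cache,
                                                  const std::string &target,
                                                  const std::string &multi) {
  const std::string generic = "RUNTIMES_" + target + "_";
  const std::string specific = "RUNTIMES_" + target + "+" + multi + "_";
  std::map<std::string, std::string> opts;
  // Pass 1: plain per-target variables.
  for (const auto &entry : cache)
    if (entry.first.rfind(generic, 0) == 0)
      opts[entry.first.substr(generic.size())] = entry.second;
  // Pass 2: multi-target variables override anything from pass 1.
  for (const auto &entry : cache)
    if (entry.first.rfind(specific, 0) == 0)
      opts[entry.first.substr(specific.size())] = entry.second;
  return opts;
}
```

A single pass over the cache lets whichever entry happens to come last win. The second pass removes that dependence on ordering.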

Differential Revision: https://reviews.llvm.org/D71570

Patch By: JamesNagurne

hwasan: Remove dead code. NFCI. · 9b9c68a2
Peter Collingbourne authored Jan 16, 2020

Differential Revision: https://reviews.llvm.org/D72896

[profile] Support counter relocation at runtime · d3db13af

Petr Hosek authored Oct 04, 2019

This is an alternative to the continuous mode that was implemented in
D68351. That mode relies on padding and on the ability to mmap a file over
the existing mapping, which is generally only available on POSIX systems
and isn't suitable for other platforms.

This change instead introduces the ability to relocate counters at
runtime using a level of indirection. On every counter access, we add a
bias to the counter address. This bias is stored in a symbol that's
provided by the profile runtime and is initially set to zero, meaning no
relocation. The runtime can mmap the profile into memory at an arbitrary
location and set the bias to the offset between the original and the new
counter location, at which point every subsequent counter access will go
to the new location. This allows updating the profile directly, akin to
the continuous mode.
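The indirection can be sketched in a few lines of C++. This is a minimal sketch that assumes a flat address space in which pointer differences can be added as integers. `Counters`, `CounterBias`, `incrementCounter` and `relocateCounters` are hypothetical names. The real implementation emits the bias load from the instrumentation pass and mmaps the profile file instead of copying into a caller-provided buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Counters as laid out in the binary.
uint64_t Counters[4];

// Bias added to every counter address; zero means "no relocation".
intptr_t CounterBias = 0;

// Instrumented code: one extra load of the bias and one add per access,
// instead of incrementing Counters[i] directly.
void incrementCounter(unsigned i) {
  uintptr_t addr = reinterpret_cast<uintptr_t>(&Counters[i]) + CounterBias;
  ++*reinterpret_cast<uint64_t *>(addr);
}

// Runtime: move the counters elsewhere (an mmapped file in the real scheme),
// copy the current values, and point the bias at the new location.
void relocateCounters(uint64_t *newLocation) {
  std::memcpy(newLocation, Counters, sizeof(Counters));
  CounterBias = reinterpret_cast<intptr_t>(newLocation) -
                reinterpret_cast<intptr_t>(Counters);
}
```

After `relocateCounters`, the original `Counters` array stops changing. This is the duplication of counters that the commit lists as a cost.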

The advantage of this implementation is that it doesn't require any special
OS support. The disadvantage is the extra overhead due to additional
instructions required for each counter access (overhead both in terms of
binary size and performance) plus duplication of counters (i.e. one copy
in the binary itself and another copy that's mmapped).

Differential Revision: https://reviews.llvm.org/D69740

Jan 17, 2020

[CMake] Use LinuxRemoteTI instead of LinuxLocalTI in CrossWinToARMLinux cmake cache · 383ff4ea

Sergej Jaskiewicz authored Jan 18, 2020

Summary: Depends on D72847

Reviewers: vvereschaka, aorlov, andreil99

Reviewed By: vvereschaka

Subscribers: mgorny, kristof.beyls, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D72850

[libcxx] Introduce LinuxRemoteTI for remote testing · 049c437c

Sergej Jaskiewicz authored Jan 18, 2020

Summary:
This patch adds a new target info object called LinuxRemoteTI.
Unlike LinuxLocalTI, which asks the host system about various things
like available locales, distribution name, etc. (which don't make sense
if we're testing on a remote board), LinuxRemoteTI uses SSHExecutor
to get information from the target system.

Reviewers: jroelofs, ldionne, bcraig, EricWF, danalbert, mclow.lists

Reviewed By: jroelofs

Subscribers: christof, dexonsmith, libcxx-commits

Tags: #libc

Differential Revision: https://reviews.llvm.org/D72847

[lldb/Docs] Fix formatting for the variable formatting page · a93aa534
Jonas Devlieghere authored Jan 17, 2020

[mlir][Linalg] Extend linalg vectorization to MatmulOp · 64c4dcb5

Nicolas Vasilache authored Jan 17, 2020

Summary:
This is a simple extension to allow vectorization to work not only on GenericLinalgOp
but more generally across named ops too.
For now, this still only vectorizes matmul-like ops but is a step towards more
generic vectorization of Linalg ops.

Reviewers: ftynse

Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72942

[libc++] Optimize / partially inline basic_string copy constructor · a8a9c8e0

Eric Fiselier authored Jan 17, 2020

Splits the copy constructor up, inlining short initialization and outlining
long initialization into __init_long(), which is the externally
instantiated slow-path initialization.

Subsequently changing the copy ctor to be inlined (not externally instantiated)
provides significant speedups for short string initialization.
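The inline/outline split can be illustrated with a toy small-string type. This is a hypothetical sketch and does not reproduce libc++'s representation, which packs the size and a short/long flag into the object's own bytes. Only the shape of the copy constructor mirrors the change: a fixed-size inline copy for short strings and an out-of-line call for long ones.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical toy string with an inline buffer for short contents.
class ToyString {
  static constexpr std::size_t kShortCap = 22; // max chars stored inline
  std::size_t size_;
  char *data_;                     // points at short_ or at a heap buffer
  char short_[kShortCap + 1] = {};

  // Slow path, defined out of line (the role __init_long() plays in libc++).
  void initLong(const ToyString &other);

public:
  explicit ToyString(const char *s) : size_(std::strlen(s)) {
    data_ = size_ <= kShortCap ? short_ : new char[size_ + 1];
    std::memcpy(data_, s, size_ + 1);
  }

  // Fast path, inline: a short string is copied with one fixed-size memcpy.
  ToyString(const ToyString &other) : size_(other.size_) {
    if (other.isShort()) {
      data_ = short_;
      std::memcpy(short_, other.short_, sizeof(short_));
    } else {
      initLong(other);
    }
  }

  ToyString &operator=(const ToyString &) = delete; // keep the sketch small
  ~ToyString() {
    if (!isShort())
      delete[] data_;
  }

  const char *c_str() const { return data_; }
  bool isShort() const { return data_ == short_; }
};

void ToyString::initLong(const ToyString &other) {
  data_ = new char[size_ + 1];
  std::memcpy(data_, other.data_, size_ + 1);
}
```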

Generated code given:

#include <new>
#include <string>

void StringCopyCtor(void* mem, const std::string& s) {
    new (mem) std::string{s};
}

asm:
        cmp     byte ptr [rsi + 23], 0
        js      .LBB0_2
        mov     rax, qword ptr [rsi + 16]
        mov     qword ptr [rdi + 16], rax
        movups  xmm0, xmmword ptr [rsi]
        movups  xmmword ptr [rdi], xmm0
        ret
.LBB0_2:
        jmp     std::basic_string::__init_long # TAILCALL

Benchmark (before, after, change):
BM_StringCopy_Empty                                           5.19ns ± 6%             1.50ns ± 8%  -71.02%        (p=0.000 n=10+10)
BM_StringCopy_Small                                           5.14ns ± 8%             1.53ns ± 7%  -70.17%        (p=0.000 n=10+10)
BM_StringCopy_Large                                           18.9ns ± 0%             19.3ns ± 0%   +1.92%        (p=0.000 n=10+10)
BM_StringCopy_Huge                                             309ns ± 1%              316ns ± 5%     ~            (p=0.633 n=8+10)

Patch from Martijn Vels (mvels@google.com)
Reviewed as D72160.

hwasan: Move .note.hwasan.globals note to hwasan.module_ctor comdat. · cd40bd0a

Peter Collingbourne authored Jan 17, 2020

As of D70146, lld GCs comdats as a group and no longer considers notes in
comdats to be GC roots, so we need to move the note to a comdat with a GC root
section (.init_array) in order to prevent lld from discarding the note.

Differential Revision: https://reviews.llvm.org/D72936

[InstSimplify] add test for select of vector constants; NFC · a8b9c936
Sanjay Patel authored Jan 17, 2020

[InstSimplify] add test for select of FP constants; NFC · 3ae38d95
Sanjay Patel authored Jan 17, 2020

[mlir] [VectorOps] Rename Utils.h into VectorUtils.h · 0361a961

aartbik authored Jan 17, 2020

Summary:
First step towards the consolidation
of a lot of vector related utilities
that are now all over the place
(or even duplicated).

Reviewers: nicolasvasilache, andydavis1

Reviewed By: nicolasvasilache, andydavis1

Subscribers: merge_guards_bot, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72955

Pass length of string in Go binding of CreateCompileUnit · 63c42617
Adrian Prantl authored Jan 17, 2020

[xray] Allow instrumenting only function entry and/or only function exit · 97ba4830

Ian Levesque authored Jan 17, 2020

Extend -fxray-instrumentation-bundle to split function-entry and
function-exit into two separate options, so that it is possible to
instrument only function entry or only function exit.  For use cases
that only care about one or the other this will save significant overhead
and code size.
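One way to picture the split is as a bit set in which the old combined function kind becomes the union of separate entry and exit bits. The sketch assumes bundle values spelled `function-entry`, `function-exit`, `function`, `custom`, `typed` and `all`. The enum and `parseBundle` are hypothetical and do not reproduce clang's actual driver code.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical bit set for the instrumentation kinds in a bundle.
enum XRayBundle : uint32_t {
  BundleNone = 0,
  BundleFunctionEntry = 1u << 0,
  BundleFunctionExit = 1u << 1,
  BundleFunction = BundleFunctionEntry | BundleFunctionExit, // both sleds
  BundleCustom = 1u << 2,
  BundleTyped = 1u << 3,
  BundleAll = BundleFunction | BundleCustom | BundleTyped,
};

// Parse a comma-separated bundle list such as "function-entry,custom".
// Unknown values are ignored here; a real driver would diagnose them.
uint32_t parseBundle(const std::string &arg) {
  uint32_t mask = BundleNone;
  std::stringstream ss(arg);
  std::string item;
  while (std::getline(ss, item, ',')) {
    if (item == "function-entry") mask |= BundleFunctionEntry;
    else if (item == "function-exit") mask |= BundleFunctionExit;
    else if (item == "function") mask |= BundleFunction;
    else if (item == "custom") mask |= BundleCustom;
    else if (item == "typed") mask |= BundleTyped;
    else if (item == "all") mask |= BundleAll;
  }
  return mask;
}
```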

Differential Revision: https://reviews.llvm.org/D72890

[clang][xray] Add -fxray-ignore-loops option · 1d62be24

Ian Levesque authored Jan 17, 2020

XRay allows tuning by minimum function size, but also always instruments
functions with loops in them. If the minimum function size is set to a
large value the loop instrumentation ends up causing most functions to be
instrumented anyway. This adds a new flag, -fxray-ignore-loops, to disable
the loop detection logic.
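The interaction between the size threshold and loop detection reduces to a simple predicate. The sketch is hypothetical: `FunctionInfo` and `shouldInstrument` are illustrative names, and the real pass measures function size in its own way. It only shows how the new flag removes the loop override.

```cpp
#include <cassert>

// Hypothetical summary of a function as seen by the instrumentation pass.
struct FunctionInfo {
  unsigned instructionCount;
  bool hasLoops;
};

// Without ignoreLoops, a function containing a loop is instrumented even
// when it is below the size threshold; ignoring loops removes that override.
bool shouldInstrument(const FunctionInfo &f, unsigned threshold,
                      bool ignoreLoops) {
  if (f.instructionCount >= threshold)
    return true;
  return f.hasLoops && !ignoreLoops;
}
```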

Differential Revision: https://reviews.llvm.org/D72873

[xray] Add xray-ignore-loops option · 7628e474

Ian Levesque authored Jan 17, 2020

XRay allows tuning by minimum function size, but also always instruments
functions with loops in them. If the minimum function size is set to a
large value the loop instrumentation ends up causing most functions to be
instrumented anyway. This adds a new flag, xray-ignore-loops, to disable
the loop detection logic.

Differential Revision: https://reviews.llvm.org/D72659

[ms] [llvm-ml] Add placeholder for llvm-ml, based on llvm-mc · 22af2cbe

Eric Astor authored Jan 17, 2020

Summary:
As discussed on the mailing list, I plan to introduce an ml-compatible MASM assembler as part of providing more of the Windows build tools. This will be similar to llvm-mc, but with different command-line parameters.

This placeholder is purely a stripped-down version of llvm-mc; we'll eventually add support for the Microsoft-style command-line flags, and back it with a MASM parser.

Reviewers: rnk, thakis

Reviewed By: thakis

Subscribers: merge_guards_bot, mgorny, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72679

debugserver: Pass -arch flags to mig invocation as needed · 510758da

Vedant Kumar authored Jan 17, 2020

Specify -isysroot and any necessary -arch flags in the `mig` invocation
when CMAKE_OSX_ARCHITECTURES is set (needed for the bridgeOS build).

[ELF] Allow R_PLT_PC (R_PC) to a hidden undefined weak symbol · 6ab89c3c

Fangrui Song authored Jan 17, 2020

This essentially reverts b841e119.

Such a code construct can be used in the following way:

  // glibc/stdlib/exit.c
  // clang -fuse-ld=lld => succeeded
  // clang -fuse-ld=lld -fpie -pie => relocation R_PLT_PC cannot refer to absolute symbol
  __attribute__((weak, visibility("hidden"))) extern void __call_tls_dtors();
  void __run_exit_handlers() {
    if (__call_tls_dtors)
        __call_tls_dtors();
  }

Since we allow R_PLT_PC in -no-pie mode, it makes sense to allow it in
-pie mode as well.

Reviewed By: pcc

Differential Revision: https://reviews.llvm.org/D72943

Move the sysroot attribute from DIModule to DICompileUnit · 7b30370e

Adrian Prantl authored Jan 14, 2020

[this re-applies c0176916
 with the correct commit message and phabricator link]

This addresses point 1 of PR44213.
https://bugs.llvm.org/show_bug.cgi?id=44213

The DW_AT_LLVM_sysroot attribute is used for Clang module debug info,
to allow LLDB to import a Clang module from source. Currently it is
part of each DW_TAG_module; however, it is the same for all modules in
a compile unit. It is more efficient and less ambiguous to store it
once in the DW_TAG_compile_unit.

This should have no effect on DWARF consumers other than LLDB.

Differential Revision: https://reviews.llvm.org/D71732

Revert "Rename DW_AT_LLVM_isysroot to DW_AT_LLVM_sysroot" · c17aee67
Adrian Prantl authored Jan 17, 2020

This reverts commit 12e47947.

I accidentally landed this patch with the wrong commit message ...

Revert "Attempt to fix Go syntax error" · 94dd096f
Adrian Prantl authored Jan 17, 2020

This reverts commit c0176916.

Attempt to fix Go syntax error · c0176916
Adrian Prantl authored Jan 17, 2020

[MLIR] LLVM dialect: Add llvm.atomicrmw · 60a0c612

Frank Laub authored Jan 17, 2020

Summary:
This op is the counterpart to LLVM's atomicrmw instruction. Note that
volatile and syncscope attributes are not yet supported.

This will be useful for upcoming parallel versions of `affine.for` and generally
for reduction-like semantics.
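As background, `atomicrmw` atomically loads a value, combines it with an operand, stores the result and yields the original value. In C++, clang typically lowers `std::atomic::fetch_add` to `atomicrmw add`. Operations for which `std::atomic` has no member function, such as max, can be emulated with a compare-exchange loop. The helpers below are an illustrative sketch with hypothetical names and are unrelated to the MLIR op's own API.

```cpp
#include <atomic>
#include <cassert>

// Generic read-modify-write returning the old value, like atomicrmw.
template <typename T, typename Op>
T atomicRMW(std::atomic<T> &target, T operand, Op op) {
  T old = target.load();
  // On failure, compare_exchange_weak reloads `old` with the current value,
  // so the new value is recomputed from fresh data on each retry.
  while (!target.compare_exchange_weak(old, op(old, operand))) {
  }
  return old;
}

// Reduction-style accumulation; typically becomes "atomicrmw add".
int accumulate(std::atomic<int> &acc, int value) {
  return acc.fetch_add(value);
}
```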

Differential Revision: https://reviews.llvm.org/D72741

[Flang][mlir] add a band-aid to support the creation of mutually recursive... · 37e2560d

Eric Schweitz authored Jan 17, 2020

[Flang][mlir] add a band-aid to support the creation of mutually recursive types when lowering to LLVM IR

Summary:
This is a temporary implementation to support Flang.  The LLVM-IR parser
will need to be extended in some way to support recursive types.  The
exact approach here is still a work-in-progress.

Unfortunately, this won't pass roundtrip testing yet. Adding a comment
to the test file as a reminder.

Differential Revision: https://reviews.llvm.org/D72542

[libFuzzer] Allow discarding output in ExecuteCommand in Fuchsia. · 44aaca3d

Marco Vanotti authored Jan 16, 2020

Summary:
This commit modifies the way `ExecuteCommand` works in Fuchsia by adding
special logic to handle `/dev/null`.

The FuzzerCommand interface does not have a way to "discard" the output,
so other parts of the code just set the output file to `getDevNull()`.
The problem is that Fuchsia does not have a named file that is
equivalent to `/dev/null`, so opening that file just fails.

This commit detects whether the specified output file is `getDevNull()`,
and if that's the case, it will not copy the file descriptor for stdout
in the spawned process.

NOTE that modifying `FuzzerCommand` to add a "discardOutput" function
involves a significant refactor of all the other platforms, as they all
rely on the `toString()` method of `FuzzerCommand`.

This allows libFuzzer in Fuchsia to run with `fork=1`, as the merge
process (`FuzzerMerge.cpp`) invokes `ExecuteCommand` with `/dev/null` as the
output.

Reviewers: aarongreen, phosek

Reviewed By: aarongreen

Subscribers: #sanitizers, llvm-commits

Tags: #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D72894

Revert "[SVE] Pass Scalable argument to VectorType::get in Bitcode Reader" · 447dcef7
Eli Friedman authored Jan 17, 2020

This reverts commit 5df53a22.

Caused test failures.

[mlir][spirv] Explicitly construct ArrayRef from static array · 927f8f40
Lei Zhang authored Jan 17, 2020

Again, to please GCC 5.

[SVE] Pass Scalable argument to VectorType::get in Bitcode Reader · 5df53a22

Christopher Tetreault authored Jan 17, 2020

Summary:
* Pass the Scalability test to VectorType::get in order to be
able to deserialize bitcode that contains scalable vector operations

Change-Id: I37fe5b1c0c237a9153130deefdc1a6d595c7f12e

Reviewers: efriedma, pcc, sdesmalen, apazos, huihuiz, chrisj

Reviewed By: sdesmalen

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72792
