- Jan 18, 2020
-
-
Matt Arsenault authored
Currently there are 4 different mechanisms for controlling denormal flushing behavior, and about as many equivalent frontend controls.
  - AMDGPU uses the fp32-denormals and fp64-f16-denormals subtarget features
  - NVPTX uses the nvptx-f32ftz attribute
  - ARM directly uses the denormal-fp-math attribute
  - Other targets indirectly use denormal-fp-math in one DAGCombine
  - cl-denorms-are-zero has a corresponding denorms-are-zero attribute
AMDGPU wants a distinct control for f32 flushing from f16/f64, and as far as I can tell the same is true for NVPTX (based on the attribute name).
Work on consolidating these into the denormal-fp-math attribute, and a new type specific denormal-fp-math-f32 variant. Only ARM seems to support the two different flush modes, so this is overkill for the other use cases. Ideally we would error on the unsupported positive-zero mode on other targets from somewhere.
Move the logic for selecting the flush mode into the compiler driver, instead of handling it in cc1. denormal-fp-math/denormal-fp-math-f32 are now both cc1 flags, but denormal-fp-math-f32 is not yet exposed as a user flag.
-cl-denorms-are-zero, -fcuda-flush-denormals-to-zero and -fno-cuda-flush-denormals-to-zero will be mapped to -fdenormal-fp-math-f32=ieee or preserve-sign rather than the old attributes.
Stop emitting the denorms-are-zero attribute for the OpenCL flag. It has no in-tree users. The meaning would also be target dependent, such as the AMDGPU choice to treat this as only meaning allow flushing of f32 and not f16 or f64. The naming is also potentially confusing, since DAZ in other contexts refers to instructions implicitly treating input denormals as zero, not necessarily flushing output denormals to zero.
This also does not attempt to change the behavior for the current attribute. The LangRef now states that the default is ieee behavior, but this is inaccurate for the current implementation.
The clang handling is slightly hacky to avoid touching the existing denormal-fp-math uses. Fixing this will be left for a future patch. AMDGPU is still using the subtarget feature to control the denormal mode, but the new attributes are now emitted. A future change will switch this and remove the subtarget features.
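The three mode names used above (ieee, preserve-sign, positive-zero) can be illustrated with a small model. This is only a sketch in Python, not LLVM code: the function name `flush_f32` is hypothetical, and it assumes the mode is applied to a single value that is treated as an f32.

```python
import math

# Smallest positive normal f32 value; anything nonzero below it is denormal.
F32_MIN_NORMAL = 2.0 ** -126

MODES = ("ieee", "preserve-sign", "positive-zero")


def flush_f32(value, mode):
    """Model a denormal-fp-math-f32 style mode on one value (hypothetical helper)."""
    if mode not in MODES:
        raise ValueError(f"unknown denormal mode: {mode}")
    is_denormal = value != 0.0 and abs(value) < F32_MIN_NORMAL
    if mode == "ieee" or not is_denormal:
        # ieee keeps denormals; normal values, zeros, infinities and NaNs
        # are never changed by any mode.
        return value
    if mode == "preserve-sign":
        # Flush to a zero that carries the sign of the denormal.
        return math.copysign(0.0, value)
    # positive-zero: flush to +0.0 regardless of the sign.
    return 0.0
```

In this model, a negative denormal such as -1e-40 stays unchanged under ieee, becomes -0.0 under preserve-sign, and becomes +0.0 under positive-zero.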
-
Matt Arsenault authored
The existing test is overly reliant on -mattr=-flat-for-global, and some missing optimizations to re-use.
-
Matt Arsenault authored
-
Reid Kleckner authored
Avoids 637 extra FoldingSet.h and Allocator.h includes. FoldingSet.h needs Allocator.h, which is relatively expensive.
-
Siva Chandra Reddy authored
Header files included wrongly using <...> are now included using the internal path names as the new unittest framework allows us to do so. Reviewers: phosek, abrachet Differential Revision: https://reviews.llvm.org/D72743
-
Nico Weber authored
It's been an empty target since r360498 and friends (`git log --grep='Move InstPrinter files to MCTargetDesc.' llvm/lib/Target`), but due to the way these targets are structured it was silently an empty target without anyone noticing. No behavior change.
-
Richard Smith authored
A TemplateIdAnnotation represents only a template-id, not a nested-name-specifier plus a template-id. Don't make a redundant copy of the CXXScopeSpec and store it on the template-id annotation. This slightly improves error recovery by more properly handling the case where we would form an invalid CXXScopeSpec while parsing a typename specifier, instead of accidentally putting the token stream into a broken "annot_template_id with a scope specifier, but with no preceding annot_cxxscope token" state.
-
LLVM GN Syncbot authored
-
Nico Weber authored
-
Evgenii Stepanov authored
Summary: Detect a run of memory tagging instructions for adjacent stack frame slots, and replace them with a shorter instruction sequence
  * replace STG + STG with ST2G
  * replace STGloop + STGloop with STGloop
This code needs to run when stack slot offsets are already known, but before FrameIndex operands in STG instructions are eliminated; that's the reason for the new hook in PrologueEpilogue.
This change modifies STGloop and STZGloop pseudos to take the size as an immediate integer operand, and adds _untied variants of those pseudos that are allowed to take the base address as a FI operand. This is needed to simplify recognizing an STGloop instruction as operating on a stack slot post-regalloc.
This improves memtag code size by ~0.25%, and it looks like an additional ~0.1% is possible by rearranging the stack frame such that consecutive STG instructions reference adjacent slots (patch pending).
Reviewers: pcc, ostannard
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70286
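The merging idea can be sketched on plain (offset, size) pairs. This is a simplified Python model, not the pass itself: the function name `merge_tag_stores` is hypothetical, the rule for picking a mnemonic from the merged size is an assumption made for illustration, and the real code works on machine instructions with FrameIndex operands.

```python
GRANULE = 16  # bytes covered by one memory tag


def merge_tag_stores(stores):
    """Merge tag stores that cover adjacent stack slots (hypothetical model).

    `stores` is a list of (offset, size) pairs, with sizes that are
    multiples of GRANULE. Returns a list of (mnemonic, offset, size).
    """
    merged = []
    for offset, size in sorted(stores):
        if merged and merged[-1][0] + merged[-1][1] == offset:
            # This store starts exactly where the previous range ends.
            merged[-1] = (merged[-1][0], merged[-1][1] + size)
        else:
            merged.append((offset, size))

    result = []
    for offset, size in merged:
        # Assumed selection rule: one granule -> STG, two -> ST2G,
        # anything larger -> a single STGloop over the whole range.
        if size == GRANULE:
            mnemonic = "STG"
        elif size == 2 * GRANULE:
            mnemonic = "ST2G"
        else:
            mnemonic = "STGloop"
        result.append((mnemonic, offset, size))
    return result
```

Two single-granule stores at offsets 0 and 16 collapse into one ST2G in this model, while stores separated by a gap are left as they are.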
-
Alina Sbirlea authored
Static method MemoryDependenceResults::getLoadLoadClobberFullWidthSize does not have or use any info specific to MemoryDependenceResults. Move it to its only user: VNCoercion.
-
James Nagurne authored
Runtimes variables in a multi-target environment are defined like:
    RUNTIMES_target_VARIABLE_NAME
    RUNTIMES_target+multi_VARIABLE_NAME
In my case, I have a downstream runtimes cache that does the following:
    set(RUNTIMES_${target}+except_LIBCXXABI_ENABLE_EXCEPTIONS ON CACHE BOOL "")
    set(RUNTIMES_${target}_LIBCXX_ENABLE_EXCEPTIONS OFF CACHE BOOL "")
I found that I was always getting the 'target' variable value (OFF) in my 'target+except' build, which was unexpected.
This behavior was caused by the loop in llvm/runtimes/CMakeLists.txt that runs through all variable names, adding '-DVARIABLE_NAME=' options to the subsequent external project's cmake command.
The issue is that the loop does a single pass, such that if the 'target' value appears in the cache after the 'target+except' value, the 'target' value will take precedence. I suggest in my change here that the more specific 'target+except' value should take precedence always, without relying on CMake cache ordering.
Differential Revision: https://reviews.llvm.org/D71570
Patch By: JamesNagurne
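The precedence rule described above can be modeled with two passes over the cache. The sketch below is Python rather than CMake and is not the code in llvm/runtimes/CMakeLists.txt: `runtime_options` is a hypothetical name, the cache is a plain dict, and the target name `tgt` is made up.

```python
def runtime_options(cache, name):
    """Collect -D options for the runtimes build `name`, e.g. 'tgt+except'.

    `cache` maps variable names to values, in cache order (hypothetical model).
    """
    target = name.split("+", 1)[0]
    options = {}

    # Pass 1: generic per-target values (RUNTIMES_<target>_<VAR>).
    generic_prefix = f"RUNTIMES_{target}_"
    for var, value in cache.items():
        if var.startswith(generic_prefix):
            options[var[len(generic_prefix):]] = value

    # Pass 2: values for the specific multi-target name override the
    # generic ones, no matter where they appear in the cache.
    if name != target:
        specific_prefix = f"RUNTIMES_{name}_"
        for var, value in cache.items():
            if var.startswith(specific_prefix):
                options[var[len(specific_prefix):]] = value

    return [f"-D{var}={value}" for var, value in options.items()]
```

Because the specific values are applied in a second pass, the result no longer depends on whether the 'target' entry comes before or after the 'target+except' entry.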
-
Peter Collingbourne authored
Differential Revision: https://reviews.llvm.org/D72896
-
Petr Hosek authored
This is an alternative to the continuous mode that was implemented in D68351. This mode relies on padding and the ability to mmap a file over the existing mapping which is generally only available on POSIX systems and isn't suitable for other platforms. This change instead introduces the ability to relocate counters at runtime using a level of indirection. On every counter access, we add a bias to the counter address. This bias is stored in a symbol that's provided by the profile runtime and is initially set to zero, meaning no relocation. The runtime can mmap the profile into memory at an arbitrary location, and set the bias to the offset between the original and the new counter location, at which point every subsequent counter access will be to the new location, which allows updating the profile directly akin to the continuous mode. The advantage of this implementation is that it doesn't require any special OS support. The disadvantage is the extra overhead due to additional instructions required for each counter access (overhead both in terms of binary size and performance) plus duplication of counters (i.e. one copy in the binary itself and another copy that's mmapped). Differential Revision: https://reviews.llvm.org/D69740
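The bias indirection can be shown with a toy address space. This is a Python sketch under stated assumptions, not the profile runtime: the class `CounterMemory` and its methods are hypothetical, memory is a flat list, and copying the counts collected so far into the new region is an assumption of the model.

```python
class CounterMemory:
    """Toy address space holding profile counters (hypothetical model)."""

    def __init__(self, num_counters):
        self.num_counters = num_counters
        # Counters embedded in the binary live at addresses [0, num_counters).
        self.memory = [0] * num_counters
        # Bias symbol provided by the runtime; zero means no relocation.
        self.bias = 0

    def increment(self, counter_address):
        # Every counter access adds the bias to the original address.
        self.memory[counter_address + self.bias] += 1

    def relocate(self):
        """Model mapping the profile at a new location and redirecting to it."""
        new_base = len(self.memory)
        current = self.memory[self.bias:self.bias + self.num_counters]
        # Assumption: the new region starts with the counts collected so far.
        self.memory.extend(current)
        # Bias = offset between the original and the new counter location.
        self.bias = new_base
```

Instrumented code keeps using the original counter addresses; after `relocate()` the same accesses land in the new region, and the original copy stays behind, which is the duplication cost mentioned above.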
-
- Jan 17, 2020
-
-
Sergej Jaskiewicz authored
Summary: Depends on D72847 Reviewers: vvereschaka, aorlov, andreil99 Reviewed By: vvereschaka Subscribers: mgorny, kristof.beyls, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D72850
-
Sergej Jaskiewicz authored
Summary: This patch adds a new target info object called LinuxRemoteTI. Unlike LinuxLocalTI, which asks the host system about various things like available locales, distribution name etc. which don't make sense if we're testing on a remote board, LinuxRemoteTI uses SSHExecutor to get information from the target system. Reviewers: jroelofs, ldionne, bcraig, EricWF, danalbert, mclow.lists Reviewed By: jroelofs Subscribers: christof, dexonsmith, libcxx-commits Tags: #libc Differential Revision: https://reviews.llvm.org/D72847
-
Jonas Devlieghere authored
-
Nicolas Vasilache authored
Summary: This is a simple extension to allow vectorization to work not only on GenericLinalgOp but more generally across named ops too. For now, this still only vectorizes matmul-like ops but is a step towards more generic vectorization of Linalg ops. Reviewers: ftynse Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72942
-
Eric Fiselier authored
Splits the copy constructor up, inlining short initialization and outlining long initialization into __init_long(), which is the externally instantiated slow-path initialization.
Subsequently changing the copy ctor to be inlined (not externally instantiated) provides significant speed ups for short string initialization.
Generated code given:
    void StringCopyCtor(void* mem, const std::string& s) {
      std::string*p = new(mem) std::string{s};
    }
asm:
    cmp byte ptr [rsi + 23], 0
    js .LBB0_2
    mov rax, qword ptr [rsi + 16]
    mov qword ptr [rdi + 16], rax
    movups xmm0, xmmword ptr [rsi]
    movups xmmword ptr [rdi], xmm0
    ret
  .LBB0_2:
    jmp std::basic_string::__init_long # TAILCALL
Benchmark:
    BM_StringCopy_Empty   5.19ns ± 6%   1.50ns ± 8%   -71.02%   (p=0.000 n=10+10)
    BM_StringCopy_Small   5.14ns ± 8%   1.53ns ± 7%   -70.17%   (p=0.000 n=10+10)
    BM_StringCopy_Large   18.9ns ± 0%   19.3ns ± 0%    +1.92%   (p=0.000 n=10+10)
    BM_StringCopy_Huge     309ns ± 1%    316ns ± 5%       ~     (p=0.633 n=8+10)
Patch from Martijn Vels (mvels@google.com)
Reviewed as D72160.
-
Peter Collingbourne authored
As of D70146 lld GCs comdats as a group and no longer considers notes in comdats to be GC roots, so we need to move the note to a comdat with a GC root section (.init_array) in order to prevent lld from discarding the note. Differential Revision: https://reviews.llvm.org/D72936
-
Sanjay Patel authored
-
Sanjay Patel authored
-
aartbik authored
Summary: First step towards the consolidation of a lot of vector related utilities that are now all over the place (or even duplicated). Reviewers: nicolasvasilache, andydavis1 Reviewed By: nicolasvasilache, andydavis1 Subscribers: merge_guards_bot, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72955
-
Adrian Prantl authored
-
Ian Levesque authored
Extend -fxray-instrumentation-bundle to split function-entry and function-exit into two separate options, so that it is possible to instrument only function entry or only function exit. For use cases that only care about one or the other this will save significant overhead and code size. Differential Revision: https://reviews.llvm.org/D72890
-
Ian Levesque authored
XRay allows tuning by minimum function size, but also always instruments functions with loops in them. If the minimum function size is set to a large value the loop instrumentation ends up causing most functions to be instrumented anyway. This adds a new flag, -fxray-ignore-loops, to disable the loop detection logic. Differential Revision: https://reviews.llvm.org/D72873
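The interaction between the size threshold and the loop rule can be written as a small predicate. This is a Python sketch of the decision as described in the message, not XRay's implementation: the function name, its parameters, and the use of an inclusive threshold comparison are assumptions.

```python
def should_instrument(num_instructions, has_loops, threshold, ignore_loops=False):
    """Decide whether a function gets instrumented (hypothetical model)."""
    if num_instructions >= threshold:
        # Large enough functions are always instrumented.
        return True
    # Small functions are instrumented only because of their loops,
    # unless loop detection has been turned off.
    return has_loops and not ignore_loops
```

With a large threshold, a small function containing a loop is still instrumented by default in this model, and is skipped once `ignore_loops` is set.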
-
Ian Levesque authored
XRay allows tuning by minimum function size, but also always instruments functions with loops in them. If the minimum function size is set to a large value the loop instrumentation ends up causing most functions to be instrumented anyway. This adds a new flag, xray-ignore-loops, to disable the loop detection logic. Differential Revision: https://reviews.llvm.org/D72659
-
Eric Astor authored
Summary: As discussed on the mailing list, I plan to introduce an ml-compatible MASM assembler as part of providing more of the Windows build tools. This will be similar to llvm-mc, but with different command-line parameters. This placeholder is purely a stripped-down version of llvm-mc; we'll eventually add support for the Microsoft-style command-line flags, and back it with a MASM parser. Reviewers: rnk, thakis Reviewed By: thakis Subscribers: merge_guards_bot, mgorny, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72679
-
Vedant Kumar authored
Specify -isysroot and any necessary -arch flags in the `mig` invocation when CMAKE_OSX_ARCHITECTURES is set (needed for the bridgeOS build).
-
Fangrui Song authored
This essentially reverts b841e119.
Such a code construct can be used in the following way:
    // glibc/stdlib/exit.c
    // clang -fuse-ld=lld => succeeded
    // clang -fuse-ld=lld -fpie -pie => relocation R_PLT_PC cannot refer to absolute symbol
    __attribute__((weak, visibility("hidden"))) extern void __call_tls_dtors();
    void __run_exit_handlers() {
      if (__call_tls_dtors)
        __call_tls_dtors();
    }
Since we allow R_PLT_PC in -no-pie mode, it makes sense to allow it in -pie mode as well.
Reviewed By: pcc
Differential Revision: https://reviews.llvm.org/D72943
-
Adrian Prantl authored
[this re-applies c0176916 with the correct commit message and phabricator link] This addresses point 1 of PR44213. https://bugs.llvm.org/show_bug.cgi?id=44213 The DW_AT_LLVM_sysroot attribute is used for Clang module debug info, to allow LLDB to import a Clang module from source. Currently it is part of each DW_TAG_module, however, it is the same for all modules in a compile unit. It is more efficient and less ambiguous to store it once in the DW_TAG_compile_unit. This should have no effect on DWARF consumers other than LLDB. Differential Revision: https://reviews.llvm.org/D71732
-
Adrian Prantl authored
This reverts commit 12e47947. I accidentally landed this patch with the wrong commit message ...
-
Adrian Prantl authored
This reverts commit c0176916.
-
Adrian Prantl authored
-
Frank Laub authored
Summary: This op is the counterpart to LLVM's atomicrmw instruction. Note that volatile and syncscope attributes are not yet supported. This will be useful for upcoming parallel versions of `affine.for` and generally for reduction-like semantics. Differential Revision: https://reviews.llvm.org/D72741
-
Eric Schweitz authored
[Flang][mlir] add a band-aid to support the creation of mutually recursive types when lowering to LLVM IR Summary: This is a temporary implementation to support Flang. The LLVM-IR parser will need to be extended in some way to support recursive types. The exact approach here is still a work-in-progress. Unfortunately, this won't pass roundtrip testing yet. Adding a comment to the test file as a reminder. Differential Revision: https://reviews.llvm.org/D72542
-
Marco Vanotti authored
Summary: This commit modifies the way `ExecuteCommand` works in fuchsia by adding special logic to handle `/dev/null`. The FuzzerCommand interface does not have a way to "discard" the output, so other parts of the code just set the output file to `getDevNull()`. The problem is that fuchsia does not have a named file that is equivalent to `/dev/null`, so opening that file just fails. This commit detects whether the specified output file is `getDevNull`, and if that's the case, it will not copy the file descriptor for stdout in the spawned process. NOTE that modifying `FuzzerCommand` to add a "discardOutput" function involves a significant refactor of all the other platforms, as they all rely on the `toString()` method of `FuzzerCommand`. This allows libfuzzer in fuchsia to run with `fork=1`, as the merge process (`FuzzerMerge.cpp`) invoked `ExecuteCommand` with `/dev/null` as the output. Reviewers: aarongreen, phosek Reviewed By: aarongreen Subscribers: #sanitizers, llvm-commits Tags: #sanitizers, #llvm Differential Revision: https://reviews.llvm.org/D72894
-
Eli Friedman authored
This reverts commit 5df53a22. Caused test failures.
-
Lei Zhang authored
Again for pleasing GCC 5.
-
Christopher Tetreault authored
Summary:
  * Pass the Scalability test to VectorType::get in order to be able to deserialize bitcode that contains scalable vector operations
Change-Id: I37fe5b1c0c237a9153130deefdc1a6d595c7f12e
Reviewers: efriedma, pcc, sdesmalen, apazos, huihuiz, chrisj
Reviewed By: sdesmalen
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72792
-