Oct 14, 2021
Reid Kleckner authored
After the TargetRegistry.h move, nothing in Support includes headers from MC. However, files in tablegen use MC headers, so we must add an entry for them in tblgen srcs. Differential Revision: https://reviews.llvm.org/D111835
Collin Baker authored
When LLVM_ENABLE_PER_TARGET_RUNTIME_DIR=on, Asan-i386-calls-Dynamic-Test and Asan-i386-inline-Dynamic-Test fail to run on an x86_64 host. This is because asan's unit test lit files are configured once, rather than per target arch as with the non-unit tests. LD_LIBRARY_PATH ends up incorrect, and the tests try linking against the x86_64 runtime, which fails. This changes the unit test CMake machinery to configure the default and dynamic unit tests once per target arch, similar to the other asan tests. Then the fix from https://reviews.llvm.org/D108859 is adapted to the unit test lit files with some modifications. Fixes PR52158. Differential Revision: https://reviews.llvm.org/D111756
Arthur Eubanks authored
Previously without -disable-free, -clear-ast-before-backend would crash in ~ASTContext() due to various reasons. This works around that by doing a lot of the cleanup ahead of the destructor so that the destructor doesn't actually do any manual cleanup if we've already cleaned up beforehand. This actually does save a measurable amount of memory with -clear-ast-before-backend, although at an almost unnoticeable runtime cost: https://llvm-compile-time-tracker.com/compare.php?from=5d755b32f2775b9219f6d6e2feda5e1417dc993b&to=58ef1c7ad7e2ad45f9c97597905a8cf05a26258c&stat=max-rss Previously we weren't doing any cleanup with -disable-free, so I tried measuring the impact of always doing the cleanup and didn't measure anything noticeable on llvm-compile-time-tracker. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D111767
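A minimal sketch of the general pattern described above (doing the cleanup ahead of the destructor so the destructor has nothing left to do), assuming a hypothetical `EagerCleanupContext` class that owns heap-allocated nodes. It is not clang's `ASTContext` code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a context that owns many allocations.
// This is NOT clang's ASTContext; it only models the pattern.
class EagerCleanupContext {
public:
  void addNode(int Value) {
    assert(!CleanedUp && "cannot add nodes after cleanup()");
    Nodes.push_back(std::make_unique<int>(Value));
  }

  // Do the teardown early, e.g. before the backend runs, so the memory is
  // released while the process keeps going.
  void cleanup() {
    Nodes.clear();
    Nodes.shrink_to_fit();
    CleanedUp = true;
  }

  std::size_t liveNodes() const { return Nodes.size(); }
  bool cleanedUp() const { return CleanedUp; }

  // If cleanup() already ran, the container is empty and there is no manual
  // work left here; otherwise fall back to cleaning up now.
  ~EagerCleanupContext() {
    if (!CleanedUp)
      cleanup();
  }

private:
  std::vector<std::unique_ptr<int>> Nodes;
  bool CleanedUp = false;
};
```

Calling `cleanup()` at the point where the data is no longer needed frees the nodes early; destroying the object afterwards then only releases the (already empty) containers.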
Rong Xu authored
We are seeing extremely long build times for AMDGPUInstPrinter.cpp when profile instrumentation is enabled: it takes more than 5 minutes (compared to ~8 seconds in a non-instrumented build). This is caused by the huge switch statements in the printInstruction functions. In a profile-instrumented build, we need extra control flow to differentiate each case statement. This in turn adds significant compile time in block placement and branch folding. Function printInstruction is not likely to benefit from a PGO build, as it's rarely executed in a typical compilation. So here I disable the profile instrumentation for this function. Differential Revision: https://reviews.llvm.org/D111682
Mogball authored
Reviewed By: jpienaar Differential Revision: https://reviews.llvm.org/D111820
thomasraoux authored
Emit the reduction during op vectorization instead of doing it when creating the transfer write. This allows us to not broadcast output arguments for the reduction initial value. Differential Revision: https://reviews.llvm.org/D111825
Philip Reames authored
This shows the transform side of D109457, but also lets us try other approaches to the same problem. The common thread in all of them is that we need to explicitly reason about UB to disallow the possibility of infinite loops.
David Green authored
Roman Lebedev authored
While I've modelled most of the relevant tuples for AVX2, that only covered fully-interleaved groups. By definition, an interleaving load of stride N means: load N*VF elements, and shuffle them into N VF-sized vectors, with the 0'th vector containing elements `[0, VF)*stride + 0`, the 1'th vector containing elements `[0, VF)*stride + 1`, and so on. Example: https://godbolt.org/z/df561Me5E (i64 stride 4 vf 2 => cost 6) Now, a not-fully-interleaved load is one where not all of these vectors are demanded. So at worst, we could just pretend that everything is demanded, and discard the non-demanded vectors. What this means is that the cost for a not-fully-interleaved group should be no greater than the cost for the same fully-interleaved group, but perhaps somewhat less. Examples: https://godbolt.org/z/a78dK5Geq (i64 stride 4 (indices 012u) vf 2 => cost 4) https://godbolt.org/z/G91ceo8dM (i64 stride 4 (indices 01uu) vf 2 => cost 2) https://godbolt.org/z/5joYob9rx (i64 stride 4 (indices 0uuu) vf 2 => cost 1) As we have established over the course of the last ~70 patches (wow), `BaseT::getInterleavedMemoryOpCost()` is absolutely bogus: it is usually almost an order of magnitude overestimation, so I would claim that we should at least use the hardcoded costs of fully interleaved load groups. We could go further and adjust them, e.g. by the number of demanded indices, but then I'm somewhat fearful of underestimating the cost. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111174
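The lane arithmetic for an interleaved load of stride N can be sketched as scalar C++. This models only which loaded elements each result vector receives (with non-demanded `u` indices skipped), not the X86 cost tables or the actual shuffle sequence; the function name `deinterleave` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Split Stride*VF interleaved elements into Stride vectors of VF elements:
// vector K receives elements I*Stride + K for I in [0, VF).
// Demanded[K] == false models an unused ("u") index in a partial group;
// that vector is simply left empty ("load everything, discard the rest").
std::vector<std::vector<int>> deinterleave(const std::vector<int>& Mem,
                                           std::size_t Stride,
                                           std::size_t VF,
                                           const std::vector<bool>& Demanded) {
  assert(Mem.size() == Stride * VF && "expected Stride*VF loaded elements");
  assert(Demanded.size() == Stride && "one demanded bit per member");
  std::vector<std::vector<int>> Out(Stride);
  for (std::size_t K = 0; K < Stride; ++K) {
    if (!Demanded[K])
      continue; // non-demanded member: no shuffle is needed for it
    for (std::size_t I = 0; I < VF; ++I)
      Out[K].push_back(Mem[I * Stride + K]);
  }
  return Out;
}
```

With `Mem = {0, 1, ..., 7}`, stride 4 and VF 2, vector 1 is `{1, 5}`. Marking indices 2 and 3 as not demanded (the `01uu` case) just drops work for those vectors, which is consistent with a partial group costing no more than the fully interleaved one.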
Philip Reames authored
Craig Topper authored
Nikita Popov authored
Rather than checking for loop nest preheaders upfront in IVUsers, move this requirement into isSafeToExpand() from SCEVExpander. Historically, LSR did not check whether SCEVs are safe to expand and fully relied on IVUsers to validate this. Later, support for non-expandable SCEVs was added via rigid formulas. Checking this in isSafeToExpand() makes it more obvious what exactly this check is guarding against, and avoids the awkward loop nest scan. This is a followup to https://reviews.llvm.org/D111493#3055286. Differential Revision: https://reviews.llvm.org/D111681
Aaron Ballman authored
Not all constants are emitted within the context of a function, so use the module's ASTContext instead because 1) that's the same as the current function ASTContext, and 2) the module can never be null. Fixes PR50787.
Raphael Isemann authored
The called destructors of the members require the includes that are only in the source file.
Frederic Cambus authored
Differential Revision: https://reviews.llvm.org/D111793
Craig Topper authored
CMOVGE reads SF and OF. CMOVNS only reads SF. This matches other recent changes to use a single flag where possible. It also matches gcc codegen. I believe this technically changes whether the conditional move happens on INT_MIN, but for INT_MIN both registers are the same so it doesn't matter. Differential Revision: https://reviews.llvm.org/D111826
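A small model of why the two conditions only disagree on INT_MIN, assuming (the commit does not say so) that the flags come from a 32-bit NEG, as in a NEG + CMOV absolute-value sequence. CMOVGE is taken when SF == OF and CMOVNS when SF == 0, so they differ only when OF is set, and NEG sets OF only for INT_MIN, whose negation wraps back to INT_MIN. The names `NegFlags` and `neg32` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Flags produced by a 32-bit NEG (0 - X); illustrative assumption only.
struct NegFlags {
  bool SF;        // sign flag: result has its top bit set
  bool OF;        // overflow flag: signed overflow occurred
  int32_t Result; // wrapped two's-complement result
};

NegFlags neg32(int32_t X) {
  // Unsigned arithmetic wraps, so this avoids signed-overflow UB.
  uint32_t R = 0u - static_cast<uint32_t>(X);
  NegFlags F{};
  // Modular conversion (guaranteed in C++20, two's complement in practice).
  F.Result = static_cast<int32_t>(R);
  F.SF = (R >> 31) != 0;
  // 0 - X overflows only when X is INT32_MIN.
  F.OF = X == std::numeric_limits<int32_t>::min();
  return F;
}

// Condition codes as defined for x86: GE is SF == OF, NS is SF == 0.
bool condGE(const NegFlags& F) { return F.SF == F.OF; }
bool condNS(const NegFlags& F) { return !F.SF; }
```

For every input except INT32_MIN the two predicates agree; for INT32_MIN they differ, but the negated value equals the original, so either choice of register yields the same result.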
Michael Kruse authored
DragonEgg is not maintained anymore, hence there is no need for this functionality. Fixes llvm.org/PR52173
Michael Kruse authored
Make the changes top-level items, instead of subitems of the "Changes..." placeholder.
Aaron Ballman authored
It seems that Clang 11 regressed functionality that was working in Clang 10 regarding calling a few overloaded operators in an immediate context. Specifically, we were not checking for immediate invocations of array subscripting and the arrow operators, but we properly handle the other overloaded operators. This fixes the two problematic operators and adds some test coverage to show they're equivalent to calling the operator directly. This addresses PR50779.
Raphael Isemann authored
Platform instances are stored in a function-local static list. However, the logging code involves locking a function-local static mutex. This only works on implementations where the Log mutex happens to be destroyed *after* the Platform list is destroyed. This fixes randomly failing tests due to `recursive_mutex lock failed: Invalid argument`. Reviewed By: kastiglione Differential Revision: https://reviews.llvm.org/D111816
Rob Suderman authored
Part of the arith update broke UiToFp32. Fixed the lowering and included a new test to detect a regression. Differential Revision: https://reviews.llvm.org/D111772
Raphael Isemann authored
David Tenty authored
This initial change adds the AIX configuration to run-buildbot, an AIX CMake cache file, and appropriate compiler and linker flags for testing AIX to the lit "from scratch" configuration files. Either of the 32-bit or 64-bit configurations can be built by setting `OBJECT_MODE` in the build environment (as is typical for AIX). Reviewed By: ldionne, #libc, #libc_abi Differential Revision: https://reviews.llvm.org/D111244
Nikita Popov authored
Currently, DecomposeGEP() bails out on the whole decomposition if it encounters a scalable GEP type anywhere. However, it is fine to still analyze other GEPs that we look through before hitting the scalable GEP. This does mean that the decomposed GEP base is no longer required to be the same as the underlying object. However, I don't believe this property is necessary for correctness anymore. This allows us to compute slightly more precise aliasing results for GEP chains containing scalable vectors, though my primary interest here is simplifying the code. Differential Revision: https://reviews.llvm.org/D110511
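A toy model of the new behaviour, assuming a hypothetical `GepStep` list ordered from the outermost GEP toward the underlying object: constant offsets are accumulated until the first scalable GEP, which then becomes the decomposed base instead of the whole decomposition being abandoned. The real `DecomposeGEP()` also tracks variable indices and much more.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one GEP in a chain: either a known constant byte
// offset, or a scalable-vector GEP whose offset is not a compile-time constant.
struct GepStep {
  bool Scalable;
  int64_t ConstOffset; // ignored when Scalable is true
};

struct Decomposed {
  std::size_t BaseIndex; // step used as the base; Steps.size() = underlying object
  int64_t Offset;        // accumulated constant offset relative to that base
};

// Walk from the outermost GEP toward the underlying object and stop at the
// first scalable GEP rather than giving up on the whole chain.
Decomposed decomposeChain(const std::vector<GepStep>& Steps) {
  int64_t Offset = 0;
  for (std::size_t I = 0; I < Steps.size(); ++I) {
    if (Steps[I].Scalable)
      return {I, Offset}; // base is the scalable GEP, not the underlying object
    Offset += Steps[I].ConstOffset;
  }
  return {Steps.size(), Offset};
}
```

The returned base differing from the underlying object in the scalable case mirrors the note above that the decomposed base is no longer required to be the underlying object.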
Daniel Sanders authored
It can be a bit confusing to stop with no explanation so we should indicate when further output was prevented by the cycle limit. Differential Revision: https://reviews.llvm.org/D111753
Kai Nacke authored
POSIX does not define the exact output of the od tool. While most implementations use lowercase characters in hex output, the z/OS USS implementation uses uppercase characters. To avoid LIT failures, the FileCheck option to ignore case must be used when checking hex bytes. Reviewed By: abhina.sreeskantharajan Differential Revision: https://reviews.llvm.org/D111427
Simon Pilgrim authored
This is an NFC refactor patch to merge the AVX2 interleaved cost handling back into the getInterleavedMemoryOpCost base method: while getInterleavedMemoryOpCostAVX512 uses instructions and patterns very specific to AVX512+, much of the cost analysis for AVX2 can be reused for all SSE targets. This is the first step towards improving SSE and AVX1 costs, which will reuse the relevant AVX2 costs by splitting some of the tables. For instance, AVX1 has very similar costs for most vXi64/vXf64 interleave patterns, and many sub-128bit vector costs are the same all the way down to SSE2 (or at least SSSE3). Differential Revision: https://reviews.llvm.org/D111822
Frederic Cambus authored
Differential Revision: https://reviews.llvm.org/D111786
Michael Kruse authored
This patch removes the broken bash script (polly.sh) and fixes the broken setup instructions in get_started.html. It also adds instructions for using Ninja and links to the LLVM getting started page. Reviewed By: Meinersbur, InnovativeInventor Differential Revision: https://reviews.llvm.org/D111685
Simon Pilgrim authored
[TTI][X86] Swap getInterleavedMemoryOpCostAVX2/getInterleavedMemoryOpCostAVX512 implementations. NFC. I have some upcoming refactoring for SSE/AVX1 interleaving cost support, and the diff is a lot nicer if the (unaltered) AVX512 implementation isn't stuck between getInterleavedMemoryOpCost and getInterleavedMemoryOpCostAVX2
Simon Pilgrim authored
The initial MemoryAccess *Current assignment is never used, and all other uses are initialized/used within the worklist loop (and not across multiple iterations) - so move the variable internal to the loop. Fixes scan-build unused assignment warning.
Yitzhak Mandelbaum authored
Adds `selectBound`, a `Stencil` combinator that allows the user to supply multiple alternative cases, discriminated by bound node IDs. Differential Revision: https://reviews.llvm.org/D111708
Kevin P. Neal authored
Currently the fadd optimizations in InstSimplify don't know how to do this NoSignedZeros "X + 0.0 ==> X" fold when using the constrained intrinsics. This adds the support. This review is derived from D106362 with some improvements from D107285 and is a follow-on to D111085. Differential Revision: https://reviews.llvm.org/D111450
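Why the "X + 0.0 ==> X" fold needs NoSignedZeros can be checked with plain doubles, assuming the default round-to-nearest mode and non-NaN inputs: `X + 0.0` returns `X` for every value except `-0.0`, because `-0.0 + 0.0` is `+0.0`. By contrast, `X + -0.0` returns `X` for every non-NaN `X`. The helper `addPreservesX` is hypothetical and says nothing about how InstSimplify handles the constrained intrinsics.

```cpp
#include <cassert>
#include <cmath>

// True if X + C gives back exactly X, including the sign of zero.
// Assumes X is not NaN and the default round-to-nearest rounding mode.
bool addPreservesX(double X, double C) {
  double Sum = X + C;
  // operator== treats +0.0 and -0.0 as equal, so compare the sign bit too.
  return Sum == X && std::signbit(Sum) == std::signbit(X);
}
```

For example, `addPreservesX(-0.0, 0.0)` is false while `addPreservesX(-0.0, -0.0)` is true, which is exactly the case the nsz flag allows the fold to ignore.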
Craig Topper authored
I've removed the Zbs W instructions that are not part of the frozen spec. References to B as an extension name have been removed. Tests are updated or split accordingly. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D110669
Vitaly Buka authored
Trace pointers are accessed very rarely and don't need to be in hot data. Depends on D111613. Reviewed By: dvyukov Differential Revision: https://reviews.llvm.org/D111614
Nikita Popov authored
This accepts an ArrayRef, there's no need to create a SmallVector.
Wenlei He authored
Add `-use-dwarf-correlation` switch to allow llvm-profgen to generate AutoFDO profile for binaries built with CSSPGO (pseudo-probe). Differential Revision: https://reviews.llvm.org/D111776
Joe Loser authored
Implement LWG3480 which enables `directory_iterator` and `recursive_directory_iterator` to be both a `borrowed_range` and a `view`. Reviewed By: ldionne, #libc Differential Revision: https://reviews.llvm.org/D111644
Julien Pages authored
Differential Revision: https://reviews.llvm.org/D111652
Gabor Marton authored
The solver's symbol simplification mechanism was not able to handle cases when a symbol is simplified to a concrete integer. This patch adds the capability. E.g., in the attached lit test case, the original symbol is `c + 1` and it has a `[0, 0]` range associated with it. Then, a new condition `c == 0` is assumed, so a new range constraint `[0, 0]` comes in for `c` and simplification kicks in. `c + 1` becomes `0 + 1`, but the associated range is `[0, 0]`, so now we are able to realize the contradiction. Differential Revision: https://reviews.llvm.org/D110913
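The contradiction in the example can be modeled in a few lines, using a hypothetical `ClosedRange` type in place of the analyzer's range sets: once `c` is known to be 0, `c + 1` folds to the concrete integer 1, which lies outside the `[0, 0]` range recorded for `c + 1`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical closed range [Lo, Hi] standing in for the analyzer's ranges.
struct ClosedRange {
  int64_t Lo;
  int64_t Hi;
  bool contains(int64_t V) const { return Lo <= V && V <= Hi; }
};

// The symbol `c + 1` carries the range `Constraint`. Once `c` is known to be
// the concrete value `CValue`, `c + 1` simplifies to a concrete integer; the
// state is infeasible if that integer lies outside the recorded range.
bool isFeasibleAfterSimplifying(int64_t CValue, ClosedRange Constraint) {
  int64_t Simplified = CValue + 1;
  return Constraint.contains(Simplified);
}
```

`isFeasibleAfterSimplifying(0, ClosedRange{0, 0})` returns false, matching the contradiction described above, whereas a value of -1 for `c` would have been consistent.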