- Sep 20, 2021
-
-
Paul Robinson authored
-
Nico Weber authored
This reverts commit 6d7b3d6b. Breaks running cmake with `-DCLANG_ENABLE_STATIC_ANALYZER=OFF` without turning off CLANG_TIDY_ENABLE_STATIC_ANALYZER. See comments on https://reviews.llvm.org/D109611 for details.
-
Nico Weber authored
See discussion on https://reviews.llvm.org/D110016 for details.
-
Craig Topper authored
If either of the multiplicands is a splat, we can sink it to use vfmacc.vf or similar.
-
Craig Topper authored
This is another case of a splat being in another basic block preventing SelectionDAG from optimizing it.
-
Arthur O'Dwyer authored
- Simplify the structure of the new tests.
- Test const containers as well as non-const containers, since it's easy to do so.
- Remove redundant enable-iffing of helper structs' member functions. (They're not instantiated unless they're called, and who would call them?)
- Fix indentation and use more consistent SFINAE method in <unordered_map>.
- Add _LIBCPP_INLINE_VISIBILITY on some swap functions.

Differential Revision: https://reviews.llvm.org/D109011
-
Arthur O'Dwyer authored
Now that __builtin_is_constant_evaluated() is present on all supported compilers, we can use it to skip the UB-inducing assert in cases where the computation might be happening at constexpr time. Differential Revision: https://reviews.llvm.org/D101674
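The pattern described above can be sketched as follows. This is a minimal illustration, not the libc++ code from the patch: `checked_at` and `kTable` are hypothetical names, and the sketch assumes a compiler that provides `__builtin_is_constant_evaluated()` and C++14 or later.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not libc++ code): element access that is usable in
// constant expressions but still asserts at run time.
constexpr int checked_at(const int* data, std::size_t size, std::size_t i) {
  // Only run the runtime assert when we are NOT being constant-evaluated.
  if (!__builtin_is_constant_evaluated()) {
    assert(i < size && "index out of range");
  }
  return data[i];
}

// Hypothetical sample data for the compile-time check below.
constexpr int kTable[] = {10, 20, 30};

// Evaluated at compile time: the assert branch is skipped entirely.
static_assert(checked_at(kTable, 3, 1) == 20, "constant evaluation works");
```

At run time the same function still performs the bounds check, so the guard only changes behaviour during constant evaluation.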
-
Alex Langford authored
-
Arthur Eubanks authored
-Wl,-z,defs doesn't work with sanitizers. See https://clang.llvm.org/docs/AddressSanitizer.html Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D110086
-
Nikita Popov authored
We implement logic to convert a byte offset into a sequence of GEP indices for that offset in a number of places. This patch adds a DataLayout::getGEPIndicesForOffset() method, which implements the core logic. I've updated SROA, ConstantFolding and InstCombine to use it, and there are a few more places where it looks relevant. Differential Revision: https://reviews.llvm.org/D110043
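To illustrate the idea of turning a byte offset into indices, here is a deliberately simplified sketch. It is not the LLVM API: `OffsetSplit` and `splitOffset` are hypothetical names, the type is modelled only as a list of byte sizes per indexing level (so structs are not handled), and the offset is assumed to be non-negative.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical result type: the indices found plus any leftover bytes.
struct OffsetSplit {
  std::vector<std::int64_t> indices;
  std::int64_t remainder;
};

// Simplified model (assumption): levelSizes[k] is the byte size of the type
// stepped over by index k, outermost first. For a pointer to [4 x [3 x i32]]
// that is {48, 12, 4}. Assumes offset >= 0.
OffsetSplit splitOffset(const std::vector<std::int64_t>& levelSizes,
                        std::int64_t offset) {
  OffsetSplit result{{}, offset};
  for (std::int64_t size : levelSizes) {
    if (size <= 0) break;  // cannot index through a zero-sized level
    std::int64_t index = result.remainder / size;
    result.indices.push_back(index);
    result.remainder -= index * size;  // bytes still unaccounted for
  }
  return result;
}
```

With level sizes {48, 12, 4}, a byte offset of 20 splits into indices 0, 1, 2 with no remainder; a non-zero remainder means the offset points into the middle of the innermost element.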
-
MaheshRavishankar authored
For `memref.subview` operations, when there is more than one unit dimension, the strides need to be used to figure out which of the unit dims are actually dropped. Differential Revision: https://reviews.llvm.org/D109418
-
Peyton, Jonathan L authored
The indirect lock table can exhibit a race condition during initialization and setting/unsetting of locks. This occurs if the lock table is resized by one thread (during an omp_init_lock) and accessed (during an omp_set|unset_lock) by another thread. The runtime/test/lock/omp_init_lock.c test exposed this issue and will fail if run enough times. This patch restructures the lock table so pointer/iterator validity is always kept. Instead of reallocating a single table to a larger size, the lock table begins preallocated to accommodate 8K locks. Each row of the table is allocated as needed, with each row allowing 1K locks. If the 8K limit is reached for the initial table, then another table, capable of holding double the number of locks, is allocated and linked as the next table. The indices stored in the user's locks take this linked structure into account when finding the lock within the table. Differential Revision: https://reviews.llvm.org/D109725
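The index arithmetic implied by this layout can be sketched as below. This is not the OpenMP runtime code: `LockSlot`, `locate`, and the constants are hypothetical, and the sketch assumes every linked table doubles the capacity of the one before it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical constants taken from the description: 1K locks per row and
// 8 rows (8K locks) in the first table.
constexpr std::size_t kLocksPerRow = 1024;
constexpr std::size_t kInitialRows = 8;

// Hypothetical position of a lock inside the linked tables.
struct LockSlot {
  std::size_t table;  // which table in the linked chain
  std::size_t row;    // row within that table (allocated on demand)
  std::size_t col;    // slot within the row
};

// Map a flat lock index to (table, row, col). Assumption: each linked table
// holds twice as many locks as the previous one.
LockSlot locate(std::size_t index) {
  std::size_t table = 0;
  std::size_t capacity = kInitialRows * kLocksPerRow;
  while (index >= capacity) {
    index -= capacity;  // skip past the locks held by this table
    capacity *= 2;      // the next table is twice as large
    ++table;
  }
  return {table, index / kLocksPerRow, index % kLocksPerRow};
}
```

Because existing tables and rows are never moved, a slot computed this way stays valid while other threads add locks, which is the property the patch is after.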
-
Geoffrey Martin-Noble authored
When https://reviews.llvm.org/D109520 was landed, it reverted the addition of this switch case added in https://reviews.llvm.org/D109293. This caused `-Wswitch` failures (and presumably broke the functionality added in the latter patch).
-
Yuanfang Chen authored
While at it, add the diagnosis message "left operand of comma operator has no effect" (used by GCC) for comma operator. This also makes Clang diagnose in the constant evaluation context which aligns with GCC/MSVC behavior. (https://godbolt.org/z/7zxb8Tx96) Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D103938
-
MaheshRavishankar authored
Add an interface that allows grouping together all convolution and pooling ops within Linalg named ops. The interface currently verifies that
- the indexing map used for input/image access is valid,
- the filter and output are accessed using projected permutations,
- all loops are characterizable as iterating over one of
  - batch dimension,
  - output image dimensions,
  - filter convolved dimensions,
  - output channel dimensions,
  - input channel dimensions,
  - depth multiplier (for depthwise convolutions).

Differential Revision: https://reviews.llvm.org/D109793
-
Fangrui Song authored
This reverts commit 4b80f012. debuginfo-tests has been renamed to cross-project-tests.
-
Fangrui Song authored
This partially reverts commits 1fc2a47f and 9816e726. See D109727. Replacing config.guess in favor of {gcc,clang} -dumpmachine can avoid the riscv64-{redhat,suse}-linux GCC detection. Acked-by: Luís Marques <luismarques@lowrisc.org>
-
Arthur O'Dwyer authored
All supported compilers have supported `=delete` as an extension in C++03 mode for many years at this point. Differential Revision: https://reviews.llvm.org/D109942
-
Craig Topper authored
-
Craig Topper authored
[RISCV] Add test cases showing failure to use .vf vector operations when splat is in another basic block. NFC We should have CGP copy the splats into the same basic block as the FP operation so that SelectionDAG can fold them.
-
Vedant Kumar authored
When adding an image to a target for crashlog purposes, avoid specifying the architecture of the image. This has the effect of making SBTarget::AddModule infer the ArchSpec for the image based on the SBTarget's architecture, which LLDB puts serious effort into calculating correctly (in TargetList::CreateTargetInternal). The status quo is that LLDB randomly guesses the ArchSpec for a module if its architecture is specified, via: ``` SBTarget::AddModule -> Platform::GetAugmentedArchSpec -> Platform::IsCompatibleArchitecture -> GetSupportedArchitectureAtIndex -> {ARM,x86}GetSupportedArchitectureAtIndex ``` ... which means that the same crashlog can fail to load on an Apple Silicon Mac (due to the random guess of arm64e-apple-macosx for the module's ArchSpec not being compatible with the SBTarget's (correct) ArchSpec), while loading just fine on an Intel Mac. I'm not sure how to add a test for this (it doesn't look like there's test coverage of this path in-tree). It seems like it would be pretty complicated to regression test: the host LLDB would need to be built for arm64e, we'd need a hand-crafted arm64e iOS crashlog, and we'd need a binary with an iOS deployment target. I'm open to other / simpler options. rdar://82679400 Differential Revision: https://reviews.llvm.org/D110013
-
cchen authored
-
Mehdi Amini authored
This reverts commit 644b55d5. The added test is failing the bots.
-
Mehdi Amini authored
-
Alexander Grund authored
When building a tool in a non-standard environment (e.g. custom compiler path -> LD_LIBRARY_PATH set) then `use_default_shell_env = True` is required to run that tool in the same environment or otherwise the build will fail due to missing symbols. See https://github.com/google/jax/issues/7842 for this issue and https://github.com/tensorflow/tensorflow/pull/44549 for related fix in TF. Reviewed By: GMNGeoffrey Differential Revision: https://reviews.llvm.org/D109873
-
Amy Huang authored
See https://reviews.llvm.org/D109904
-
Fangrui Song authored
Similar to D69607 but for archive member extraction unrelated to GC. This patch adds --why-extract=.

Prior art:

GNU ld -M prints
```
Archive member included to satisfy reference by file (symbol)

a.a(a.o)                      main.o (a)
b.a(b.o)                      (b())
```
-M is mainly for input section/symbol assignment <-> output section mapping (often huge output) and the information may appear ad-hoc.

Apple ld64
```
__Z1bv forced load of b.a(b.o)
_a forced load of a.a(a.o)
```
It doesn't say the reference file.

Arm's proprietary linker
```
Selecting member vsnprintf.o(c_wfu.l) to define vsnprintf.
...
Loading member vsnprintf.o from c_wfu.l.
              definition:  vsnprintf
              reference :  _printf_a
```

---

--why-extract= gives the user the full data (which is much shorter than GNU ld -Map). It is easy to track a chain of references to one archive member with a one-liner, e.g.
```
% ld.lld main.o a_b.a b_c.a c.a -o /dev/null --why-extract=- | tee stdout
reference       extracted       symbol
main.o          a_b.a(a_b.o)    a
a_b.a(a_b.o)    b_c.a(b_c.o)    b()
b_c.a(b_c.o)    c.a(c.o)        c()

% ruby -ane 'BEGIN{p={}}; p[$F[1]]=[$F[0],$F[2]] if $.>1; END{x="c.a(c.o)"; while y=p[x]; puts "#{y[0]} extracts #{x} to resolve #{y[1]}"; x=y[0] end}' stdout
b_c.a(b_c.o) extracts c.a(c.o) to resolve c()
a_b.a(a_b.o) extracts b_c.a(b_c.o) to resolve b()
main.o extracts a_b.a(a_b.o) to resolve a
```

Archive member extraction happens before --gc-sections, so this may not be a live path under --gc-sections, but I think it is a good approximation in practice.

* Specifying a file avoids output interleaving with --verbose.
* Required `=` prevents accidental overwrite of an input if the user forgets `=`. (Most of compiler drivers' long options accept `=` but not ` `)

Differential Revision: https://reviews.llvm.org/D109572
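The chain-tracing one-liner from this message can also be written out as a small C++ function, which may be easier to follow. This is a sketch only: `Extraction` and `explainChain` are hypothetical names, and the rows are assumed to have already been parsed from the three-column `--why-extract=` output (header line skipped).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical parsed row of the report: reference, extracted, symbol.
struct Extraction {
  std::string reference;
  std::string extracted;
  std::string symbol;
};

// Walk backwards from `target`, explaining who extracted each member.
std::vector<std::string> explainChain(const std::vector<Extraction>& rows,
                                      std::string target) {
  // Index rows by the extracted member so each step is a single lookup.
  std::map<std::string, const Extraction*> byExtracted;
  for (const Extraction& row : rows) byExtracted[row.extracted] = &row;

  std::vector<std::string> lines;
  // Bound the walk by the number of rows so a cyclic input cannot loop forever.
  for (std::size_t step = 0; step < rows.size(); ++step) {
    auto it = byExtracted.find(target);
    if (it == byExtracted.end()) break;  // reached a file that was not extracted
    const Extraction& row = *it->second;
    lines.push_back(row.reference + " extracts " + target + " to resolve " +
                    row.symbol);
    target = row.reference;  // continue with whoever referenced this member
  }
  return lines;
}
```

Starting from `c.a(c.o)` with the three rows shown in the message, the walk produces the same three lines as the Ruby version, ending at `main.o`.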
-
Nikita Popov authored
Some buildbots fail with: > C:\a\llvm-clang-x86_64-expensive-checks-win\llvm-project\llvm\lib\IR\Verifier.cpp(4352): error C2678: binary '==': no operator found which takes a left-hand operand of type 'const llvm::MDOperand' (or there is no acceptable conversion) Possibly the explicit MDOperand to Metadata* conversion will help?
-
Kazu Hirata authored
This patch fixes the warning InstructionTables.cpp:27:56: error: loop variable 'Resource' of type 'const std::pair<const uint64_t, ResourceUsage> &' (aka 'const pair<const unsigned long, llvm::mca::ResourceUsage> &') binds to a temporary constructed from type 'const std::pair<unsigned long, llvm::mca::ResourceUsage> &' [-Werror,-Wrange-loop-construct] Note that Resource is declared as: SmallVector<std::pair<uint64_t, ResourceUsage>, 4> Resources; without "const" for uint64_t.
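The underlying issue is that a reference to `std::pair<const K, V>` cannot bind directly to a `std::pair<K, V>` element, so a converted temporary is created. The sketch below shows this with plain references rather than a range-based loop; `Entry`, `matchedRefAliases`, and `mismatchedRefCopies` are hypothetical names, with `int` standing in for `ResourceUsage`.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical element type; the key is NOT const, as in the patch.
using Entry = std::pair<std::uint64_t, int>;

// A reference of the exact element type aliases the element itself.
// Requires a non-empty vector.
bool matchedRefAliases(const std::vector<Entry>& entries) {
  const auto& ref = entries.front();
  return &ref == &entries.front();
}

// A reference to pair<const K, V> forces a conversion, so it binds to a
// temporary copy instead of the element. Requires a non-empty vector.
bool mismatchedRefCopies(const std::vector<Entry>& entries) {
  const std::pair<const std::uint64_t, int>& ref = entries.front();
  return static_cast<const void*>(&ref) !=
         static_cast<const void*>(&entries.front());
}
```

Both functions return true for a non-empty vector, which is why spelling the loop variable as `const auto &` (or with the exact element type) avoids the copy that the warning complains about.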
-
LLVM GN Syncbot authored
-
Craig Topper authored
For strided accesses the loop vectorizer seems to prefer creating a vector induction variable with a start value of the form <i32 0, i32 1, i32 2, ...>. This value will be incremented each loop iteration by a splat constant equal to the length of the vector. Within the loop, arithmetic using splat values will be done on this vector induction variable to produce indices for a vector GEP. This pass attempts to dig through the arithmetic back to the phi to create a new scalar induction variable and a stride. We push all of the arithmetic out of the loop by folding it into the start, step, and stride values. Then we create a scalar GEP to use as the base pointer for a strided load or store using the computed stride. Loop strength reduce will run after this pass and can do some cleanups to the scalar GEP and induction variable. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D107790
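The folding of arithmetic into start, step, and stride can be illustrated with a small arithmetic model. This is a sketch, not the pass itself: `StridedIV` and the helper functions are hypothetical, and only addition and multiplication by a splat constant are modelled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar description of a vector induction variable:
// the value of lane L at iteration K is start + K * step + L * stride.
struct StridedIV {
  std::int64_t start;
  std::int64_t step;
  std::int64_t stride;
};

// The vector IV <0, 1, ..., VL-1> incremented by splat(VL) each iteration.
StridedIV initialIV(std::int64_t vl) { return {0, vl, 1}; }

// Adding a splat constant only shifts the start value.
StridedIV foldAdd(StridedIV iv, std::int64_t c) {
  iv.start += c;
  return iv;
}

// Multiplying by a splat constant scales start, step, and stride.
StridedIV foldMul(StridedIV iv, std::int64_t c) {
  return {iv.start * c, iv.step * c, iv.stride * c};
}

// Index seen by one lane, computed purely from the scalar form.
std::int64_t laneIndex(const StridedIV& iv, std::int64_t iteration,
                       std::int64_t lane) {
  return iv.start + iteration * iv.step + lane * iv.stride;
}
```

For example, with a vector length of 4, the in-loop computation `iv * 3 + 5` folds to start 5, step 12, stride 3; lane 2 in iteration 1 then gets index 23, matching the vector computation (2 + 4) * 3 + 5.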
-
Fangrui Song authored
We have the rule to simulate (https://sourceware.org/binutils/docs/ld/Entry-Point.html), but the behavior is questionable (https://sourceware.org/pipermail/binutils/2021-September/117929.html). gold doesn't fall back to .text. The behavior is unlikely to be relied upon by projects (there is even a warning for executable links), so let's just delete this fallback path. Reviewed By: jhenderson, peter.smith Differential Revision: https://reviews.llvm.org/D110014
-
Nikita Popov authored
Verify that !noalias, !alias.scope and llvm.experimental.noalias.scope arguments have the format specified in https://llvm.org/docs/LangRef.html#noalias-and-alias-scope-metadata. I've fixed up a lot of broken metadata used by tests in advance. Especially using a scope instead of the expected scope list is a commonly made mistake. Differential Revision: https://reviews.llvm.org/D110026
-
Jonas Devlieghere authored
This moves the logic for adding symbols based on UUID, file and frame into little helper functions. This is in preparation for D110011. Differential revision: https://reviews.llvm.org/D110010
-
Jonas Devlieghere authored
-
Florian Hahn authored
Adds additional tests following comments from D109844. Also removes unused in.ptr arguments and places in the call tests that used loads instead of a getval call.
-
Tobias Gysi authored
This revision depends on https://reviews.llvm.org/D109761 and https://reviews.llvm.org/D109766. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D109774
-
Morten Borup Petersen authored
This pass transforms SCF.ForOp operations to SCF.WhileOp. The For loop condition is placed in the 'before' region of the while operation, and induction variable incrementation + the loop body in the 'after' region. The loop carried values of the while op are the induction variable (IV) of the for-loop + any iter_args specified for the for-loop. Any 'yield' ops in the for-loop are rewritten to additionally yield the (incremented) induction variable. This transformation is useful for passes where we want to consider structured control flow solely on the basis of a loop body and the computation of a loop condition. As an example, when doing high-level synthesis in CIRCT, the incrementation of an IV in a for-loop is "just another part" of a circuit datapath, and what we really care about is the distinction between our datapath and our control logic (the condition variable). Differential Revision: https://reviews.llvm.org/D108454
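The shape of the rewrite can be shown with an ordinary C++ loop as an analogy. This is not MLIR and not the pass: `sumFor` and `sumWhile` are hypothetical functions, the accumulator plays the role of a single iter_arg, and the step is assumed to be positive.

```cpp
#include <cassert>
#include <cstdint>

// Original form: a counted loop with one loop-carried value (acc).
std::int64_t sumFor(std::int64_t lb, std::int64_t ub, std::int64_t step) {
  std::int64_t acc = 0;
  for (std::int64_t i = lb; i < ub; i += step) acc += i;
  return acc;
}

// Rewritten form: the induction variable becomes just another carried value.
std::int64_t sumWhile(std::int64_t lb, std::int64_t ub, std::int64_t step) {
  std::int64_t i = lb;   // carried value: induction variable
  std::int64_t acc = 0;  // carried value: the original iter_arg
  while (i < ub) {       // 'before' region: only the loop condition
    acc += i;            // 'after' region: the original loop body ...
    i += step;           // ... plus the IV increment, yielded with acc
  }
  return acc;
}
```

Both functions compute the same result for any positive step; the second makes the split between control logic (the condition) and datapath (body plus increment) explicit.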
-
Tobias Gysi authored
-
Alexey Bataev authored
Reworked the reordering algorithm. Originally, the compiler just tried to detect the most common order in the reorderable nodes (loads, stores, extractelements, extractvalues) and then fully rebuilt the graph in the best order. This was not efficient, since it required extra memory and time for building/rebuilding the tree and doubled the use of the scheduling budget, which could lead to missed vectorization due to exhausted scheduling resources. The patch provides a two-way approach to the graph reordering problem. At first, all reordering is done in place: it does not require deleting/rebuilding the tree, it just rotates the scalars/orders/reuse masks in the graph nodes. The first step (top-to-bottom) rotates the whole graph, similarly to the previous implementation. The compiler counts the number of the most used orders of the graph nodes with the same vectorization factor and then rotates the subgraph with the given vectorization factor to the most used order, if it is not empty. It then repeats the same procedure for the subgraphs with the smaller vectorization factor. We can do this because we still need to reshuffle the smaller subgraph when building operands for the graph nodes with a larger vectorization factor, so we can rotate just the subgraph, not the whole graph. The second step (bottom-to-top) scans through the leaves and tries to detect the users of the leaves that can be reordered. If the leaves can be reordered in the best fashion, they are reordered, and their users too. This allows removing double shuffles to the same ordering of the operands in many cases and just reordering the user operations instead. Plus, it moves the final shuffles closer to the top of the graph and in many cases allows removing an extra shuffle, because the same procedure is repeated and we can again merge some reordering masks and reorder user nodes instead of the operands. Also, the patch improves the cost model for gathering of loads, which improves the x264 benchmark in some cases.
Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264, +3% for 508.namd, improves most of other benchmarks. The compile and link time are almost the same, though in some cases it should be better (we're not doing an extra instruction scheduling anymore) + we may vectorize more code for the large basic blocks again because of saving scheduling budget. Differential Revision: https://reviews.llvm.org/D105020
-