- Nov 30, 2021
-
-
Jay Foad authored
Update the documented GFX10 code sequence for nontemporal stores after D114351. Differential Revision: https://reviews.llvm.org/D114707
-
Ties Stuij authored
Introduce assembly support for Armv8.1-M PACBTI extension. This is an optional extension in v8.1-M. There are 10 new system registers and 5 new instructions, all predicated on the feature. The attribute for llvm-mc is called "pacbti". For armclang, an architecture extension also called "pacbti" was created. This patch is part of a series that adds support for the PACBTI-M extension of the Armv8.1-M architecture, as detailed here: https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension The PACBTI-M specification can be found in the Armv8-M Architecture Reference Manual: https://developer.arm.com/documentation/ddi0553/latest The following people contributed to this patch: - Victor Campos - Ties Stuij Reviewed By: labrinea Differential Revision: https://reviews.llvm.org/D112420
-
Julian Gross authored
This patch introduces a new conversion to convert bufferization.clone operations into a memref.alloc and a memref.copy operation. This transformation is needed to transform all remaining clones which "survive" all previous transformations, before a given program is lowered further (to LLVM e.g.). Otherwise, these operations cannot be handled anymore and lead to compile errors. See: https://llvm.discourse.group/t/bufferization-error-related-to-memref-clone/4665 Differential Revision: https://reviews.llvm.org/D114233
-
Kadir Cetinkaya authored
Differential Revision: https://reviews.llvm.org/D114723
-
Alexander Belyaev authored
[RFC](https://llvm.discourse.group/t/rfc-dialect-for-bufferization-related-ops/4712) Differential Revision: https://reviews.llvm.org/D114698
-
gysit authored
Update the shapes of the convolution / pooling tests that were detected after enabling verification during printing (https://reviews.llvm.org/D114680). Also split the emit_structured_generic.py file that previously contained all tests into multiple separate files to simplify debugging. Reviewed By: stellaraccident Differential Revision: https://reviews.llvm.org/D114731
-
Fangrui Song authored
-
Ben Shi authored
This patch fixes a crash when running "llvm-objdump -D --mattr=+experimental-v" against an object file that happens to contain a word that can be decoded as VSETVLI or VSETIVLI with reserved vlmul[2:0]=4. All vtype values with reserved fields (vlmul[2:0]=4, vsew[2:0]=0b1xx, non-zero bits 8/9/10) are printed as raw immediates. Reviewed By: jhenderson, jrtc27, craig.topper Differential Revision: https://reviews.llvm.org/D114581
-
Markus Böck authored
Since VS 2022 17.1, MSVC predefines _MSVC_EXECUTION_CHARACTER_SET to inform users of the execution character set defined at compile time. The macro expands to a Windows Code Page Identifier; these identifiers are documented here: https://docs.microsoft.com/en-us/windows/win32/intl/code-page-identifiers As clang currently only supports UTF-8, it is defined as 65001. If clang-cl were to support a different execution character set in the future, we'd have to change the value. Fixes https://bugs.llvm.org/show_bug.cgi?id=52549 Differential Revision: https://reviews.llvm.org/D114576
-
wlei authored
AutoFDO performance is sensitive to profile density, i.e., the number of samples in the profile relative to the program size, because profiles with insufficient samples can be inaccurate due to statistical noise and thus hurt AutoFDO performance. A previous investigation showed that AutoFDO performed better on MySQL with an increased number of samples. Therefore, we implement a profile-density computation feature to give hints about profile density to users and the compiler.

We define the density of a profile Prof as follows:

- For each function A in the profile, density(A) = total_samples(A) / sizeof(A).
- density(Prof) = min(density(A)) for all functions A that are warm (defined below).

A function is considered warm if its total samples are within the top N percent of the profile. For the implementation, we reuse `ProfileSummaryBuilder::getHotCountThreshold(..)` as the threshold, which can be set by percentage (`--profile-summary-cutoff-hot`) or by value (`--profile-summary-hot-count`). We also introduce `--hot-function-density-threshold` to set the hot-function density threshold; if the profile density is below it, we emit a suggestion to increase the number of samples. This also applies to CS profiles, with all profiles merged into base. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D113781
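The density definition above can be sketched in a few lines of Python. This is a simplified illustration, not the tool's implementation: the function names, the sample data, and the fixed `hot_count_threshold` are assumptions, with the threshold standing in for whatever `ProfileSummaryBuilder::getHotCountThreshold(..)` would supply.

```python
def function_density(total_samples, size):
    """density(A) = total_samples(A) / sizeof(A)."""
    return total_samples / size


def profile_density(functions, hot_count_threshold):
    """Return the minimum density over warm functions, or None if none are warm.

    `functions` maps a function name to (total_samples, size).
    A function counts as warm here when its total samples reach
    `hot_count_threshold` (an assumed stand-in for the real threshold).
    """
    densities = [
        function_density(samples, size)
        for samples, size in functions.values()
        if samples >= hot_count_threshold and size > 0
    ]
    return min(densities) if densities else None


# Hypothetical profile: name -> (total_samples, size).
profile = {
    "main": (12000, 400),
    "parse": (9000, 900),
    "cold_path": (50, 200),  # below the threshold, so it is ignored
}

density = profile_density(profile, hot_count_threshold=1000)
print(density)
```

Taking the minimum rather than the mean means a single under-sampled warm function is enough to flag the profile, which matches the intent of warning when more samples are needed.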
-
Roman Lebedev authored
We ask `TTI.getAddressComputationCost()` about the cost of computing vector address, and then multiply it by the vector width. This doesn't make any sense, it implies that we'd do a vector GEP and then scalarize the vector of pointers, but there is no such thing in the vectorized IR, we perform scalar GEP's. This is *especially* bad on X86, and was effectively prohibiting any scalarized vectorization of gathers/scatters, because `X86TTIImpl::getAddressComputationCost()` says that cost of vector address computation is `10` as compared to `1` for scalar. The computed costs are similar to the ones with D111222+D111220, but we end up without masked memory intrinsics that we'd then have to expand later on, without much luck. (D111363) Differential Revision: https://reviews.llvm.org/D111460
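To make the cost difference concrete, here is a small arithmetic sketch. The vector width of 4 is an assumed value; the per-address costs of `10` and `1` are the X86 figures quoted above. The function name is hypothetical and only mirrors the multiplication described in the text, and the "new" line assumes the fix is to price each lane as a scalar address computation.

```python
def scalarized_address_cost(per_address_cost, vector_width):
    """Total address-computation cost when one address is computed per lane."""
    return per_address_cost * vector_width


VECTOR_WIDTH = 4  # assumed for illustration

# Old behaviour: price each lane as if it needed a vector address computation.
old_cost = scalarized_address_cost(10, VECTOR_WIDTH)
# Assumed new behaviour: price each lane as the scalar GEP that is emitted.
new_cost = scalarized_address_cost(1, VECTOR_WIDTH)

print(old_cost, new_cost)
```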
-
Nick Desaulniers authored
We can't use the existing pseudo ARM::tLDRLIT_ga_pcrel for loading the stack guard for PIC code that references the GOT, since arm-pseudo may expand this to the narrow tLDRpci rather than the wider t2LDRpci. Create a new pseudo, t2LDRLIT_ga_pcrel, and expand it to t2LDRpci. Fixes: https://bugs.chromium.org/p/chromium/issues/detail?id=1270361 Reviewed By: ardb Differential Revision: https://reviews.llvm.org/D114762
-
Carlos Galvez authored
The google-readability-casting check is meant to be on par with cpplint's readability/casting check, according to the documentation. However it currently does not diagnose functional casts, like:

    float x = 1.5F;
    int y = int(x);

This is detected by cpplint, however, and the guidelines are clear that such a cast is only allowed when the type is a class type (constructor call):

> You may use cast formats like `T(x)` only when `T` is a class type.

Therefore, update the clang-tidy check to check this case. Differential Revision: https://reviews.llvm.org/D114427
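As a rough illustration of what diagnosing a functional cast involves, the sketch below uses a regular expression to flag `T(x)` where `T` is a built-in arithmetic type. It is an assumption-laden toy: the pattern, the type list, and the function name are hypothetical, and neither cpplint nor the clang-tidy check (which works on the AST) is implemented this way.

```python
import re

# Built-in arithmetic types only; class types are deliberately excluded
# because `T(x)` is allowed when `T` is a class type.
BUILTIN_TYPES = ("bool", "char", "short", "int", "long", "float", "double")

FUNCTIONAL_CAST = re.compile(
    r"\b(" + "|".join(BUILTIN_TYPES) + r")\s*\(\s*([A-Za-z_]\w*)\s*\)"
)


def find_functional_casts(line):
    """Return (type, operand) pairs for functional casts found in `line`."""
    return FUNCTIONAL_CAST.findall(line)


print(find_functional_casts("int y = int(x);"))
print(find_functional_casts("int y = static_cast<int>(x);"))
print(find_functional_casts("Foo f = Foo(x);"))
```

A text-based pattern like this cannot tell a cast from a parenthesised declarator, which is one reason an AST-based check is the more reliable home for the diagnostic.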
-
Fangrui Song authored
-
Phoebe Wang authored
We should match GCC's behavior, which allows floating-point types with the -mno-x87 option on 32-bit targets. https://godbolt.org/z/KrbhfWc9o The previous blocking issues have been partially fixed by D112143. Reviewed By: asavonic, nickdesaulniers Differential Revision: https://reviews.llvm.org/D114162
-
Stella Laurenzo authored
* Classes that are still todo are marked with "# TODO: Auto-generated. Audit and fix."
* Those without this note have been cross-checked with C++ sources and most have been spot checked by hovering in VsCode.

Differential Revision: https://reviews.llvm.org/D114767
-
Stella Laurenzo authored
* set_symbol_name, get_symbol_name, set_visibility, get_visibility, replace_all_symbol_uses, walk_symbol_tables
* In integrations I've been doing, I've been reaching for all of these to do both general IR manipulation and module merging.
* I don't love the replace_all_symbol_uses underlying APIs since they necessitate SYMBOL_COUNT walks and have various sharp edges. I'm hoping that whatever emerges eventually for this can still retain this simple API as a one-shot.

Differential Revision: https://reviews.llvm.org/D114687
-
Stella Laurenzo authored
There is no completely automated facility for generating stubs that are both accurate and comprehensive for native modules. After some experimentation, I found that MyPy's stubgen does the best at generating correct stubs with a few caveats that are relatively easy to fix:

* Some types resolve to cross module symbols incorrectly.
* staticmethod and classmethod signatures seem to always be completely generic and need to be manually provided.
* It does not generate an `__all__` which, from testing, causes namespace pollution to be visible to IDE code completion.

As a first step, I did the following:

* Ran `stubgen` for `_mlir.ir`, `_mlir.passmanager`, and `_mlirExecutionEngine`.
* Manually looked for all instances where unnamed arguments were being emitted (i.e. as 'arg0', etc) and updated the C++ side to include names (and re-ran stubgen to get a good initial state).
* Made/noted a few structural changes to each `pyi` file to make it minimally functional.
* Added the `pyi` files to the CMake rules so they are installed and visible.

To test, I added a `.env` file to the root of the project with `PYTHONPATH=...` set as per instructions. Then reload the developer window (in VsCode) and verify that completion works for various changes to test cases. There are still a number of overly generic signatures, but I want to check in this low-touch baseline before iterating on more ambiguous changes. This is already a big improvement. Differential Revision: https://reviews.llvm.org/D114679
-
Ellis Hoag authored
When creating a new DIBuilder with an existing DICompileUnit, load the DINodes from the current DICompileUnit so they don't get overwritten. This is done in the MachineOutliner pass, but it didn't change the CU so the bug never appeared. We need this if we ever want to add DINodes to the CU after it has been created, e.g., DIGlobalVariables. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D114556
-
Christudasan Devadasan authored
The greedy register allocator prefers moving a constrained live range into a larger allocatable class over spilling it. This patch defines the necessary superclasses for vector registers. For subtargets that support copies between VGPRs and AGPRs, the vector register spills during regalloc now become just copies. Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D109301
-
Guozhi Wei authored
[TwoAddressInstructionPass] Create register mapping for registers with multiple uses in the current MBB

Currently we create register mappings for registers used only once in current MBB. For registers with multiple uses, when all the uses are in the current MBB, we can also create mappings for them similarly according to the last use. For example

    %reg101 = ...
            = ... reg101
    %reg103 = ADD %reg101, %reg102

We can create mapping between %reg101 and %reg103. Differential Revision: https://reviews.llvm.org/D113193
-
Craig Topper authored
Prevents crashes or "cannot select" errors. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D113822
-
Vitaly Buka authored
In a multi-threaded application, concurrent StackStore::Store calls may finish in an order different from their assigned Ids, so we can't assume that the previous block is complete once we switch to writing the next one. The workaround is to count the exact number of uptr values stored into the block, including the skipped tail/head that could not fit an entire trace. Depends on D114490. Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D114493
-
Hsiangkai Wang authored
When out-going arguments are passed on the stack and we do not reserve the stack space in the prologue, use BP to access stack objects after adjusting the stack pointer before function calls.

    callseq_start -> sp = sp - reserved_space
    //
    // Use FP to access fixed stack objects.
    // Use BP to access non-fixed stack objects.
    //
    call @foo
    callseq_end -> sp = sp + reserved_space

Differential Revision: https://reviews.llvm.org/D114246
-
Hsiangkai Wang authored
If the number of arguments is too large to pass in registers, stack space is needed to pass the remaining arguments to the callee. There are two scenarios: one is to reserve the space in the prologue, and the other is to reserve the space before each function call. When we reserve the stack space before function calls, the stack pointer is adjusted. In that scenario, we should not use the stack pointer to access stack objects. It looks like this:

    callseq_start -> sp = sp - reserved_space
    //
    // We should not use SP to access stack objects in this area.
    //
    call @foo
    callseq_end -> sp = sp + reserved_space

Differential Revision: https://reviews.llvm.org/D114245
-
Mircea Trofin authored
Differential Revision: https://reviews.llvm.org/D114763
-
Vitaly Buka authored
Reviewed By: dvyukov, kstoimenov Differential Revision: https://reviews.llvm.org/D114464
-
Luís Ferreira authored
This reverts commit 6f99e1aa.
-
David Blaikie authored
-
Aart Bik authored
Moves sparse tensor output support forward by generalizing from injective insertions only to include reductions. This revision accepts the case with all parallel outer and all reduction inner loops, since that can be handled with an injective insertion still. Next revision will allow the inner parallel loop to move inward (but that will require "access pattern expansion" aka "workspace"). Reviewed By: bixia Differential Revision: https://reviews.llvm.org/D114399
-
Matthias Braun authored
Differential Revision: https://reviews.llvm.org/D112754
-
Matthias Braun authored
Differential Revision: https://reviews.llvm.org/D113151
-
Luís Ferreira authored
Anonymous symbols are represented by 0 in the mangled symbol. We should skip them in order to represent the demangled name correctly, otherwise demangled names like `demangle..anon` can happen. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D114307
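The effect of skipping anonymous symbols can be shown with a toy parser. This is a minimal sketch under stated assumptions: it handles only a bare sequence of length-prefixed identifiers, where a `0` stands for an anonymous symbol, and ignores everything else in a real D mangled name (prefix, types, back references). The function name is hypothetical and this is not the LLVM demangler's code.

```python
def demangle_qualified(mangled):
    """Join length-prefixed identifiers with dots, skipping anonymous ones.

    `mangled` is a sequence of <decimal length><identifier> components,
    where a bare `0` stands for an anonymous symbol.
    """
    parts = []
    pos = 0
    while pos < len(mangled):
        if mangled[pos] == "0":
            # Anonymous symbol: contributes nothing to the demangled name.
            pos += 1
            continue
        start = pos
        while pos < len(mangled) and mangled[pos].isdigit():
            pos += 1
        if start == pos:
            raise ValueError("expected a length prefix at offset %d" % pos)
        length = int(mangled[start:pos])
        if pos + length > len(mangled):
            raise ValueError("identifier runs past the end of the input")
        parts.append(mangled[pos:pos + length])
        pos += length
    return ".".join(parts)


print(demangle_qualified("8demangle04anon"))
```

If the `0` component were instead emitted as an empty identifier, the join would produce the doubled dot seen in `demangle..anon`.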
-
David Blaikie authored
Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D114305
-
David Blaikie authored
This patch adds support for simple single qualified names that include internal mangled names and normal symbol names. Differential Revision: https://reviews.llvm.org/D111415
-
Mircea Trofin authored
There are 2 eviction queries. One is made by tryAssign, when it attempts to free an interference occupying the hint of the candidate. The other is during 'regular' interference resolution, where we scan over all physical registers and try to see if we can evict live ranges in favor of the candidate. We currently use the same logic in both cases, just that the former never passes the cost to any subsequent query. Technically, the 2 decisions could be implemented with different policies. This patch splits the 2. RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153639.html Differential Revision: https://reviews.llvm.org/D114019
-
Luís Ferreira authored
Reviewed By: teemperor, JDevlieghere Differential Revision: https://reviews.llvm.org/D113604 Signed-off-by: Luís Ferreira <contact@lsferreira.net>
-
Jon Chesterfield authored
-
Salman Javed authored
Fixes https://bugs.llvm.org/show_bug.cgi?id=48613. llvm-header-guard is suggesting header guards with leading underscores if the header file path begins with a '/' or similar special character. Only reserved identifiers should begin with an underscore. Differential Revision: https://reviews.llvm.org/D114149
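The failure mode is easy to reproduce with a naive path-to-macro conversion. The sketch below is hypothetical (the function name and the exact character mapping are assumptions, not the llvm-header-guard algorithm); it only shows why a path starting with `/` yields a leading underscore and how stripping it avoids a reserved identifier.

```python
import re


def header_guard(path):
    """Derive an include-guard macro name from a header path."""
    # Replace every character that is not valid in an identifier.
    guard = re.sub(r"[^A-Za-z0-9]", "_", path).upper()
    # A leading underscore followed by an uppercase letter or another
    # underscore is reserved for the implementation, so drop it.
    return guard.lstrip("_")


print(header_guard("/usr/include/foo.h"))
print(header_guard("foo/bar.h"))
```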
-
Jeremy Morse authored
If we have a variable whose fragments are split into overlapping segments:

    DBG_VALUE $ax, $noreg, !123, !DIExpression(DW_OP_LLVM_fragment, 0, 16)
    ...
    DBG_VALUE $eax, $noreg, !123, !DIExpression(DW_OP_LLVM_fragment, 0, 32)

we should only propagate the most recently assigned fragment out of a block. LiveDebugValues only deals with live-in variable locations, as overlaps within blocks are DbgEntityHistoryCalculator's domain. InstrRefBasedLDV has kept the accumulateFragmentMap method from VarLocBasedLDV; we just need it to recognise DBG_INSTR_REFs. Once it has produced a mapping of variable / fragments to the overlapped variable / fragments, VLocTracker uses it to identify when a debug instruction needs to terminate the other parts it overlaps with. The test is updated for some standard "InstrRef picks different registers" variation, and the order of some unrelated DBG_VALUEs changes. Differential Revision: https://reviews.llvm.org/D114603
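The "most recent assignment wins" rule for overlapping fragments can be sketched as follows. This is an illustrative model only: fragments are plain `(offset, size)` pairs in bits, locations are strings, and the function names are hypothetical. It does not mirror the data structures used by InstrRefBasedLDV or VLocTracker.

```python
def fragments_overlap(a, b):
    """Fragments are (offset_in_bits, size_in_bits) pairs."""
    a_start, a_size = a
    b_start, b_size = b
    return a_start < b_start + b_size and b_start < a_start + a_size


def live_out_fragments(assignments):
    """Replay assignments in order; each one terminates overlapping earlier ones.

    `assignments` is a list of (fragment, location) pairs in program order.
    Returns a dict mapping each surviving fragment to its location.
    """
    live = {}
    for fragment, location in assignments:
        for other in list(live):
            if other != fragment and fragments_overlap(other, fragment):
                del live[other]
        live[fragment] = location
    return live


# The two assignments from the example: bits [0, 16) in $ax, then [0, 32) in $eax.
block = [((0, 16), "$ax"), ((0, 32), "$eax")]
print(live_out_fragments(block))
```

Only the `(0, 32)` fragment survives to the end of the block, since it overlaps and therefore terminates the earlier `(0, 16)` assignment.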
-