- Feb 08, 2023
-
Vladislav Dzhidzhoev authored
Reimplemented SelectionDAG code for GlobalISel. Fixes https://github.com/llvm/llvm-project/issues/54079 Differential Revision: https://reviews.llvm.org/D130903
-
Simon Pilgrim authored
Use APInt::setBit() method instead of OR'ing individual bits.
-
Brian Cain authored
Patch-by: Colin Lemahieu <colinl@codeaurora.org> Differential Revision: https://reviews.llvm.org/D143531
-
Valentin Clement authored
Result must carry the polymorphic type information from the vector. Reviewed By: jeanPerier Differential Revision: https://reviews.llvm.org/D143575
-
Zain Jaffal authored
This reverts commit 665ee0cd. Fix comments and formatting style.
-
Joseph Huber authored
Summary: We don't have the infrastructure to support MPFR on the GPU. We should disable this categorically on GPU builds for now.
-
Guillaume Chatelet authored
-
Sanjay Patel authored
-
Florian Hahn authored
This adds test coverage to avoid crashes with further changes.
-
David Green authored
A combination of GlobalISel and MachineCombiner can end up creating `SUB xzr, (MOVI -2105098)` instructions which have not been constant folded. The AArch64MIPeepholeOpt pass will then attempt to create `ADD xzr, 513, lsl 12`, which is not a valid instruction. This adds a bail out of the transform if the register is xzr/wzr. Fixes #60528 Differential Revision: https://reviews.llvm.org/D143475
-
JackAKirk authored
I used https://github.com/zjin-lcf/HeCBench (with nvcc usage swapped to clang++), which is an adaptation of the classic Rodinia benchmarks aimed at the CUDA and SYCL programming models, to compare different values of the inlining threshold multiplier using both the clang++ CUDA and clang++ SYCL NVPTX backends. I find that the value is currently too low for both cases. Qualitatively (and in most cases there is a very close quantitative agreement across both cases), the change in code execution time for a range of values from 5 to 1000 matches in both variations of the HeCBench samples (CUDA clang++ vs. SYCL with the CUDA backend, using the intel/llvm clang++ compiler). This value of 11 is optimal for clang++ CUDA for all cases I've investigated. I have not found a single case where performance is degraded by this change of the value from 5 to 11. For one sample, the SYCL CUDA backend preferred a higher value. However, we are happy to prioritize clang++ CUDA, and we find that this value is close to ideal for both cases anyway. It would be good to do some further investigation using clang++ OpenMP CUDA offload. However, since I do not know of an appropriate set of benchmarks for this case, and since we are now getting complaints about register spills related to insufficient inlining on a weekly basis, we have decided to propose this change and potentially seek more input from someone with more expertise in the OpenMP case. Incidentally, this value coincides with the value used for the amd-gcn backend. We have also been able to use the AMD backend of the intel/llvm "dpc++" compiler to compare the inlining behaviour of identical code when targeting AMD (compared to NVPTX). Unsurprisingly, the AMD backend with a multiplier value of 11 performed better (with regard to inlining) than the NVPTX case when the value of 5 was used. When the two backends use the same multiplier value, the inlining behaviors appear to align closely.
This also considerably improves the performance of at least one of the most popular HPC applications: NWCHEMX. Signed-off-by: JackAKirk <jack.kirk@codeplay.com> Reviewed by: tra Differential Revision: https://reviews.llvm.org/D142232
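To put the change from 5 to 11 in concrete terms, here is a simplified model in which the target multiplier scales the base inline threshold. Two things here are assumptions rather than details taken from this commit: the base value of 225 (the usual default of `-inline-threshold`), and the decision to ignore per-call-site bonuses and penalties. `effectiveThreshold` is a hypothetical helper, not an LLVM API.

```cpp
#include <cassert>

// Assumption: 225 is the default base inline threshold (-inline-threshold).
constexpr int kDefaultInlineThreshold = 225;

// Hypothetical helper: the cost budget a call site gets after the target's
// inlining threshold multiplier is applied (bonuses/penalties ignored).
constexpr int effectiveThreshold(int baseThreshold, int targetMultiplier) {
  return baseThreshold * targetMultiplier;
}

// Old and new NVPTX multipliers from the commit message.
static_assert(effectiveThreshold(kDefaultInlineThreshold, 5) == 1125, "old value");
static_assert(effectiveThreshold(kDefaultInlineThreshold, 11) == 2475, "new value");
```

In this model, a callee that costs between 1125 and 2475 is rejected under the old multiplier and accepted under the new one. According to the message, that is the range where NVPTX was losing inlining opportunities and spilling registers.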
-
Marco Elver authored
Emit all constant integers produced by SanitizerBinaryMetadata as ULEB128 to further reduce binary space used. Increasing the version is not necessary, given that this change depends on (and will land along with) the bump to v2. To support this, the !pcsections metadata format is extended to allow for per-section options, encoded in the first MD operand, which must always be a string and contain the section: "<section>!<options>". Reviewed By: dvyukov Differential Revision: https://reviews.llvm.org/D143484
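ULEB128 saves space because it stores 7 payload bits per byte and uses the high bit as a "more bytes follow" flag. Small constants therefore take a single byte instead of a fixed 4 or 8. LLVM has its own helpers in `llvm/Support/LEB128.h`. The following is a minimal standalone sketch of the same encoding with no LLVM dependency, together with a decoder for round-trip checks.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode `value` as ULEB128: low 7-bit groups first, high bit set on every
// byte except the last.
std::vector<uint8_t> encodeULEB128(uint64_t value) {
  std::vector<uint8_t> out;
  do {
    uint8_t byte = static_cast<uint8_t>(value & 0x7F);
    value >>= 7;
    if (value != 0)
      byte |= 0x80; // continuation bit: more bytes follow
    out.push_back(byte);
  } while (value != 0);
  return out;
}

// Decode a ULEB128 byte sequence (stops at the first byte without the
// continuation bit).
uint64_t decodeULEB128(const std::vector<uint8_t> &bytes) {
  uint64_t result = 0;
  unsigned shift = 0;
  for (uint8_t byte : bytes) {
    result |= static_cast<uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0)
      break;
    shift += 7;
  }
  return result;
}
```

As an example, 624485 encodes to the three bytes `E5 8E 26`, where a fixed 4-byte field would have been needed. Values below 128 fit in one byte.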
-
Marco Elver authored
Optimize the encoding of "covered" metadata by: 1. Reducing feature mask from 4 bytes to 1 byte (needs increase once we reach more than 8 features). 2. Only emitting UAR stack args size if it is non-zero, saving 4 bytes in the common case. One caveat is that the emitted metadata for function PC (offset), size, and UAR size (if enabled) are no longer aligned to 4 bytes. SanitizerBinaryMetadata version base is increased to 2, since the change is backwards incompatible. Reviewed By: dvyukov Differential Revision: https://reviews.llvm.org/D143482
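The sketch below shows the two savings together: a 1-byte feature mask, and a 4-byte UAR stack-args size that is emitted only when it is non-zero. The commit does not say how a reader tells whether the size is present, so the flag bit `kHypotheticalUARSizeBit` is a purely hypothetical choice. The real SanitizerBinaryMetadata layout and constants may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical flag meaning "a UAR stack-args size follows"; not the real
// SanitizerBinaryMetadata constant.
constexpr uint8_t kHypotheticalUARSizeBit = 1u << 7;

// Append the per-function feature byte and, only if non-zero, a 4-byte
// little-endian UAR stack-args size.
void emitCoveredTail(std::vector<uint8_t> &out, uint8_t features,
                     uint32_t uarStackArgsSize) {
  if (uarStackArgsSize != 0)
    features |= kHypotheticalUARSizeBit;
  out.push_back(features); // 1 byte instead of the old 4-byte mask
  if (uarStackArgsSize != 0)
    for (int i = 0; i < 4; ++i)
      out.push_back(static_cast<uint8_t>(uarStackArgsSize >> (8 * i)));
}
```

In the common case (UAR size 0) this emits 1 byte where the old encoding used 8 (a 4-byte mask plus a 4-byte size). Because records can now have different lengths, the fields that follow are no longer 4-byte aligned, which is the caveat the commit mentions.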
-
Benjamin Kramer authored
Fixes a81136c3
-
Benjamin Kramer authored
-
Benjamin Kramer authored
This is quite silly, but casting to uintptr_t seems like the easiest option to quiet ubsan. llvm/lib/Support/xxhash.cpp:107:12: runtime error: applying non-zero offset 8 to null pointer #0 0x7fe3660404c0 in llvm::xxHash64(llvm::StringRef) llvm/lib/Support/xxhash.cpp:107:12
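The UBSan report arises because an empty `StringRef` can have a null data pointer, and evaluating `P + 8` on a null pointer is undefined even if the result is never dereferenced. The following illustrates the general uintptr_t workaround; it is not the exact code from the patch, and `hasAtLeast` is a hypothetical helper. The idea is to do the bounds arithmetic on integer addresses so that no out-of-range pointer is ever formed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if at least `n` bytes remain in [p, end). Comparing integer addresses
// avoids computing `p + n`, which is UB (and flagged by UBSan) when p is null.
bool hasAtLeast(const unsigned char *p, const unsigned char *end, size_t n) {
  return reinterpret_cast<uintptr_t>(p) + n <= reinterpret_cast<uintptr_t>(end);
}
```

With a null/null range (an empty input) the check simply returns false, and no pointer offset is applied to null.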
-
Jean Perier authored
Code move without any functional change. The goal is to reuse this piece of code for procedure designator lowering in HLFIR, since there are no significant changes in the way procedure designators will be lowered. Differential Revision: https://reviews.llvm.org/D143563
-
David Green authored
So long as the operation is reassociative, we can reassociate the double vecreduce from for example fadd(vecreduce(a), vecreduce(b)) to vecreduce(fadd(a,b)). This will in general save a few instructions, but some architectures (MVE) require the opposite fold, so a shouldExpandReduction is added to account for it. Only targets that use shouldExpandReduction will be affected. Differential Revision: https://reviews.llvm.org/D141870
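This plain C++ model checks the algebraic identity behind the fold: adding two reductions gives the same result as reducing the lane-wise sum, so one of the two reductions can be dropped. It uses unsigned integers, where addition wraps and is exactly associative. For `fadd`, the same rewrite is valid only when reassociation is allowed, because floating-point addition is not associative. `vecreduceAdd` and `vaddLanes` are illustrative names, not LLVM APIs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>

// Models vecreduce.add over one vector.
template <std::size_t N>
unsigned vecreduceAdd(const std::array<unsigned, N> &v) {
  return std::accumulate(v.begin(), v.end(), 0u);
}

// Models a lane-wise vector add.
template <std::size_t N>
std::array<unsigned, N> vaddLanes(const std::array<unsigned, N> &a,
                                  const std::array<unsigned, N> &b) {
  std::array<unsigned, N> r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = a[i] + b[i];
  return r;
}
```

`add(vecreduce(a), vecreduce(b))` requires two horizontal reductions, whereas `vecreduce(add(a, b))` requires one vector add and a single reduction. That difference is where the instruction savings come from.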
-
Zain Jaffal authored
This reverts commit 40ffe9c1. Reverted because some comments were missed in the review https://reviews.llvm.org/D142647
-
Christian Ulmann authored
This commit adds additional checks and warning messages to the MD_prof import. As LLVM does not verify most metadata, the import has to be resilient towards ill-formed inputs. Reviewed By: gysit Differential Revision: https://reviews.llvm.org/D143492
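The following sketches what a defensive `branch_weights` check could look like. It operates on a hypothetical, simplified view of an MD_prof node in which each operand is either a string or an integer. This is neither the MLIR importer's actual code nor its data types. A malformed node produces `std::nullopt`, so the caller can warn and drop the metadata instead of asserting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical operand model: an MDString tag or an integer constant.
using Operand = std::variant<std::string, uint64_t>;

// Return the weights if `ops` is a well-formed branch_weights node for a
// terminator with `numSuccessors` successors; otherwise std::nullopt.
std::optional<std::vector<uint64_t>>
parseBranchWeights(const std::vector<Operand> &ops, std::size_t numSuccessors) {
  if (ops.empty())
    return std::nullopt;
  const auto *tag = std::get_if<std::string>(&ops[0]);
  if (!tag || *tag != "branch_weights")
    return std::nullopt; // missing or unexpected tag
  if (ops.size() - 1 != numSuccessors)
    return std::nullopt; // weight count does not match successor count
  std::vector<uint64_t> weights;
  for (std::size_t i = 1; i < ops.size(); ++i) {
    const auto *w = std::get_if<uint64_t>(&ops[i]);
    if (!w)
      return std::nullopt; // non-integer weight
    weights.push_back(*w);
  }
  return weights;
}
```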
-
Zain Jaffal authored
Differential Revision: https://reviews.llvm.org/D142647
-
Christian Ulmann authored
This commit introduces functionality to import loop metadata. Loop metadata nodes are transformed into LoopAnnotationAttrs and attached to the corresponding branch operations. Reviewed By: gysit Differential Revision: https://reviews.llvm.org/D143376
-
Tom Eccles authored
This test has been very unreliable across different machines. Update it to use clang's sysroot image so that the fastmath object file name is stable across different distributions and distro types. Based on clang/test/Driver/linux-ld.c Thanks to mnadeem for pointing this out at https://reviews.llvm.org/D138675 Differential Revision: https://reviews.llvm.org/D142807
-
Tom Eccles authored
These will be useful for sharing code with intrinsic argument processing when lowering hlfir transformational intrinsic operations to FIR in the BufferizeHLFIR pass. Differential Revision: https://reviews.llvm.org/D143503
-
wangpc authored
The default OpenMP runtime may not be libomp, since it can be changed by specifying `CLANG_DEFAULT_OPENMP_RUNTIME`. This test will fail if we change the default OpenMP runtime. This patch removes the test for the default OpenMP runtime and moves the CHECKs downward. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D143549
-
Simon Pilgrim authored
If IMINMAX ops aren't legal, we can lower to the select(icmp(x,y),sub(x,y),sub(y,x)) pattern
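The commit does not name the node being lowered. However, the `select(icmp(x,y),sub(x,y),sub(y,x))` shape computes an absolute difference, so the sketch below assumes that this is the context. It shows the pattern for unsigned and signed operands. In the signed case the subtraction is done in unsigned arithmetic to avoid C++ signed-overflow UB, and the result is the magnitude as an unsigned value.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned absolute difference: icmp ugt, two subs, one select.
uint32_t abdu(uint32_t x, uint32_t y) {
  bool xGreater = x > y;               // icmp ugt x, y
  uint32_t xMinusY = x - y;            // sub x, y (wraps if x < y)
  uint32_t yMinusX = y - x;            // sub y, x
  return xGreater ? xMinusY : yMinusX; // select
}

// Signed variant: compare as signed (icmp sgt), subtract as unsigned.
uint32_t abds(int32_t x, int32_t y) {
  uint32_t ux = static_cast<uint32_t>(x), uy = static_cast<uint32_t>(y);
  return x > y ? ux - uy : uy - ux;
}
```

Both subtractions are computed unconditionally, and the select picks the non-negative one. This maps onto compare/select instructions without needing legal min/max operations.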
-
Valentin Clement authored
Creation of a polymorphic array temporary cannot be done inline. Add a TODO so that the current code exits cleanly when lowering reaches it. A solution involving the runtime will be put in place. Depends on D143490 Reviewed By: jeanPerier Differential Revision: https://reviews.llvm.org/D143491
-
Valentin Clement authored
The fir.class type is always needed for polymorphic and unlimited polymorphic entities. Wrapping the element type with a fir.class type was done in ConvertType for some cases and elsewhere in the code for others. Centralize this in ConvertType when converting from an expr or symbol. Reviewed By: jeanPerier Differential Revision: https://reviews.llvm.org/D143490
-
Markus Böck authored
This is the first patch in a series of patches that are part of this RFC: https://discourse.llvm.org/t/rfc-switching-the-llvm-dialect-and-dialect-lowerings-to-opaque-pointers/68179
This patch adds the ability to lower the memref dialect to the LLVM dialect using opaque pointers instead of typed pointers. Typed pointers are being phased out of LLVM, and this patch is part of an effort to phase them out of MLIR as well. To do this, we'll need to support both typed and opaque pointers in lowering passes, so that downstream projects can migrate without breakage. The gist of the changes required to migrate a conversion pass is:
* Change any `LLVM::LLVMPointerType::get` calls to NOT use an element type if opaque pointers are to be used.
* Use the `build` method of `llvm.load` with the explicit result type. Since the pointer no longer has an element type, it has to be specified explicitly.
* Use the `build` method of `llvm.getelementptr` with the explicit `basePtrType`. As above, we now have to specify the element type so that GEP can do its indexing calculations.
* Use the `build` method of `llvm.alloca` with the explicit `elementType`. As above, alloca needs the element type to know how many bytes to allocate.
* Get rid of any `llvm.bitcast`s.
* Adapt the tests to the above. Note that `llvm.store` changes syntax as well when using opaque pointers.
I'd like to note that the three `build` method changes work for both opaque and typed pointers, so unconditionally using the explicit element type form is always correct. For the testsuite, a practical approach suggested by @ftynse was taken: I created a separate test file for testing the typed pointer lowering of ops. This mostly comes down to checking that bitcasts have been created at the appropriate places, since these are required for typed pointer support. Differential Revision: https://reviews.llvm.org/D143268
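The following plain C++ analogy (not MLIR API) shows why the `load`, `getelementptr`, and `alloca` builders need an explicit type once pointers are opaque. A `void *` carries no element type, so each operation has to be told the type. GEP uses it to compute the byte stride, and load uses it to determine how many bytes to read. The `gep` and `load` templates are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Analogy for a GEP with an explicit element type: the type supplied at
// the call site, not the pointer, determines the byte stride.
template <typename ElemT>
void *gep(void *base, std::ptrdiff_t index) {
  return static_cast<char *>(base) +
         index * static_cast<std::ptrdiff_t>(sizeof(ElemT));
}

// Analogy for a load with an explicit result type: the caller states
// what to read, so no bitcast of the pointer is needed first.
template <typename ResultT>
ResultT load(const void *ptr) {
  ResultT value;
  std::memcpy(&value, ptr, sizeof(ResultT));
  return value;
}
```

Indexing the same opaque pointer with a different element type gives a different address. For example, `gep<int64_t>(p, 1)` and `gep<int32_t>(p, 2)` both land 8 bytes past `p`. For this reason the element type must travel with each operation instead of with the pointer.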
-
Tobias Hieta authored
When building a PGO version of LLVM, you might want to customize the output profile file when building tests. For this to work, we need to pass the LLVM_PROFILE_FILE environment variable. Reviewed By: abrachet Differential Revision: https://reviews.llvm.org/D143556
-
Fangrui Song authored
If a COMDAT key has a local linkage, it behaves as `comdat nodeduplicate` and llvm/lib/Linker/LinkModules.cpp does not deduplicate its members. This is not intended. Switch to an external linkage to allow deduplication. See also https://maskray.me/blog/2021-07-25-comdat-and-section-group#grp_comdat Reviewed By: melver Differential Revision: https://reviews.llvm.org/D143530
-
gonglingqin authored
There are 12-bit offset fields in the ld.[b/h/w/d] and st.[b/h/w/d] instructions. When the constant address is less than 12 bits, the address calculation is incorporated into the offset field of the instruction. Differential Revision: https://reviews.llvm.org/D143470
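The following sketches the arithmetic behind this fold. It assumes that the offset field is a signed 12-bit immediate, with range [-2048, 2047], and that the rest of the address is materialized separately (for example with `lu12i.w`). The split must sign-extend the low 12 bits and compensate in the high part so that `hi * 4096 + lo` reproduces the address. `fitsSImm12` and `splitForSImm12` are hypothetical helpers, not the patch's code.

```cpp
#include <cassert>
#include <cstdint>

// True if `value` fits a signed 12-bit immediate.
bool fitsSImm12(int64_t value) { return value >= -2048 && value <= 2047; }

struct SplitAddr {
  int64_t hi; // upper part, materialized separately (e.g. lu12i.w)
  int64_t lo; // sign-extended low 12 bits, folded into the ld/st offset
};

// Split so that hi * 4096 + lo == addr and lo fits the 12-bit offset field.
SplitAddr splitForSImm12(int64_t addr) {
  int64_t lo = addr & 0xFFF;
  if (lo >= 2048)
    lo -= 4096; // sign-extend: the hardware treats the field as signed
  return {(addr - lo) / 4096, lo};
}
```

For `0x1800`, the low bits `0x800` read as -2048 when sign-extended, so the high part is rounded up to 2, and 2 * 4096 - 2048 = 0x1800.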
-
Chuanqi Xu authored
[C++20] [Modules] Allow -fmodule-file=<module-name>=<BMI-Path> for implementation units and document the behavior. Close https://github.com/llvm/llvm-project/issues/57293. Previously, we couldn't use `-fmodule-file=<module-name>=<BMI-Path>` for implementation units, which is a bug. Also, the behavior of the above option was neither tested nor documented for C++20 Modules. This patch addresses both problems.
-
Matthias Springer authored
This is in preparation for reusing the same AnalysisState for tensor.empty elimination and One-Shot Bufferize (to address performance bottlenecks). Differential Revision: https://reviews.llvm.org/D143379
-
Valentin Clement authored
The derived type from the MOLD was not carried over to the newly allocated pointer or allocatable. This may lead to the wrong dynamic type when the pointer or allocatable is polymorphic, as shown in the example below:
```fortran
type :: p1
  integer :: a
end type
type, extends(p1) :: p2
  integer :: b
end type
class(p1), pointer :: p(:)
allocate(p(5), MOLD=p2(1,2))
```
Reviewed By: klausler Differential Revision: https://reviews.llvm.org/D143525
-
Matthias Springer authored
There is no longer a need to keep the two separate. This is in preparation for reusing the same AnalysisState for tensor.empty elimination and One-Shot Bufferize (to address performance bottlenecks). Differential Revision: https://reviews.llvm.org/D143313
-
Tobias Hieta authored
If you are missing the iOS SDK on your macOS (for example, you don't have full Xcode but just CommandLineTools), CMake currently errors out without a helpful message. This patch disables iOS support in compiler-rt if the iOS SDK is not found. This can be overridden by passing -DCOMPILER_RT_ENABLE_IOS=ON. Reviewed By: delcypher, thetruestblue Differential Revision: https://reviews.llvm.org/D133273
-
Philipp Tomsich authored
This reverts commit 3304d51b.
-
Philipp Tomsich authored
This reverts commit 656188dd.
-
Philipp Tomsich authored
This reverts commit 19a59099.
-