- Oct 21, 2021
-
-
Craig Topper authored
Differential Revision: https://reviews.llvm.org/D112233
-
Arthur Eubanks authored
This reverts commit baea663a. Causes crashes, e.g. https://lab.llvm.org/buildbot/#/builders/77/builds/10715.
-
Aaron Ballman authored
-
Petr Hosek authored
This reverts commit 0eed292f; there are compiler-rt build failures that appear to have been introduced by this change.
-
Florian Hahn authored
-
River Riddle authored
This effectively mirrors the logging in dialect conversion, which has proven very useful for understanding the pattern application process. Differential Revision: https://reviews.llvm.org/D112120
-
River Riddle authored
Move a few methods out of line and clean up comments.
-
Ahmed Taei authored
Otherwise this can result in a poison value on some platforms; see https://bugs.llvm.org/show_bug.cgi?id=51204 Reviewed By: ezhulenev Differential Revision: https://reviews.llvm.org/D112115
-
Ben Langmuir authored
When cross-compiling, these tests will fail. For now leave the host arch check that was already there since I don't know why it was added.
-
Valentin Clement authored
This patch is extracted from D111337. It introduces the CharacterExprHelper, which helps with handling characters in FIR. Reviewed By: schweitz, awarzynski Differential Revision: https://reviews.llvm.org/D112140
Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Co-authored-by: V Donaldson <vdonaldson@nvidia.com>
-
Sanjay Patel authored
shuf (bo X, Y), (bo X, W) --> bo (shuf X), (shuf Y, W) This is motivated by an example in D111800 (although that patch avoids the problem for that particular example). The pattern is shown in reduced form with: https://llvm.org/PR52178 https://alive2.llvm.org/ce/z/d8zB4D There is no difference on the PhaseOrdering test from D111800 because the aarch64 cost model says that the shuffle cost is 3 while the fadd cost is 2. Differential Revision: https://reviews.llvm.org/D111901
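The fold can be checked on concrete vectors. The sketch below is not LLVM code: it models a two-operand shuffle and an elementwise binop on plain Python lists, and assumes a mask with no undef lanes. The helper names (`shuffle`, `binop`, and the two wrappers) are hypothetical.

```python
import operator

def shuffle(a, b, mask):
    """Model of a two-operand shuffle: indices < len(a) pick from a, the rest from b."""
    joined = a + b
    return [joined[i] for i in mask]

def binop(op, a, b):
    """Elementwise binary operation on two equal-length vectors."""
    return [op(x, y) for x, y in zip(a, b)]

def shuffle_of_binops(op, x, y, w, mask):
    # Original form: shuf (bo X, Y), (bo X, W)
    return shuffle(binop(op, x, y), binop(op, x, w), mask)

def binop_of_shuffles(op, x, y, w, mask):
    # Folded form: bo (shuf X), (shuf Y, W)
    n = len(x)
    # Both binops share X, so only the lane index matters for the first operand.
    unary_mask = [i % n for i in mask]
    return binop(op, shuffle(x, x, unary_mask), shuffle(y, w, mask))

x, y, w = [1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]
mask = [0, 5, 2, 7]
assert shuffle_of_binops(operator.sub, x, y, w, mask) == \
       binop_of_shuffles(operator.sub, x, y, w, mask)
```

The folded form performs one binop instead of two, at the price of an extra shuffle, which is why the cost model decides whether the transform fires.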
-
Arthur Eubanks authored
This clears the memory used for the Clang AST before we run LLVM passes. https://llvm-compile-time-tracker.com/compare.php?from=d0a5f61c4f6fccec87fd5207e3fcd9502dd59854&to=b7437fee79e04464dd968e1a29185495f3590481&stat=max-rss shows significant memory savings with no slowdown (in fact -O0 slightly speeds up). For more background, see https://lists.llvm.org/pipermail/cfe-dev/2021-September/068930.html. Turn this off for the interpreter since it does codegen multiple times.
Relanding with fix for -print-stats: D111973
Relanding with fix for plugins: D112190
If you'd like to use this even with plugins, consider using the features introduced in D112096. This can be turned off with -Xclang -no-clear-ast-before-backend. Differential Revision: https://reviews.llvm.org/D111270
-
Fraser Cormack authored
This test case, reduced from an internal test failure, shows how we may incorrectly skip the insertion of VSETVLI instructions when doing cross-basic-block analysis. The entry block ends in an `e32,mf2`. Its single successor, %bb.1, ends with an `e8,mf8`, but for a mask-type instruction, so it is considered compatible. This means that the info for %bb.1 is merged into its predecessor's, so it produces an `e32,mf2`. When it comes to the last block, which requires an `e32,mf2`, we skip the insertion of a vsetvli because all predecessors were determined to preserve the right vtype. However, when %bb.1 is actually laid out, it does need an `e8,mf8` vsetvli, since the previous instruction has a different tail policy. This means that when execution flows from %bb.1 to %bb.3, the `vadd.vx` is misconfigured. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D112223
-
Philip Reames authored
This change restructures the cache used in IPT to point not to the first special instruction, but to the first instruction which *could* be special. That is, the cached reference is always equal to the first special, or comes before it in the block. This avoids expensive block scans when we are removing special instructions from the beginning of the block. At the moment, this case is not heavily used, though it does trigger in GVN when doing CSE of calls. The main motivation was a change I'm no longer planning to move forward with, but the cache optimization seemed worthwhile as a minor perf win at low cost. Differential Revision: https://reviews.llvm.org/D111768
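A toy model of the caching idea, as a rough sketch only: it is not the IPT implementation. The class `FirstSpecialCache`, its fields, and the string "instructions" are all hypothetical. The one invariant carried over from the description above is that the cached position is always at or before the first special instruction.

```python
class FirstSpecialCache:
    """Toy model: cache an index at or before the first special instruction."""

    def __init__(self, block, is_special):
        self.block = block            # list of instruction names (illustrative)
        self.is_special = is_special  # predicate supplied by the caller
        self.candidate = 0            # invariant: nothing before this index is special

    def first_special(self):
        # Advance lazily from the cached candidate instead of from the block start.
        while self.candidate < len(self.block):
            if self.is_special(self.block[self.candidate]):
                return self.block[self.candidate]
            self.candidate += 1
        return None

    def remove(self, index):
        # Removal never breaks the invariant, so no rescan is needed:
        # everything before the candidate is still non-special.
        del self.block[index]
        if index < self.candidate:
            self.candidate -= 1  # keep pointing at the same instruction

cache = FirstSpecialCache(["add", "call", "mul", "call", "ret"],
                          lambda inst: inst == "call")
assert cache.first_special() == "call"  # scans "add", stops at index 1
cache.remove(1)                         # drop the first special instruction
assert cache.first_special() == "call"  # resumes at "mul", "add" is not rescanned
```

In this model, removing the first special instruction leaves the cache valid, and the next query continues from where the previous one stopped.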
-
Aaron Ballman authored
-
Aaron Ballman authored
-
Arthur Eubanks authored
Downstream users may have Clang plugins. By default these plugins run after the main action if they are specified on the command line. Since these plugins are ASTConsumers, presumably they inspect the AST. So we shouldn't clear it if any plugins run after the main action. Reviewed By: dblaikie, hans Differential Revision: https://reviews.llvm.org/D112190
-
Ben Langmuir authored
Reapply 5692ed0c, but with the ORC runtime disabled explicitly on CrossWinToARMLinux to match the other compiler-rt runtime libraries. Differential Revision: https://reviews.llvm.org/D112229 --- Enable building the ORC runtime for 64-bit and 32-bit ARM architectures, and for all Darwin embedded platforms (iOS, tvOS, and watchOS). This covers building the cross-platform code, but does not add TLV runtime support for the new architectures, which can be added independently. Incidentally, stop building the Mach-O TLS support file unnecessarily on other platforms. Differential Revision: https://reviews.llvm.org/D112111
-
Kazu Hirata authored
-
Yonghong Song authored
Clang patch ([1]) added support for btf_decl_tag attributes with typedef types. This patch added LLVM support, including DWARF generation. For example, for the typedef

  typedef unsigned * __u __attribute__((btf_decl_tag("tag1")));
  __u u;

the following shows the llvm-dwarfdump result:

  0x00000033:   DW_TAG_typedef
                  DW_AT_type      (0x00000048 "unsigned int *")
                  DW_AT_name      ("__u")
                  DW_AT_decl_file ("/home/yhs/work/tests/llvm/btf_tag/t.c")
                  DW_AT_decl_line (1)

  0x0000003e:     DW_TAG_LLVM_annotation
                    DW_AT_name    ("btf_decl_tag")
                    DW_AT_const_value     ("tag1")

  0x00000047:     NULL

[1] https://reviews.llvm.org/D110127 Differential Revision: https://reviews.llvm.org/D110129
-
Yonghong Song authored
Previously, the btf_decl_tag attribute supported record, field, global variable, function and function parameter ([1], [2]). This patch added support for typedef. The main reason is that, for a typedef of an anonymous struct/union, we can only apply the btf_decl_tag attribute to the anonymous struct/union, like below:

  typedef struct { ... } __btf_decl_tag target_type

In this case, the __btf_decl_tag attribute applies to the anonymous struct, which increases downstream implementation complexity. But if typedef with the btf_decl_tag attribute is supported, we can have

  typedef struct { ... } target_type __btf_decl_tag

which applies __btf_decl_tag to the typedef "target_type" and makes it easier to directly associate btf_decl_tag with a named type. For this reason, this patch permits btf_decl_tag on typedef types. [1] https://reviews.llvm.org/D106614 [2] https://reviews.llvm.org/D111588 Differential Revision: https://reviews.llvm.org/D110127
-
Mark de Wever authored
This addresses the usage of `operator&` in `<vector>`. I have now added tests for the current offending cases. I wonder whether it would be better to add one addressof test per directory and test all possible violations, also to guard against possible future errors. (Note there are still more headers with the same issue.) Reviewed By: #libc, ldionne Differential Revision: https://reviews.llvm.org/D111961
-
Jez Ng authored
While attempting to simplify it, I discovered a concerning discrepancy between our handling of LC_LINKER_OPTION vs ld64's. In particular, ld64 does not appear to check for `-all_load` nor `-ObjC` when processing those options. Thus, if/when we fix this behavior, no duplicate symbol error will be expected regardless of the use-after-free. As such, I've removed the test logic that tries to induce the duplicate symbol error. We can just rely on ASAN to do the verification. In order to make the test run on Windows, I've removed the symlink logic. Both ld64 and LLD handle this un-symlinked framework just fine. I also capitalized the framework name, since that's the typical convention. Reviewed By: #lld-macho, oontvoo Differential Revision: https://reviews.llvm.org/D112195
-
Lang Hames authored
These were accidentally picked up in an earlier commit.
-
Nicolas Vasilache authored
In the stride == 1 case, conv1d reads contiguous data along the input dimension. This can be used advantageously to perform bulk memory transfers and compute while avoiding unrolling. Experimentally, this can yield speedups of up to 50%. Reviewed By: antiagainst Differential Revision: https://reviews.llvm.org/D112139
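To illustrate why stride 1 allows bulk reads, the sketch below computes a single-channel 1-D convolution two ways in plain Python. It is only an illustration of the access pattern, not the MLIR vectorization itself, and the function names are hypothetical.

```python
def conv1d_reference(inp, filt, stride=1):
    """Per-output-element form: out[w] = sum_k inp[w * stride + k] * filt[k]."""
    out_len = (len(inp) - len(filt)) // stride + 1
    return [sum(inp[w * stride + k] * filt[k] for k in range(len(filt)))
            for w in range(out_len)]

def conv1d_stride1_bulk(inp, filt):
    """Stride-1 form: each filter tap reads one contiguous slice of the input."""
    out_len = len(inp) - len(filt) + 1
    out = [0] * out_len
    for k, tap in enumerate(filt):
        window = inp[k:k + out_len]  # contiguous read covering every output at once
        out = [acc + v * tap for acc, v in zip(out, window)]
    return out

inp = [1, 2, 3, 4, 5, 6]
filt = [1, 0, -1]
assert conv1d_stride1_bulk(inp, filt) == conv1d_reference(inp, filt)
```

With stride 1, the loop runs over filter taps only; each iteration touches one contiguous window, so there is no need to unroll over output positions.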
-
Jon Chesterfield authored
Step towards building the DeviceRTL for amdgpu. Mostly replaces cuda-specific toolchain finding logic with the generic logic currently found in the amdgpu deviceRTL cmake. Also deletes dead code and changes the default to build on systems without cuda installed, as the library doesn't use cuda and the amdgpu-only systems generally won't have cuda installed. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D111983
-
Sanjay Patel authored
This updates the recent D112108 / b92412fb to handle the flipped logic ('or') sibling: https://alive2.llvm.org/ce/z/Y2L6Ch
-
Sanjay Patel authored
These are direct mutations of the tests added for D112108 - we should handle the sibling folds for 'or'.
-
Kirill Bobyrev authored
Context: https://reviews.llvm.org/D110925#inline-1070046
-
Louis Dionne authored
In 395271ad, I simplified how we handled the target triple for the runtimes. However, in doing so, we stopped considering the default in CMAKE_CXX_COMPILER_TARGET, so we'd use the LLVM_DEFAULT_TARGET_TRIPLE (which is the host triple) even if CMAKE_CXX_COMPILER_TARGET was specified. This commit fixes that problem and also refactors the code so that it's easy to see what the default value is. The fact that nobody seems to have been broken by this makes me think that perhaps nobody is using CMAKE_CXX_COMPILER_TARGET to specify the triple -- but it should still work. Differential Revision: https://reviews.llvm.org/D111672
-
Anirudh Prasad authored
- This patch provides the initial implementation for lowering a call on z/OS according to the XPLINK64 calling convention.
- A series of changes have been made to SystemZCallingConv.td to account for these additional XPLINK64 changes, including adding a new helper function to shadow the stack along with allocation of a register wherever appropriate.
- For the cases of copying a f64 to a gr64 and a f128 / 128-bit vector type to a gr64, a `CCBitConvertToType` has been added and has been bitcasted appropriately in the lowering phase.
- Support for the ADA register (R5) will be provided in a later patch.
Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D111662
-
Sanjay Patel authored
(i8 X ^ 128) & (i8 X s>> 7) --> usubsat X, 128 I haven't found a generalization of this identity: https://alive2.llvm.org/ce/z/_sriEQ Note: I was actually looking at the first form of the pattern in that link, but that's part of a long chain of potential missed transforms in codegen and IR... that I hope ends here! The predicates for when this is profitable are a bit tricky. This version of the patch excludes multi-use but includes custom lowering (as opposed to legal only). On x86 for example, we have custom lowering for some vector types, and that uses umax and sub. So to enable that fold, we need to add use checks to avoid regressions. Even with legal-only lowering, we could see code with extra reg move instructions for extra uses, so that constraint would have to be eased very carefully to avoid penalties. Differential Revision: https://reviews.llvm.org/D112085
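The identity is small enough to check exhaustively for i8. The sketch below models both sides on unsigned 8-bit values in plain Python; the function names are hypothetical and this is not the DAG combine itself.

```python
def fold_source(x):
    """(i8 X ^ 128) & (i8 X s>> 7), with x given as an unsigned 8-bit value."""
    signed = x - 256 if x >= 128 else x  # reinterpret the bits as a signed i8
    ashr = (signed >> 7) & 0xFF          # arithmetic shift splats the sign bit: 0x00 or 0xFF
    return (x ^ 128) & ashr

def usubsat(x, y):
    """Unsigned saturating subtract: clamps at 0 instead of wrapping."""
    return max(x - y, 0)

# Exhaustive check over every i8 value.
assert all(fold_source(x) == usubsat(x, 128) for x in range(256))
```

When the sign bit is clear, the shift yields all zeros and the result is 0. When it is set, the mask is all ones and the xor clears the top bit, which is the same as subtracting 128.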
-
Anirudh Prasad authored
- There are certain instructions, most notably those with extended mnemonics, that are restricted to the gnu/att variant.
- There are also certain instruction aliases/mnemonic aliases that are restricted to the HLASM variant (see https://reviews.llvm.org/D97581, https://reviews.llvm.org/D94250 and https://reviews.llvm.org/D92185 for reference).
- This patch adds a few tests to check for the behaviour introduced in the above patches. The testing coverage could not be added at the same time, due to parallel work being done to introduce the HLASM syntax.
Reviewed By: uweigand, abhina.sreeskantharajan Differential Revision: https://reviews.llvm.org/D112172
-
Alexey Bataev authored
Vectorization of PHIs and stores is very similar, so it might be beneficial to try to revectorize stores (like PHIs) if the total number of stores with the same/alternate opcode is less than the vector size but the number of stores with the same type is larger than the vector size. Differential Revision: https://reviews.llvm.org/D109831
-
Matthias Springer authored
This is the same fix as for scf.for. Differential Revision: https://reviews.llvm.org/D112218
-
Matthias Springer authored
Differential Revision: https://reviews.llvm.org/D112123
-
Matthias Springer authored
Differential Revision: https://reviews.llvm.org/D111956
-
Matthias Springer authored
An InitTensorOp is replaced with an ExtractSliceOp on the InsertSliceOp's destination. This optimization is applied after analysis and only to InsertSliceOps that were decided to bufferize inplace. Another analysis on the new ExtractSliceOp is needed after the rewrite. Differential Revision: https://reviews.llvm.org/D111955
-
Jon Chesterfield authored
Fixes a compiler assert on passing a compile-time integer to atomic builtins. The assert was introduced in D61522, and the function changed from ->bool to ->Optional in D76646. Simplifies call sites of getIntegerConstantExpr to elide the now-redundant isValueDependent checks. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D112159
-
Jan Svoboda authored
The `clang-scan-deps` CLI tool invokes the compiler with `-print-resource-dir` in case the `-resource-dir` argument is missing from the compilation command line. This is to enable running the tool on compilation databases that use a compiler from a different toolchain than `clang-scan-deps` itself. While this doesn't make sense when scanning modular builds (due to the `-cc1` arguments the tool generates), the tool can be used to efficiently scan for file dependencies of non-modular builds too. This patch stops deducing the resource directory by invoking the compiler by default. This mode can still be enabled by invoking `clang-scan-deps` with `--resource-dir-recipe invoke-compiler`. The new default is `--resource-dir-recipe modify-compiler-path`, which relies on the resource directory deduction taking place in `Driver::Driver`, which is based on the compiler path. This makes the default more aligned with the intended usage of the tool while still allowing it to serve other use-cases. Note that this functionality was also influenced by D108979, where the dependency scanner stopped going through `ClangTool::run`. The function tried to deduce the resource directory based on the current executable path, which might not be what the users expect when invoked from within a shared library. Depends on D108979. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D108366
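As a rough illustration of path-based deduction, the sketch below derives a resource directory from a compiler path. It assumes the conventional `<prefix>/bin/clang` and `<prefix>/lib/clang/<version>` install layout; the helper name and the example paths are hypothetical, and the real logic lives in the Clang driver, not in this snippet.

```python
from pathlib import PurePosixPath

def deduce_resource_dir(compiler_path, clang_version):
    """Hypothetical helper: <prefix>/bin/clang -> <prefix>/lib/clang/<version>."""
    bin_dir = PurePosixPath(compiler_path).parent
    # Assumption: the resource directory sits next to bin/, under lib/clang/<version>.
    return str(bin_dir.parent / "lib" / "clang" / clang_version)

# Example paths and version are made up for illustration.
assert deduce_resource_dir("/opt/toolchain/bin/clang", "14.0.0") == \
       "/opt/toolchain/lib/clang/14.0.0"
```

Because the result depends only on the compiler path, no compiler process has to be spawned, unlike the `invoke-compiler` recipe.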
-