- Nov 22, 2024
-
Jenkins CI authored
-
- Nov 21, 2024
-
-
Roger Ferrer authored
-
Paul Walker authored
This brings the printing of scalable-vector constant splats in line with their fixed-length counterparts.
-
kadir çetinkaya authored
Starting with 41e3919d, DiagnosticsEngine creation might perform IO. It was implicitly defaulting to getRealFileSystem. This patch makes the choice explicit by pushing the decision to callers: they use the ambient VFS if one is available, and fall back to `getRealFileSystem` when no VFS is available.
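The caller-side pattern can be sketched with plain standard C++ (hypothetical `FileSystem` type and function names, not the actual clang/LLVM API):

```cpp
#include <memory>
#include <string>

// Hypothetical sketch of the pattern: the callee no longer reaches for a
// global default implicitly; the caller passes a filesystem, and only the
// caller decides to fall back to the real one when no ambient VFS exists.
struct FileSystem { std::string name; };

std::shared_ptr<FileSystem> getRealFileSystem() {
    static auto real = std::make_shared<FileSystem>(FileSystem{"real"});
    return real;
}

// Caller-side decision: use the ambient VFS if present, else the real FS.
std::shared_ptr<FileSystem> chooseVFS(std::shared_ptr<FileSystem> ambient) {
    return ambient ? ambient : getRealFileSystem();
}
```

The point of the change is that this ternary now lives at each call site, where an ambient VFS may be known, rather than deep inside engine construction.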
-
Benjamin Maxwell authored
Previously, with `-fzero-call-used-regs`, clang/LLVM would incorrectly emit Neon instructions in streaming functions and in streaming-compatible functions without SVE. With this change: * in streaming functions, Z/p registers will be zeroed; * in streaming-compatible functions without SVE, D registers will be zeroed (as Neon vector instructions, including `movi v..`, are illegal there).
-
Mikhail Goncharov authored
-
Simon Pilgrim authored
Fixes #116977
-
Simon Pilgrim authored
-
Oleksandr T. authored
Fixes #63009.
-
CarolineConcatto authored
…x8 and MFloat8x16 This patch adds MFloat8 as a TypeFlag and Kind on Neon to generate the typedefs using emitNeonTypeDefs. No Clang changes are needed, because Sema and CodeGen use the builtins defined in AArch64SVEACLETypes.def.
-
Florian Hahn authored
The improvements in 63917e19 / #70796 do not check for memory barriers/unmodelled side effects, which means we may incorrectly hoist loads across memory barriers. Fix this by checking whether any machine instruction in the loop is a load-fold barrier. PR: https://github.com/llvm/llvm-project/pull/116987
-
Oliver Stannard authored
This was implementing the bf16->float conversion function using a left-shift of a signed integer, so for negative floating-point values a 1 was being shifted into the sign bit of the signed integer intermediate value. This is undefined behaviour, and was caught by UBSan. The vector versions are code-generated via Neon builtin functions, so probably don't have the same UB problem, but I've updated them anyway for consistency. Fixes #61983.
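A scalar sketch of the well-defined form of this conversion (an illustration of the fix, not the actual arm_neon.h code):

```cpp
#include <cstdint>
#include <cstring>

// Illustrative bf16 -> float widening: place the 16 bf16 bits in the top
// half of an *unsigned* 32-bit value. Left-shifting a signed int so that a
// 1 lands in the sign bit is UB; the unsigned shift is well defined for
// negative values too.
float bf16_to_float(uint16_t bf16_bits) {
    uint32_t f32_bits = static_cast<uint32_t>(bf16_bits) << 16;
    float result;
    std::memcpy(&result, &f32_bits, sizeof(result)); // bit-cast without aliasing UB
    return result;
}
```

For example, the bf16 bit pattern `0xBF80` (-1.0) widens to the float bit pattern `0xBF800000`, which is exactly where the signed shift previously invoked UB.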
-
Roger Ferrer authored
This goes against common wisdom: live ranges are not to be extended. However, cross-bank instructions (such as a vector-splat) may benefit from it, especially on systems where spilling one register class is more expensive than the other (in EPI, spill code for vectors is much more expensive than spill code for the scalar plus a vector-splat).

This change leverages the fact that in RISC-V register allocation happens in two steps: first vectors and then scalars (with insert-vsetvli in between), so some of the rematerializations for vectors are still available.

In the case of vector-splats, when they cannot be folded into their user instructions (like in the testcase) they are typically hoisted out of the loop. This is efficient but can cause spill code (like in the testcase), so we end up with a loop that reloads, on each iteration, a vector that is basically a splat. Computing a splat is much faster than loading a vector, and also faster than having to load the scalar before the splat (if the scalar got spilled too).

But rematerialization does not happen if the live ranges are not available at the rematerialization point, and this is what happens with hoisted vector-splats: the scalar used in the splat may not be live in the loop body. So in this case we extend the live range of the scalar so it is live up to the rematerialization point. We do that only in a very restricted set of cases and have observed little disturbance to other RISC-V testcases. This is gated by a TargetInstrInfo hook which is only set to true in EPI.
-
Roger Ferrer authored
We forgot to update these tests in the last merge. Also add `nounwind` to tests that do not care about CFI directives.
-
Roger Ferrer authored
-
Jonathan Cohen authored
Currently the function walks the entire DAG to find other candidates with which to perform a post-inc store. This leads to very long compilation times on large functions. Added a MaxSteps limit to avoid this, aligned with how hasPredecessorHelper is used elsewhere in the code.
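The bounded-walk idea can be sketched generically (hypothetical names and a plain adjacency-list graph, not the actual SelectionDAG code): give the search a step budget and give up conservatively when it is exhausted.

```cpp
#include <cstddef>
#include <queue>
#include <unordered_set>
#include <vector>

// Hypothetical sketch: search the predecessors of `start` for `target`,
// visiting at most `MaxSteps` nodes. Returning false when the budget runs
// out is the conservative answer: the caller simply skips the optimization
// instead of walking an arbitrarily large graph.
bool hasPredecessorWithin(const std::vector<std::vector<int>>& preds,
                          int start, int target, std::size_t MaxSteps) {
    std::queue<int> worklist;
    std::unordered_set<int> visited;
    worklist.push(start);
    std::size_t steps = 0;
    while (!worklist.empty()) {
        if (++steps > MaxSteps)
            return false; // budget exhausted: give up conservatively
        int node = worklist.front();
        worklist.pop();
        if (!visited.insert(node).second)
            continue; // already explored
        for (int p : preds[node]) {
            if (p == target)
                return true;
            worklist.push(p);
        }
    }
    return false;
}
```

The trade-off is deliberate: a missed post-inc merge costs a little performance, while an unbounded walk costs compile time quadratically on large functions.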
-
abhishek-kaushik22 authored
Closes #116767
-
Ami-zhang authored
Introduce a check for `__loongarch_frecipe` macro around the FP approximation intrinsic implementation. This ensures that these intrinsics are only included when this macro is defined, providing better flexibility and control over the usage of FP approximation instructions.
-
Andrzej Warzyński authored
Adds tests with scalable vectors for the Vector-To-LLVM conversion pass. Covers the following Ops: * `vector.maskedload`, * `vector.maskedstore`, * `vector.gather`, * `vector.scatter`. In addition: * For consistency with other tests, renamed test functions (e.g. `@masked_load_op` -> `@masked_load`) * Made some test names more descriptive, e.g. `@gather_op_2d` -> `@gather_1d_from_2d`.
-
Yingwei Zheng authored
Counterexample: 219 is a multiple of 73. But `sext i8 219 to i16 = 65499` is not. Fixes https://github.com/llvm/llvm-project/issues/116483.
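The counterexample can be checked numerically (a plain C++ sketch of the arithmetic, not the InstCombine code):

```cpp
#include <cstdint>

// Sign-extend an i8 bit pattern to i16, mirroring LLVM's `sext i8 ... to i16`.
// The bit pattern 219 is -37 as a signed i8, so sext yields -37, i.e. 65499
// when the i16 result is read back as unsigned. Divisibility (219 = 3 * 73)
// is not preserved by sext; zero extension, which yields 219 again, would
// preserve it.
int16_t sext_i8_to_i16(uint8_t bits) {
    return static_cast<int16_t>(static_cast<int8_t>(bits));
}
```

This is why the multiple-of fact proven on the narrow type cannot simply be propagated through a `sext`.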
-
wanglei authored
This commit fixes an issue in the large code model where non-dso_local function calls did not use the GOT as expected in PIC mode. Instead, direct PC-relative access was incorrectly applied, leading to linker errors when building shared libraries. For `ExternalSymbol`, it is not possible to determine whether it is dso_local during pseudo-instruction expansion. We use target flags to differentiate whether GOT should be used. Reviewed By: heiher, SixWeining Pull Request: https://github.com/llvm/llvm-project/pull/117099
-
Boaz Brickner authored
[clang] [NFC] Remove SourceLocation() parameter from Diag.Report() calls in SourceManager, and use the equivalent Report() overload instead (#116937)
-
Jay Foad authored
Fix coding errors found by inspection and check that the swz bit still serves to prevent merging of buffer loads/stores on GFX12+.
-
Feng Zou authored
For
```
mov name@GOTTPOFF(%rip), %reg
add name@GOTTPOFF(%rip), %reg
```
add `R_X86_64_CODE_4_GOTTPOFF` = 44 if the instruction starts at 4 bytes before the relocation offset. It's similar to R_X86_64_GOTTPOFF. The linker can treat `R_X86_64_CODE_4_GOTTPOFF` as `R_X86_64_GOTTPOFF`, or convert the instructions above to
```
mov $name@tpoff, %reg
add $name@tpoff, %reg
```
if the first byte of the instruction at the relocation `offset - 4` is `0xd5` (namely, encoded w/REX2 prefix), when possible. Binutils patch: https://github.com/bminor/binutils-gdb/commit/a533c8df598b5ef99c54a13e2b137c98b34b043c Binutils mailthread: https://sourceware.org/pipermail/binutils/2023-December/131463.html ABI discussion: https://groups.google.com/g/x86-64-abi/c/ACwD-UQXVDs/m/vrgTenKyFwAJ Blog: https://kanrobert.github.io/rfc/All-about-APX-relocation
-
abhishek-kaushik22 authored
-
Lee Wei authored
This PR removes tests with `br i1 undef` under `llvm/test/Transforms/Loop*` and `llvm/test/Transforms/Lower*`.
-
Mingming Liu authored
Currently, both [TypeIdMap](https://github.com/llvm/llvm-project/blob/67a1fdb014790a38a205d28e1748634de34471dd/llvm/include/llvm/IR/ModuleSummaryIndex.h#L1356) and [TypeIdCompatibleVtableMap](https://github.com/llvm/llvm-project/blob/67a1fdb014790a38a205d28e1748634de34471dd/llvm/include/llvm/IR/ModuleSummaryIndex.h#L1363) keep type-id as `std::string` in the combined index for LTO indexing analysis. With this change, index uses a unique-string-saver to own the string copies and two maps above can use string references to save some memory. This shows a 3% memory reduction (from 8.2GiB to 7.9GiB) in an internal binary with high indexing memory usage.
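The ownership pattern can be sketched with standard containers (an illustration only; LLVM's actual `UniqueStringSaver` is backed by a `BumpPtrAllocator`): one owner stores each distinct type-id string once, and the maps then hold cheap `std::string_view` references into that storage instead of duplicate `std::string` copies.

```cpp
#include <string>
#include <string_view>
#include <unordered_set>

// One set owns each distinct string exactly once. std::unordered_set is
// node-based, so references to stored elements remain valid across rehashes,
// which makes handing out views into the stored strings safe.
class StringSaver {
    std::unordered_set<std::string> storage_;
public:
    // Returns a view into the single owned copy, inserting it if new.
    std::string_view save(std::string_view s) {
        return *storage_.emplace(s).first;
    }
};
```

Two maps keyed on `std::string_view` can then share one saved copy of each type-id, which is where the reported ~3% memory saving comes from.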
-
Yingwei Zheng authored
See the following case:
```
define i1 @test_logical_and_icmp_samesign(i8 %x) {
  %cmp1 = icmp ne i8 %x, 9
  %cmp2 = icmp samesign ult i8 %x, 11
  %and = select i1 %cmp1, i1 %cmp2, i1 false
  ret i1 %and
}
```
Currently we cannot convert this logical and into a bitwise and due to the `samesign` flag. But if `%cmp2` evaluates to `poison`, we can infer that `%cmp1` is either `poison` or `true` (a `samesign` violation indicates that X is negative). Therefore, `%and` still evaluates to `poison`. This patch converts a logical and into a bitwise and iff TV being poison implies that Cond is either poison or true. Likewise, we convert a logical or into a bitwise or iff FV being poison implies that Cond is either poison or false. Notes: 1. This logic is implemented in InstCombine. Not sure whether it is profitable to move it into ValueTracking and call `impliesPoison(TV/FV, Sel)` instead. 2. We only handle the case where `ValAssumedPoison` is `icmp samesign pred X, C1` and `V` is `icmp pred X, C2`. There are no suitable variants of `isImpliedCondition` to pass the fact that X is [non-]negative. Alive2: https://alive2.llvm.org/ce/z/eorFfa Motivation: fix [a major regression](https://github.com/dtcxzyw/llvm-opt-benchmark/pull/1724#discussion_r1849663863) to unblock https://github.com/llvm/llvm-project/pull/112742.
-
Lang Hames authored
This tests that a simple C++ static initializer works as expected. Compared to the architecture-specific, assembly-level regression tests for the ORC runtime, this test is expected to catch cases where the compiler adopts some new MachO feature that the ORC runtime does not yet support (e.g. a new initializer section).
-
Timm Baeder authored
We need to check the ToType first, then the FromType. Additionally, remove qualifiers from the parent type of the field we're emitting a note for.
-
Craig Topper authored
-
donald chen authored
Disabling memrefs with a stride of 0 was intended to prevent internal aliasing, but this does not address all cases: internal aliasing can still occur when the stride is less than the shape. On the other hand, a stride of 0 can be very useful in certain scenarios. For example, on architectures that support multi-dimensional DMA, we can use memref::copy with a stride of 0 to achieve a broadcast effect. This commit removes the restriction that strides in memrefs cannot be 0.
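The broadcast effect of a zero stride can be illustrated with a plain strided copy (a sketch of the idea, not MLIR's `memref::copy`):

```cpp
#include <cstddef>
#include <vector>

// Strided copy: element i of dst reads src[i * srcStride]. With
// srcStride == 0 every destination element reads src[0], so the copy
// degenerates into a broadcast of a single value -- exactly the effect
// a multi-dimensional DMA engine can exploit.
void stridedCopy(const std::vector<int>& src, std::size_t srcStride,
                 std::vector<int>& dst) {
    for (std::size_t i = 0; i < dst.size(); ++i)
        dst[i] = src[i * srcStride];
}
```

With `srcStride == 1` this is an ordinary element-wise copy; with `srcStride == 0` it fills the destination with `src[0]`.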
-
Wu Yingcong authored
`NumCHRedBranches - 1` is used later; we should add an assertion to make sure it will not underflow.
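The hazard being guarded against is plain unsigned wraparound (a generic sketch with a hypothetical helper name, not the CHR code itself):

```cpp
#include <cassert>

// If NumCHRedBranches were 0, the unsigned expression NumCHRedBranches - 1
// would wrap around to a huge value instead of going negative. Asserting
// the precondition up front turns a silent wraparound into a loud failure
// in builds with assertions enabled.
unsigned lastBranchIndex(unsigned NumCHRedBranches) {
    assert(NumCHRedBranches >= 1 && "expected at least one CHRed branch");
    return NumCHRedBranches - 1;
}
```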
-
Diego Caballero authored
When inferring the mask of a transfer operation that results in a single `i1` element, we could represent it using either `vector<i1>` or `vector<1xi1>`. To avoid type mismatches, this PR updates the mask inference logic to consistently generate `vector<1xi1>` for these cases. We can enable 0-D masks if they are needed in the future. See: https://github.com/llvm/llvm-project/issues/116197
-
Sushant Gokhale authored
Current cost modelling does not take into account the cost of materializing a constant vector. As the test shows, this results in some cases being vectorized when it may not be profitable. A future patch will try to address this issue.
-
Piyou Chen authored
Reverts llvm/llvm-project#115991 Due to build fail https://lab.llvm.org/buildbot/#/builders/66/builds/6511
-