- May 07, 2021
-
-
Roman Lebedev authored
-
Roman Lebedev authored
It measures as such, and the reference docs agree. I can't easily add an MCA test because there's no mnemonic for it; it can only be disassembled or created as an MCInst.
-
Fangrui Song authored
Similar to X86 D73230 & 46788a21. With this change, we can set dso_local in clang's -fpic -fno-semantic-interposition mode for default-visibility, external-linkage, non-ifunc, non-COMDAT definitions. For such dso_local definitions, accessing a variable, taking the address of a function, or calling a function will go through a local alias to avoid the GOT/PLT. Note: the 'S' inline assembly constraint refers to an absolute symbolic address or a label reference (D46745). Differential Revision: https://reviews.llvm.org/D101872
-
Matt Morehouse authored
Fix the function return type and remove the check for SUMMARY, since it doesn't seem to be output on Windows.
-
Whitney Tsang authored
induction variable to be perfect This patch allows more conditional branches to be considered loop guards, so more loop nests can be considered perfect. Reviewed By: bmahjour, sidbav Differential Revision: https://reviews.llvm.org/D94717
-
Simon Pilgrim authored
Ensure we don't try to fold when one might be an opaque constant - the constant fold will fail and then the reverse fold will happen in DAGCombine.
-
Roman Lebedev authored
Sometimes the disassembler picks the _REV variants of instructions over the plain ones, which in this case exposed an issue: the _REV variants aren't being modelled as optimizable moves.
-
Roman Lebedev authored
-
Sebastian Poeplau authored
Address sanitizer can detect stack exhaustion via its SEGV handler, which is executed on a separate stack using the sigaltstack mechanism. When libFuzzer is used with address sanitizer, it installs its own signal handlers which defer to those put in place by the sanitizer before performing additional actions. In the particular case of a stack overflow, the current setup fails because libFuzzer doesn't preserve the flag for executing the signal handler on a separate stack: when we run out of stack space, the operating system can't run the SEGV handler, so address sanitizer never reports the issue. See the included test for an example. This commit fixes the issue by making libFuzzer preserve the SA_ONSTACK flag when installing its signal handlers; the dedicated signal-handler stack set up by the sanitizer runtime appears to be large enough to support the additional frames from the fuzzer. Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D101824
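The flag-preserving step described above can be sketched with plain POSIX `sigaction`. This is a minimal illustration, not libFuzzer's actual code: the names `SigInfoHandler`, `installPreservingOnStack`, and `runsOnAlternateStack` are hypothetical, and the sketch assumes a POSIX system.

```cpp
#include <signal.h>
#include <string.h>

// Hypothetical alias for illustration; not a libFuzzer type.
using SigInfoHandler = void (*)(int, siginfo_t *, void *);

// Installs `handler` for `signum`. If the action already in place asked to run
// on the alternate signal stack (SA_ONSTACK), the new action keeps that flag,
// so the handler can still run after the normal stack is exhausted.
// Returns true on success.
bool installPreservingOnStack(int signum, SigInfoHandler handler) {
  struct sigaction previous;
  if (sigaction(signum, nullptr, &previous) != 0)
    return false;

  struct sigaction replacement;
  memset(&replacement, 0, sizeof(replacement));
  sigemptyset(&replacement.sa_mask);
  replacement.sa_sigaction = handler;
  // Copy only the SA_ONSTACK bit from the previous action.
  replacement.sa_flags = SA_SIGINFO | (previous.sa_flags & SA_ONSTACK);
  return sigaction(signum, &replacement, nullptr) == 0;
}

// Reports whether the action currently installed for `signum` has SA_ONSTACK.
bool runsOnAlternateStack(int signum) {
  struct sigaction current;
  if (sigaction(signum, nullptr, &current) != 0)
    return false;
  return (current.sa_flags & SA_ONSTACK) != 0;
}
```

Without the `previous.sa_flags & SA_ONSTACK` term, the replacement handler would run on the already exhausted stack, which is the failure mode the commit message describes.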
-
thomasraoux authored
Differential Revision: https://reviews.llvm.org/D102034
-
thomasraoux authored
Differential Revision: https://reviews.llvm.org/D102041
-
Joseph Tremoulet authored
Pointers escape when converted to integers, so a pointer produced by converting an integer to a pointer must not be a local non-escaping object. Reviewed By: nikic, nlopes, aqjune Differential Revision: https://reviews.llvm.org/D101541
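A source-level sketch of the reasoning, written in C++ rather than LLVM IR. The function names are hypothetical, and the example only illustrates the escape argument; it does not show the analysis itself.

```cpp
#include <cstdint>

// The address of `local` is converted to an integer, so it escapes. A pointer
// rebuilt from that integer may refer to `local`, and the store through it
// changes the value that is returned.
int escapedLocal() {
  int local = 1;
  std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(&local);
  int *rebuilt = reinterpret_cast<int *>(bits);
  *rebuilt = 42;
  return local;
}

// Here the address of `local` is never converted or stored anywhere, so a
// pointer built from the integer `bits` cannot refer to it. The store through
// `fromInteger` therefore cannot change `local`.
// Assumption: `bits` holds the address of a live int owned by the caller.
int nonEscapingLocal(std::uintptr_t bits) {
  int local = 1;
  int *fromInteger = reinterpret_cast<int *>(bits);
  *fromInteger = 42;
  return local;
}
```

In the second function, a compiler is free to treat the store and the read of `local` as independent, which is the no-alias conclusion the commit message states.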
-
Sanjay Patel authored
This is a reduction of the example in: https://llvm.org/PR50256
-
Joseph Huber authored
Summary: The allocator interface added in D97883 allows the RTL to allocate shared and host-pinned memory from the CUDA plugin. This patch adds support for these to the runtime. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D102000
-
Tobias Gysi authored
Remove the builder signature taking a signed dimension identifier. Reviewed By: ergawy Differential Revision: https://reviews.llvm.org/D102055
-
Tres Popp authored
This is to make the difference between this and an AliasAnalysis clearer. For example, given a sequence of subviews that create values A -> B -> C -> D: BufferViewFlowAnalysis::resolve(B) => {B, C, D}, whereas AliasAnalysis::resolve(B) => {A, B, C, D}. Differential Revision: https://reviews.llvm.org/D100838
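The two result sets in the example can be reproduced with a toy graph model. This is only a sketch of the distinction, not the MLIR implementation: `Value`, `Edges`, `resolveViewFlow`, `resolveAlias`, and `subviewChain` are hypothetical names, and treating aliasing as undirected reachability is a simplifying assumption.

```cpp
#include <map>
#include <set>
#include <vector>

using Value = char;
using Edges = std::map<Value, std::vector<Value>>;

// Collects every value reachable from `start` by following `edges`.
std::set<Value> reachable(const Edges &edges, Value start) {
  std::set<Value> seen{start};
  std::vector<Value> worklist{start};
  while (!worklist.empty()) {
    Value current = worklist.back();
    worklist.pop_back();
    auto it = edges.find(current);
    if (it == edges.end())
      continue;
    for (Value next : it->second)
      if (seen.insert(next).second)
        worklist.push_back(next);
  }
  return seen;
}

// View-flow style: follow only source -> derived-view edges.
std::set<Value> resolveViewFlow(const Edges &forward, Value start) {
  return reachable(forward, start);
}

// Alias style (simplified): treat each edge as undirected, so the values a
// view was derived from are included as well.
std::set<Value> resolveAlias(const Edges &forward, Value start) {
  Edges undirected = forward;
  for (const auto &entry : forward)
    for (Value target : entry.second)
      undirected[target].push_back(entry.first);
  return reachable(undirected, start);
}

// The subview chain from the example: A -> B -> C -> D.
Edges subviewChain() { return {{'A', {'B'}}, {'B', {'C'}}, {'C', {'D'}}}; }
```

Resolving B over `subviewChain()` gives B, C, and D in the view-flow direction, and additionally A once edges are followed both ways.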
-
Ahsan Saghir authored
Vector pair intrinsics and builtins were renamed in https://reviews.llvm.org/D91974 to replace the _mma_ prefix by _vsx_. However, some projects used the _mma_ version, so this patch adds these intrinsics to provide compatibility. Fixes Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=50159 Reviewed By: nemanjai, amyk Differential Revision: https://reviews.llvm.org/D100482
-
Roman Lebedev authored
Drop it just enough so it still produces the right IPC.
-
Roman Lebedev authored
They are resolved at the register rename stage without using any execution units.
-
Roman Lebedev authored
I've verified this with llvm-exegesis. This is not limited to zero registers.
-
Roman Lebedev authored
I've verified this with llvm-exegesis. This is not limited to zero registers.
-
Roman Lebedev authored
I've verified this with llvm-exegesis. This is not limited to zero registers. Refs: AMD SOG 19h, 2.9.4 Zero Cycle Move: The processor is able to execute certain register to register mov operations with zero cycle delay. Agner, 22.13 Instructions with no latency: Register-to-register move instructions are resolved at the register rename stage without using any execution units. These instructions have zero latency. It is possible to do six such register renamings per clock cycle, and it is even possible to rename the same register multiple times in one clock cycle.
-
Roman Lebedev authored
-
Roman Lebedev authored
-
Roman Lebedev authored
-
Roman Lebedev authored
They are resolved at the register rename stage without using any execution units.
-
Roman Lebedev authored
-
Roman Lebedev authored
So the IPC actually stabilizes at 6.
-
Arthur O'Dwyer authored
And remove the dedicated debug-iterator tests; we want to test this in all modes. We have a CI step for testing the whole test suite with `--debug_level=1` now. Part of https://reviews.llvm.org/D102003
-
Arthur O'Dwyer authored
Part of https://reviews.llvm.org/D102003
-
Arthur O'Dwyer authored
And remove the dedicated debug-iterator test; we want to test this in all modes. We have a CI step for testing the whole test suite with `--debug_level=1` now. Part of https://reviews.llvm.org/D102003
-
Stephen Tozer authored
Reapply b623df3c, which was reverted while reverting a different patch with a breaking change. There are no underlying issues with this patch, so no changes have been made to the original patch. This reverts commit b11e4c99.
-
Simon Pilgrim authored
[CodeGen] Ensure UserValue::getDebugLoc() and UserLabel::getDebugLoc() consistently return a const reference. NFCI. Avoids a lot of unnecessary tracking increments/decrements of the underlying TrackingMDNodeRef.
-
Simon Pilgrim authored
Avoids a lot of unnecessary tracking increments/decrements of the underlying TrackingMDNodeRef.
-
Benjamin Kramer authored
getSpillAlign does the same thing.
-
Sebastian Neubauer authored
gfx9 does not work with negative offsets; gfx10 works only with aligned negative offsets, not with unaligned ones. This is slightly more conservative than needed: gfx9 does support negative offsets when a VGPR address is used, and gfx10 supports negative, unaligned offsets when an SGPR address is used, but this patch does not make use of that. Differential Revision: https://reviews.llvm.org/D101292
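The conservative rule stated above can be written as a small predicate. This is a sketch under assumptions, not the backend's code: `Generation` and `passesNegativeOffsetRule` are hypothetical names, the required alignment is passed in as a parameter because the text does not state it, and range limits on the offset are ignored.

```cpp
#include <cstdint>

// Hypothetical enum covering only the two generations named in the text.
enum class Generation { Gfx9, Gfx10 };

// Models only the sign and alignment conditions from the description:
//   - non-negative offsets are not restricted by this rule,
//   - gfx9 rejects every negative offset,
//   - gfx10 accepts a negative offset only when it is aligned.
// Assumption: `alignment` is positive and supplied by the caller.
bool passesNegativeOffsetRule(Generation generation, std::int64_t offset,
                              std::int64_t alignment) {
  if (offset >= 0)
    return true;
  if (generation == Generation::Gfx9)
    return false;
  return offset % alignment == 0;
}
```

The VGPR-address and SGPR-address exceptions mentioned in the message are deliberately left out, matching the statement that the patch does not make use of them.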
-
David Stuttard authored
Retrying after revert and fix (removed implicit def flag from operand). Now passes with expensive_checks enabled. Since there is a single scratch resource descriptor for all shaders, if there is a wave32 and a wave64 shader (for instance for VsFs pairs) then the const_index_stride will be incorrect for wave32 shaders. Differential Revision: https://reviews.llvm.org/D101830 Change-Id: Ie3b8b2921237968caca91527dd0c97b1b0cc0360
-
Stephen Tozer authored
This patch is a fix for revision ce0c1f3c, which caused test failures on bots without x86 as a registered target. This patch moves the test added in the prior patch to the x86 folder, so that it only runs on bots with the correct target available.
-
Malhar Jajoo authored
This patch converts the llvm.memset intrinsic into tail-predicated (TP) hardware loops for targets that support the Arm M-profile Vector Extension (MVE). llvm.memset is converted to a TP loop for both constant and non-constant input sizes. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100435
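To show what a tail-predicated loop does, here is a scalar C++ model of a memset over 16-byte vectors (MVE vectors are 128 bits wide). It is an illustration only, not the code the backend emits: `memsetTailPredicated` and `kLanes` are hypothetical names, and the inner lane loop stands in for a single predicated vector store.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// One 128-bit vector holds 16 byte lanes.
constexpr std::size_t kLanes = 16;

// Sets `count` bytes at `dst` to `value`, one vector-sized step at a time.
// In the final step only the first `active` lanes are enabled, so no scalar
// clean-up loop is needed for the tail. Returns the number of iterations.
std::size_t memsetTailPredicated(std::uint8_t *dst, std::uint8_t value,
                                 std::size_t count) {
  std::size_t iterations = 0;
  std::size_t remaining = count;
  while (remaining > 0) {
    // Predicate: lanes [0, active) are enabled, the rest are masked off.
    std::size_t active = std::min(remaining, kLanes);
    for (std::size_t lane = 0; lane < kLanes; ++lane)
      if (lane < active)
        dst[lane] = value;
    dst += active;
    remaining -= active;
    ++iterations;
  }
  return iterations;
}
```

Because the predicate is recomputed from the remaining element count on every iteration, the same loop handles sizes known at compile time and sizes known only at run time.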
-
Anastasia Stulova authored
This change allows identifiers for image types from `cl_khr_gl_msaa_sharing` to be used freely in kernel code when the extension is not supported, since they are not in the list of reserved identifiers. It also removes the need for a pragma for the types in the extensions, since the spec does not require one. Differential Revision: https://reviews.llvm.org/D100983
-