- May 12, 2021
-
Simon Pilgrim authored
Pull out repeated code to create a concat_vectors of the same operand from all subvecs.
-
Simon Pilgrim authored
AVX1 could perform this as a v8f32 shuffle instead of splitting - based off PR46621
-
Fraser Cormack authored
This patch extends the vector type-conversion and legalization capabilities of scalable vector types. Firstly, `vscale x 1` types now behave more like the corresponding `vscale x 2+` types. This enables the integer promotion legalization of extended scalable types, such as the promotion of `<vscale x 1 x i5>` to `<vscale x 1 x i8>`. These `vscale x 1` types are also now better handled by `getVectorTypeBreakdown`, where what looks like older handling for 1-element fixed-length vector types was spuriously updated to include scalable types. Widening of scalable types is now better supported, by using `INSERT_SUBVECTOR` to insert the smaller scalable vector "value" type into the wider scalable vector "part" type. This allows AArch64 to pass and return `vscale x 1` types by value by widening. There are still cases where we are unable to legalize `vscale x 1` types, such as where expansion would require splitting the vector in two. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D102073
-
Valentin Clement authored
Add a conversion pass to convert higher-level types before translation. This conversion extracts meaningful information and packs it into a struct that the translation (D101504) will be able to understand. Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D102170
-
Anastasia Stulova authored
This removes the pointless requirement for the extension pragma, since the pragma doesn't disable anything properly and doesn't need to enable anything that cannot be disabled. The change doesn't break existing kernels: it allows more cases to compile, i.e. those without pragma statements, and the pragma continues to be accepted. Differential Revision: https://reviews.llvm.org/D100985
-
Jordan Rupprecht authored
Much like other LLVM binary utilities, `llvm-cov` has a symlink compatibility feature where it runs in `gcov` compatibility mode if the binary name ends in `gcov`. This is identical to invoking `llvm-cov gcov ...`. Differential Revision: https://reviews.llvm.org/D102299
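The name-based dispatch described above can be sketched as follows. This is a minimal illustration in Python, not the actual `llvm-cov` code (which is C++); the function name `select_mode` and the `.exe` handling are assumptions made for the example.

```python
import os


def select_mode(argv0: str) -> str:
    """Return "gcov" when the invoked binary name ends in "gcov", else "llvm-cov"."""
    # Drop the directory part so only the binary name is inspected.
    name = os.path.basename(argv0)
    # Assumption for this sketch: ignore a trailing ".exe" before checking.
    stem, ext = os.path.splitext(name)
    if ext.lower() == ".exe":
        name = stem
    return "gcov" if name.endswith("gcov") else "llvm-cov"
```

Under this sketch, a symlink named `gcov` or `x86_64-linux-gnu-gcov` would select the compatibility mode, while `llvm-cov` itself would not.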
-
Yaxun (Sam) Liu authored
Currently clang does not emit device template variables instantiated only in host functions; however, nvcc is able to do that: https://godbolt.org/z/fneEfferY This patch fixes this issue by refactoring and extending the existing mechanism for emitting static device variables ODR-used by host code only. Basically, clang records device variables ODR-used by host code and forces them to be emitted in device compilation. The existing mechanism makes sure these device variables ODR-used by host code are added to llvm.compiler-used, so they are guaranteed not to be deleted. It also fixes non-ODR-use of a static device variable by host code causing the static device variable to be emitted and registered, which should not happen. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D102237
-
Craig Topper authored
[ValueTypes] Rename MVT::getVectorNumElements() to MVT::getVectorMinNumElements(). Fix some misuses of getVectorNumElements(). getVectorNumElements() returns a value for scalable vectors without any warning, so it is effectively getVectorMinNumElements(). By renaming it and making getVectorNumElements() forward to it, we can insert a check for scalable vectors into getVectorNumElements() similar to EVT. I didn't do that in this patch because there are still more fixes needed, but I was able to temporarily do it and passed the RISCV lit tests with these changes. The changes to isPow2VectorType and getPow2VectorType are copied from EVT. The change to TypeInfer::EnforceSameNumElts reduces the size of AArch64's isel table. We're now considering SameNumElts to require the scalable property to match, which removes some unneeded type checks. This was motivated by the bug I fixed yesterday in 80b95108. Reviewed By: frasercrmck, sdesmalen Differential Revision: https://reviews.llvm.org/D102262
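The forwarding arrangement, together with the scalable-vector check that the message says was left for later, might look roughly like this. It is a toy Python model, not LLVM's `MVT`; the class `ToyVectorType` and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyVectorType:
    """Hypothetical stand-in for a vector type; not LLVM's MVT."""
    min_num_elements: int
    scalable: bool = False

    def get_vector_min_num_elements(self) -> int:
        # Always meaningful: the known minimum element count.
        return self.min_num_elements

    def get_vector_num_elements(self) -> int:
        # Sketch of the deferred check: an exact count only makes sense
        # for fixed-length vectors.
        if self.scalable:
            raise TypeError("exact element count requested for a scalable vector")
        return self.get_vector_min_num_elements()
```

In this model, callers that only need a lower bound use the `min` accessor, and any remaining misuse on a scalable type fails loudly instead of silently returning the minimum.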
-
Stefan Pintilie authored
Revert "[SelectionDAG][Mips][PowerPC][RISCV][WebAssembly] Teach computeKnownBits/ComputeNumSignBits about atomics" This reverts commit 6c80361b. Breaks PowerPC Big Endian buildbots.
-
Hendrik Greving authored
Fixes a bug in the DAG combiner that eliminated stores because it failed to inspect the address space of the pointers, for example:
  %v = load %ptr_as1 // no chain side effect
  store %v, %ptr_as2
as well as:
  store %v, %ptr_as1
  store %v, %ptr_as2
Also fixes the X86 test for the above. Differential Revision: https://reviews.llvm.org/D102096
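The idea of the fix can be illustrated with a toy redundancy check in Python. This is only a sketch of the principle (compare the address space as well as the address), not the DAG combiner's code; `Pointer`, `Store`, and `is_redundant_store` are hypothetical names.

```python
from typing import NamedTuple


class Pointer(NamedTuple):
    address: str    # symbolic address, e.g. "%ptr"
    addrspace: int  # address space number


class Store(NamedTuple):
    value: str
    ptr: Pointer


def is_redundant_store(prev: Store, cur: Store) -> bool:
    """A second store is redundant only if value, address AND address space match."""
    return (
        prev.value == cur.value
        and prev.ptr.address == cur.ptr.address
        and prev.ptr.addrspace == cur.ptr.addrspace
    )
```

With the address-space comparison in place, two stores of `%v` through the same symbolic pointer in address spaces 1 and 2 are both kept.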
-
Hendrik Greving authored
Adds a test in X86, exposing a bug in DAG combine eliminating stores that are the same value but not the same address space. Differential Revision: https://reviews.llvm.org/D102243
-
Peter Waller authored
When a ptest is used to set flags from the output of rdffr, the ptest can be eliminated, using a flags-setting rdffrs instead. Additionally, check that nothing consumes flags between rdffr and ptest; this case appears to have been missed previously.
* There is no unpredicated RDFFRS instruction.
* If substituting RDFFR_PP, require that the mask argument of the PTEST matches that of the RDFFR_PP.
* Move some precondition code up inside optimizePTestInstr, so that it covers the new code paths for RDFFR which return earlier.
* Only consider RDFFR, PTEST in same basic block.
* Check for other flag setting instructions between the two, abort if found.
* Drop an old TODO comment about removing dead PTEST instructions.
RDFFR_P to follow in later patch. Differential Revision: https://reviews.llvm.org/D101357
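The preconditions listed in this message can be modelled with a small Python sketch. It is a simplified illustration and not the AArch64 backend code: `Instr` and `can_replace_with_rdffrs` are hypothetical, a basic block is modelled as a plain list, and flags are reduced to two booleans.

```python
from typing import NamedTuple, Sequence


class Instr(NamedTuple):
    opcode: str
    sets_flags: bool = False
    reads_flags: bool = False
    mask: str = ""  # governing predicate operand, where applicable


def can_replace_with_rdffrs(block: Sequence[Instr], rdffr_idx: int, ptest_idx: int) -> bool:
    """Check the toy preconditions for folding a PTEST into a flag-setting RDFFRS."""
    # Both instructions must be in this block, with the RDFFR first.
    if not (0 <= rdffr_idx < ptest_idx < len(block)):
        return False
    rdffr, ptest = block[rdffr_idx], block[ptest_idx]
    if rdffr.opcode != "RDFFR_PP" or ptest.opcode != "PTEST":
        return False
    # The PTEST mask must match the mask of the predicated RDFFR.
    if rdffr.mask != ptest.mask:
        return False
    # Nothing in between may set or consume the flags.
    between = block[rdffr_idx + 1 : ptest_idx]
    return not any(i.sets_flags or i.reads_flags for i in between)
```

Any intervening instruction that writes or reads the flags makes the function return `False`, mirroring the "abort if found" rule.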
-
Ben Shi authored
Reviewed By: dylanmckay Differential Revision: https://reviews.llvm.org/D100701
-
David Sherwood authored
I've changed a test in each of these files: Transforms/InstCombine/vec_demanded_elts.ll Transforms/InstCombine/vec_demanded_elts-inseltpoison.ll to use a variable GEP index instead of a constant value so that we're testing the more general case.
-
Martin Storsjö authored
The bug (PR50227, affecting COFF) that caused the revert in 6f5670a4 has been fixed in 382c505d now, so it should be safe to reenable the pass for that target (and ELF). In PR50227 it's also mentioned that the same pass seems to cause problems on aarch64 on darwin, so leaving it disabled there for now.
-
Greg McGary authored
`__mh_(execute|dylib|dylinker|bundle|preload|object)_header` are special symbols whose values hold the VMA of the Mach header to support introspection. They are attached to the first section in `__TEXT`, even though their addresses are outside `__TEXT`, and they do not refer to code. It is normally harmless, but when the first section of `__TEXT` has no other symbols, `__mh_*_header` is considered by the disassembler when determining function boundaries. Since `__mh_*_header` refers to an address outside `__TEXT`, the boundary determination fails and disassembly quits. Since `__TEXT,__text` normally has symbols, this bug is obscured. Experiments placing `__stubs` and `__stub_helper` first exposed the bug, since neither has symbols. Differential Revision: https://reviews.llvm.org/D101786
-
Julien Pagès authored
Improve the code generation of build_vector. Use the v_pack_b32_f16 instruction instead of v_and_b32 + v_lshl_or_b32 Differential Revision: https://reviews.llvm.org/D98081 Patch by Julien Pagès!
-
Roman Lebedev authored
We cannot rely on (C+X)-->(X+C) already happening, because we might not have visited that `add` yet. The added testcase would get stuck in an endless combine loop.
-
Jay Foad authored
MachineRegisterInfo caches the reserved register set that is computed by TargetRegisterInfo::getReservedRegs, so call into MRI to get the reserved regs to avoid recomputing them. In particular this speeds up AMDGPU's SIFormMemoryClauses pass because AMDGPU has a particularly complicated reserved set that is expensive to compute. Differential Revision: https://reviews.llvm.org/D102318
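The benefit of going through the cache can be shown with a toy Python model. The class `ToyRegisterInfo` and its lazy caching strategy are assumptions for illustration; they are not how MachineRegisterInfo is implemented.

```python
from typing import Callable, FrozenSet, Optional


class ToyRegisterInfo:
    """Hypothetical cache around an expensive reserved-register computation."""

    def __init__(self, compute: Callable[[], FrozenSet[str]]) -> None:
        self._compute = compute
        self._reserved: Optional[FrozenSet[str]] = None

    def get_reserved_regs(self) -> FrozenSet[str]:
        # Run the expensive computation only on the first request.
        if self._reserved is None:
            self._reserved = self._compute()
        return self._reserved
```

A pass that asks for the reserved set many times then pays for the computation once, which is the effect the commit is after.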
-
Tobias Gysi authored
after introducing the IndexedGenericOp to GenericOp canonicalization (https://reviews.llvm.org/D101612). Differential Revision: https://reviews.llvm.org/D102245
-
Piotr Sobczak authored
Remove assert introduced in D101177, following post-commit feedback.
-
Sanjay Patel authored
This is motivated by the example in https://llvm.org/PR50055 , but it doesn't do anything for that bug currently because we don't actually have a zero-extended setcc there. Proof for the generic transform (inverse of what we would try to do in combining): https://alive2.llvm.org/ce/z/aBL-Mg Differential Revision: https://reviews.llvm.org/D102275
-
Sanjay Patel authored
-
Nathan James authored
There should be a follow up to this for changing the traversal mode, but some of the tests don't like that. Reviewed By: steveire Differential Revision: https://reviews.llvm.org/D101614
-
Tobias Gysi authored
after introducing the IndexedGenericOp to GenericOp canonicalization (https://reviews.llvm.org/D101612). Differential Revision: https://reviews.llvm.org/D102308
-
David Spickett authored
This reverts commit b1a77e46, which has a failing test on our armv7 bots: https://lab.llvm.org/buildbot/#/builders/59/builds/1812
-
Hana Joo authored
The `IgnoreArray` flag was not used before while running the rule. Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=47288 | b/47288 ]] Reviewed By: njames93 Differential Revision: https://reviews.llvm.org/D101239
-
Tobias Gysi authored
after introducing the IndexedGenericOp to GenericOp canonicalization (https://reviews.llvm.org/D101612). Differential Revision: https://reviews.llvm.org/D102236
-
Kristina Bessonova authored
Differential Revision: https://reviews.llvm.org/D102195
-
Simon Pilgrim authored
Extend the HOP(HOP(X,Y),HOP(Z,W)) and SHUFFLE(HOP(X,Y),HOP(Z,W)) folds to handle repeating 256/512-bit vector cases. This allows us to drop the UNPACK(HOP(),HOP()) custom fold in combineTargetShuffle. This required isRepeatedTargetShuffleMask to be tweaked to support target shuffle masks taking more than 2 inputs.
-
gbreynoo authored
The readelf command guide shows the short options used as aliases, but these are not found in the help text unless --show-hidden is used. Other tools show aliases with --help. This change fixes the help output to be consistent with the command guide. Differential Revision: https://reviews.llvm.org/D102173
-
gbreynoo authored
In the help output of other tools and in the symbolizer command guide, Mach-O specific options are in their own section. This change fixes the symbolizer help output to be consistent. Differential Revision: https://reviews.llvm.org/D102178
-
David Sherwood authored
In InnerLoopVectorizer::widenPHIInstruction there are cases where we have to scalarise a pointer induction variable after vectorisation. For scalable vectors we already deal with the case where the pointer induction variable is uniform, but we currently crash if not uniform. For fixed width vectors we calculate every lane of the scalarised pointer induction variable for a given VF, however this cannot work for scalable vectors. In this case I have added support for caching the whole vector value for each unrolled part so that we can always extract an arbitrary element. Additionally, we still continue to cache the known minimum number of lanes too in order to improve code quality by avoiding an extractelement operation. I have adapted an existing test `pointer_iv_mixed` from the file: Transforms/LoopVectorize/consecutive-ptr-uniforms.ll and added it here for scalable vectors instead: Transforms/LoopVectorize/AArch64/sve-widen-phi.ll Differential Revision: https://reviews.llvm.org/D101294
-
Peter Waller authored
The sve.convert.to.svbool lowering has the effect of widening a logical <M x i1> vector representing lanes into a physical <16 x i1> vector representing bits in a predicate register. In general, if converting to svbool, the contents of lanes in the physical register might not be known. For sve.convert.to.svbool the new lanes are specified to be zeroed, requiring 'and' instructions to mask off the new lanes. When the value comes from a ptrue or a comparison, however, the new lanes are already known to be zero, so the mask is unnecessary.
CodeGen Before:
  ptrue p0.s, vl16
  ptrue p1.s
  ptrue p2.b
  and p0.b, p2/z, p0.b, p1.b
  ret
After:
  ptrue p0.s, vl16
  ret
Differential Revision: https://reviews.llvm.org/D101544
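Why the `and` is redundant for a ptrue-produced value can be seen in a toy bit-level model in Python. It assumes a 128-bit vector (so a 16-bit predicate with one bit per vector byte) and uses hypothetical helper names; it is not the AArch64 lowering code.

```python
PRED_BITS = 16  # assumption: 128-bit vector, one predicate bit per vector byte


def ptrue(elem_bytes: int) -> int:
    """All-true predicate for elements of elem_bytes bytes.

    Only the lowest bit of each element's group of bits is set.
    """
    mask = 0
    for bit in range(0, PRED_BITS, elem_bytes):
        mask |= 1 << bit
    return mask


def convert_to_svbool(pred: int, elem_bytes: int) -> int:
    """Zero the bits that do not correspond to a lane of the narrower type."""
    return pred & ptrue(elem_bytes)
```

In this model `convert_to_svbool(ptrue(4), 4)` returns its input unchanged, so the masking step does nothing for such values, whereas an arbitrary bit pattern does get its extra bits cleared.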
-
Michał Górny authored
Add a function to read NT_PRPSINFO note from FreeBSD core dumps. This is necessary to get the process ID (NT_PRSTATUS has only thread ID). Move the lp64 check from NT_PRSTATUS parsing to the parseFreeBSDNotes() to avoid repeating it. Differential Revision: https://reviews.llvm.org/D101893
-
Michał Górny authored
The FreeBSD coredumps from i386 systems contain only FSAVE-style NT_FPREGSET. Since we do not really support reading that kind of data anymore, just use NT_X86_XSTATE to get FXSAVE-style data when available. Differential Revision: https://reviews.llvm.org/D101086
-
Stephen Tozer authored
Previous crashes caused by this patch were the result of machine subregisters being incorrectly handled in updateDbgUsersToReg; this has been fixed by using RegUnits to determine overlapping registers, instead of using the register values directly. Differential Revision: https://reviews.llvm.org/D101523 This reverts commit 7ca26c5f.
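Overlap detection through register units can be sketched in Python as a set-intersection test. The register names and unit assignments below are invented for illustration and do not correspond to any real target description.

```python
# Hypothetical register file: R0 is made of two sub-registers, R1 is separate.
REG_UNITS = {
    "R0_LO": frozenset({"u0"}),
    "R0_HI": frozenset({"u1"}),
    "R0": frozenset({"u0", "u1"}),
    "R1": frozenset({"u2"}),
}


def registers_overlap(a: str, b: str) -> bool:
    """Two registers overlap when they share at least one register unit."""
    return not REG_UNITS[a].isdisjoint(REG_UNITS[b])
```

Comparing register identities directly would treat `R0` and `R0_LO` as unrelated; comparing their units reports the overlap while still keeping `R0_LO` and `R0_HI` distinct.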
-
Neal (nealsid) authored
I don't mean to undo others' work but it looks like the hand-rolled EditLine for LLDB on Windows isn't used. It'd be easier to make changes to bring the other platforms' Editline wrapper up to date (e.g. simplifying char vs wchar_t) without modifying/testing this one too. Reviewed By: amccarth Differential Revision: https://reviews.llvm.org/D102208
-
Piotr Sobczak authored
No need to handle invariant loads when avoiding WAR conflicts, as there cannot be a vector store to the same memory location. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D101177
-
Qiu Chaofan authored
This commit broke the build in some f128-related tests, but that's not the root cause. There are some differences between Clang's and GCC's definitions of 128-bit float types on PPC, so macros/functions in glibc may not work well with clang -mfloat128. We need to handle this carefully and reland it.
-