- Oct 07, 2020
-
-
Simon Pilgrim authored
Prep work before some cleanup in narrowMaskedBinOp
-
Amara Emerson authored
This change renames the intrinsics to not have "experimental" in the name. The autoupgrader will handle legacy intrinsics.
Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html
Differential Revision: https://reviews.llvm.org/D88787
-
Roman Lebedev authored
-
Philip Reames authored
-
Roman Lebedev authored
In some cases, we can negate an instruction if only one of its operands negates. Previously, we assumed that constants would already have been canonicalized to the RHS, but that isn't guaranteed to happen because of the InstCombine worklist visitation order, as the added (previously-hanging) test shows.
So if we only need to negate a single operand, we should ensure that we try the constant operand first. Do that by re-doing the complexity sorting ourselves when we actually care about it.
Fixes https://bugs.llvm.org/show_bug.cgi?id=47752
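A hypothetical illustration of the kind of pattern involved (not taken from the patch): negating a multiply only requires negating one operand, and trying the constant operand first gives the negation for free even when the constant has not yet been canonicalized to the RHS.

```llvm
define i8 @negate_mul(i8 %x) {
  %t = mul i8 42, %x      ; constant not yet canonicalized to the RHS
  %n = sub i8 0, %t       ; -(42 * %x)
  ret i8 %n
}

; negating only the constant operand is enough:
define i8 @negate_mul_neg(i8 %x) {
  %n = mul i8 -42, %x
  ret i8 %n
}
```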
-
Simon Pilgrim authored
-
- Oct 06, 2020
-
-
Dávid Bolvanský authored
-
Arthur Eubanks authored
Some of these depended on analyses being present that aren't provided automatically in NPM.
early_dce_clobbers_callgraph.ll was previously inlining a noinline function?
cast-call-combine.ll relied on the legacy always-inline pass being a CGSCC pass and getting rerun.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D88187
-
- Oct 05, 2020
-
-
Roman Lebedev authored
[InstCombine] Revert rL226781 "Teach InstCombine to canonicalize loads which are only ever stored to always use a legal integer type if one is available." (PR47592)
(it was introduced in https://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html)
This canonicalization seems dubious. Most importantly, while it does not create `inttoptr` casts by itself, it may cause them to appear later; see e.g. D88788. I think it's pretty obvious that this is an undesirable outcome: by now we've established that seemingly no-op `inttoptr`/`ptrtoint` casts are not no-ops, and we are no longer eager to look past them. Which means, for example, that given
```
%a = load i32
%b = inttoptr %a
%c = inttoptr %a
```
we likely won't be able to tell that `%b` and `%c` are the same thing.
As we can see in D88789 / D88788 / D88806 / D75505, we can't really teach SCEV about this (not without https://bugs.llvm.org/show_bug.cgi?id=47592 at least), and we can't recover the situation post-inlining in InstCombine. So it really does look like this fold is actively breaking otherwise-good IR, in a way that is not recoverable. And that means this fold isn't helpful in exposing the patterns it produces to passes that are otherwise unaware of them.
Thus, I propose to simply not perform such a canonicalization. The original motivational RFC does not state what larger problem the canonicalization was trying to solve, so I'm not sure how this plays out in the larger picture.
On vanilla llvm test-suite + RawSpeed, this results in an increase of asm instructions and final object size by ~+0.05%, and decreases the final count of bitcasts by -4.79% (-28990), of ptrtoint casts by -15.41% (-3423), and of inttoptr casts by -25.59% (-6919, *sic*). Overall, there are -0.04% fewer IR blocks and -0.39% fewer instructions.
See https://bugs.llvm.org/show_bug.cgi?id=47592
Differential Revision: https://reviews.llvm.org/D88789
-
Dávid Bolvanský authored
This reverts commit 3f1fd59d.
-
Dávid Bolvanský authored
As reported in PR46735:

    void* f(void *d, const void *s, size_t l) {
      return __builtin___mempcpy_chk(d, s, l, __builtin_object_size(d, 0));
    }

This can be optimized to `return mempcpy(d, s, l);`.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D86019
-
Nikita Popov authored
When retrying the "simplify with operand replaced" select optimization without poison flags, also handle inbounds on GEPs. Of course, this particular example would also be safe to transform while keeping inbounds, but the underlying machinery does not know this (yet).
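A hedged sketch of the shape of IR this targets (function, names, and types are hypothetical, not the patch's actual test): on the %n == 0 path the inbounds GEP is just %base, so the whole select can be replaced by the GEP once the inbounds flag is dropped for the substitution reasoning.

```llvm
define i32* @sel_gep(i32* %base, i64 %n) {
  %cond = icmp eq i64 %n, 0
  %gep = getelementptr inbounds i32, i32* %base, i64 %n
  %sel = select i1 %cond, i32* %base, i32* %gep
  ; when %n == 0 the GEP equals %base, so %sel can be folded to the GEP
  ; (with inbounds dropped, per the conservative poison-flag handling)
  ret i32* %sel
}
```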
-
Nikita Popov authored
-
Simon Pilgrim authored
Added missing test coverage for the shl(add(and(lshr(x,c1),c2),y),c1) -> add(and(x,c2<<c1),shl(y,c1)) combine.
Renamed tests, as 'foo' and 'bar' aren't very extensible.
Added vector tests with undefs and nonuniform constants.
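For reference, a hand-worked instance of that combine, with c1 = 3 and c2 = 15 chosen purely for illustration:

```llvm
; before: shl(add(and(lshr(x,3),15),y),3)
define i32 @before(i32 %x, i32 %y) {
  %l = lshr i32 %x, 3
  %a = and i32 %l, 15
  %s = add i32 %a, %y
  %r = shl i32 %s, 3
  ret i32 %r
}

; after: add(and(x,15<<3),shl(y,3))
define i32 @after(i32 %x, i32 %y) {
  %m = and i32 %x, 120     ; 15 << 3
  %t = shl i32 %y, 3
  %r = add i32 %m, %t
  ret i32 %r
}
```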
-
Simon Pilgrim authored
If we know the shift amount is less than the bitwidth, we should be able to convert this to a funnel shift.
-
Simon Pilgrim authored
Fixes OSS Fuzz #26135
-
- Oct 03, 2020
-
-
Roman Lebedev authored
-
Simon Pilgrim authored
Some initial test coverage toward fixing PR46896 - these are just copied from rotate.ll
-
Simon Pilgrim authored
If we know the shift amount is less than the bitwidth, we should be able to convert this to a rotate/funnel shift.
-
Simon Pilgrim authored
Add basic vector handling to recognizeBSwapOrBitReverseIdiom/collectBitParts - this works at the element level: all vector element operations must match (splat constants etc.), and there is no cross-element support (insert/extract/shuffle etc.).
-
Simon Pilgrim authored
[InstCombine] recognizeBSwapOrBitReverseIdiom - support for 'partial' bswap patterns (PR47191) (Reapplied)
If we're bswap'ing some bytes and zero'ing the remainder we can perform this as a bswap+mask, which helps us match 'partial' bswaps as a first step towards folding into a more complex bswap pattern.
Reapplied with an early-out if recognizeBSwapOrBitReverseIdiom collects a source wider than the result type.
Differential Revision: https://reviews.llvm.org/D88578
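A hypothetical example of the "bswap some bytes, zero the remainder" shape (function name and constants chosen for illustration): only the low two bytes of %x are byte-reversed into the high half, so the whole thing is a full bswap followed by a mask.

```llvm
define i32 @partial_bswap(i32 %x) {
  %b0 = and i32 %x, 255          ; byte 0
  %b1 = and i32 %x, 65280        ; byte 1
  %s0 = shl i32 %b0, 24
  %s1 = shl i32 %b1, 8
  %r  = or i32 %s0, %s1
  ; can be rewritten as a full bswap followed by a mask:
  ;   %bs = call i32 @llvm.bswap.i32(i32 %x)
  ;   %r  = and i32 %bs, 4294901760   ; 0xFFFF0000
  ret i32 %r
}
```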
-
- Oct 02, 2020
-
-
Simon Pilgrim authored
Revert rG3d14a1e982ad27 - "[InstCombine] recognizeBSwapOrBitReverseIdiom - support for 'partial' bswap patterns (PR47191)"
This reverts commit 3d14a1e9. This is breaking on some 2-stage clang buildbots.
-
Simon Pilgrim authored
-
Simon Pilgrim authored
If we're bswap'ing some bytes and zero'ing the remainder we can perform this as a bswap+mask, which helps us match 'partial' bswaps as a first step towards folding into a more complex bswap pattern.
Differential Revision: https://reviews.llvm.org/D88578
-
Simon Pilgrim authored
-
Simon Pilgrim authored
We get the vNi16 cases already via matching as a rotate followed by the fshl -> bswap combines
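A minimal sketch of why the vNi16 case already falls out (types chosen for illustration): rotating each i16 element by 8 swaps its two bytes, so the rotate match plus the existing fshl -> bswap fold covers it.

```llvm
declare <4 x i16> @llvm.fshl.v4i16(<4 x i16>, <4 x i16>, <4 x i16>)

define <4 x i16> @rot8(<4 x i16> %x) {
  ; rotate-left by 8 on an i16 element is a per-element byte swap
  %r = call <4 x i16> @llvm.fshl.v4i16(<4 x i16> %x, <4 x i16> %x, <4 x i16> <i16 8, i16 8, i16 8, i16 8>)
  ; equivalent to: call <4 x i16> @llvm.bswap.v4i16(<4 x i16> %x)
  ret <4 x i16> %r
}
```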
-
Simon Pilgrim authored
-
- Oct 01, 2020
-
-
Nikita Popov authored
When replacing X == Y ? f(X) : Z with X == Y ? f(Y) : Z, make sure that Y cannot be undef. If it may be undef, we might end up picking different values for the undef in the comparison and in the select operand.
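A hypothetical illustration of the hazard (names made up): if %y may be undef, the icmp and a substituted use of %y could each choose a different concrete value for the undef, so rewriting the true arm to use %y instead of %x would be unsound.

```llvm
define i32 @sel(i32 %x, i32 %y, i32 %z) {
  %cmp = icmp eq i32 %x, %y
  %add = add i32 %x, 1              ; f(%x)
  ; replacing %x with %y inside %add is only safe if %y is not undef
  %sel = select i1 %cmp, i32 %add, i32 %z
  ret i32 %sel
}
```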
-
Sanjay Patel authored
-
- Sep 30, 2020
-
-
Simon Pilgrim authored
As mentioned on PR47191, if we're bswap'ing some bytes and zero'ing the remainder we can perform this as a bswap+mask, which helps us match 'partial' bswaps as a first step towards folding into a more complex bswap pattern.
-
Simon Pilgrim authored
Use getScalarSizeInBits, not getPrimitiveSizeInBits, to determine the shift value at the element level.
-
Simon Pilgrim authored
Add tests showing failure to correctly fold vector bswap(trunc(bswap(x))) intrinsic patterns
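For context, a scalar instance of the bswap(trunc(bswap(x))) pattern (i32 -> i16, names chosen for illustration); mathematically it is equivalent to truncating x >> 16, and the tests cover the vector equivalent that was not yet folded.

```llvm
declare i32 @llvm.bswap.i32(i32)
declare i16 @llvm.bswap.i16(i16)

define i16 @bswap_trunc_bswap(i32 %x) {
  %b = call i32 @llvm.bswap.i32(i32 %x)
  %t = trunc i32 %b to i16
  %r = call i16 @llvm.bswap.i16(i16 %t)
  ; equivalent to: trunc (lshr i32 %x, 16) to i16
  ret i16 %r
}
```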
-
Simon Pilgrim authored
Appease the update_test_checks script, which was complaining about potential %TMP clashes.
-
Simon Pilgrim authored
Appease the update_test_checks script, which was complaining about potential %TMP clashes.
-
Simon Pilgrim authored
-
Simon Pilgrim authored
PR39793 demonstrated an issue where we fail to recognize 'partial' bswap patterns of the lower bytes of an integer source.
In fact, most of this is already in place: collectBitParts suitably tags zero bits, so we just need to handle this case correctly by finding the zero'd upper bits and reducing the bswap pattern to just the active demanded bits.
Differential Revision: https://reviews.llvm.org/D88316
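A hypothetical PR39793-style pattern (function name and width chosen for illustration): a byte swap of just the low 16 bits performed in i32, where the zero'd upper bits mean only the low half is demanded.

```llvm
define i32 @low_half_bswap(i32 %x) {
  %lo  = and i32 %x, 255         ; byte 0
  %hi  = lshr i32 %x, 8
  %hib = and i32 %hi, 255        ; byte 1
  %shl = shl i32 %lo, 8
  %r   = or i32 %shl, %hib
  ; equivalent to: zext (bswap (trunc %x to i16)) to i32 --
  ; the upper 16 bits of the result are known zero
  ret i32 %r
}
```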
-
- Sep 29, 2020
-
-
Sanjay Patel authored
I think we initially made this fold conservative to be safer, but we do not need the alignment attribute/metadata limitation because the masked load intrinsic itself specifies the alignment. A normal vector load is better for IR transforms and should be no worse in codegen than the masked alternative. If it is worse for some target, the backend can reverse this transform.
Differential Revision: https://reviews.llvm.org/D88505
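A minimal sketch, assuming the case in question is a masked load whose mask permits loading the whole vector (e.g. an all-ones mask); the alignment comes from the intrinsic's own alignment operand rather than any attribute or metadata.

```llvm
declare <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>*, i32, <4 x i1>, <4 x i32>)

define <4 x i32> @all_true_mask(<4 x i32>* %p) {
  ; the alignment (4) is carried by the intrinsic's alignment operand
  %v = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %p, i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef)
  ; would fold to: %v = load <4 x i32>, <4 x i32>* %p, align 4
  ret <4 x i32> %v
}
```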
-
Sanjay Patel authored
The test after the changed test was checking exactly the same dereferenceable bytes.
-
Simon Pilgrim authored
Attempt to fold trunc (*shr (trunc A), C) --> trunc(*shr A, C) iff the shift amount is small enough that all zero/sign bits created by the shift are removed by the last trunc.
Helps fix the regressions encountered in D88316.
I've tweaked a couple of shift values, as suggested by @lebedev.ri, to ensure we have coverage of shift values close to (above/below) the max limit.
Differential Revision: https://reviews.llvm.org/D88429
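A worked instance of that fold, with the types (i32 -> i16 -> i8) and the shift amount (4) chosen purely for illustration; the zero bits the lshr shifts in land in bits 8-15 and are discarded by the final trunc, so the shift can be done in the wide type.

```llvm
; before: trunc (lshr (trunc A), 4)
define i8 @before(i32 %a) {
  %t1 = trunc i32 %a to i16
  %s  = lshr i16 %t1, 4
  %t2 = trunc i16 %s to i8
  ret i8 %t2
}

; after: trunc (lshr A, 4)
define i8 @after(i32 %a) {
  %s = lshr i32 %a, 4
  %t = trunc i32 %s to i8
  ret i8 %t
}
```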
-
Sanjay Patel authored
It apparently didn't cause trouble for the parser or FileCheck, but it was confusing to see a function def split by asserts.
-