- May 26, 2019
-
-
Shawn Landden authored
Rather than gating on "isSwitchDense" (resulting in necessarily sparse lookup tables even when they were generated), always run this quite cheap transform. This transform is useful not just for generating tables. LowerSwitch also wants this: read LowerSwitch.cpp:257. To avoid generating worse code, introduce a SubThreshold heuristic. Instead of just sorting by signed value, generalize the finding of the best base. And now that it is run unconditionally, do not replicate its functionality in SwitchToLookupTable (which could use a Sub when having a hole is smaller, hence the SubThreshold heuristic located in a single place). This simplifies SwitchToLookupTable, and fixes some ugly corner cases due to the use of signed numbers, such as a table containing i16 32767 and 32768, of which 32768 would be interpreted as -32768, so that the code thinks the table is size 65536. (We still use unconditional subtraction when building a single-register mask, but I think this whole block should go when the more general sparse map is added, which doesn't leave empty holes in the table.) And the reason test4 and test5 did not trigger was documented incorrectly: it was because they were not considered sufficiently "dense". Also, fix generation of invalid LLVM-IR: shl by bit-width. llvm-svn: 361727
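The signed corner case can be reproduced outside of LLVM. The following is a minimal sketch, not the SimplifyCFG code: the helper names asSigned16, signedSpan and unsignedSpan are hypothetical, and the inputs 32767 and 32768 are chosen here because they straddle the signed i16 boundary.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Reinterpret a 16-bit pattern as a two's-complement signed value.
std::int32_t asSigned16(std::uint16_t v) {
  return v >= 0x8000 ? static_cast<std::int32_t>(v) - 0x10000
                     : static_cast<std::int32_t>(v);
}

// Table slots needed when the case values are ordered as signed i16.
std::uint32_t signedSpan(const std::vector<std::uint16_t> &cases) {
  assert(!cases.empty());
  std::int32_t lo = asSigned16(cases[0]), hi = lo;
  for (std::uint16_t v : cases) {
    lo = std::min(lo, asSigned16(v));
    hi = std::max(hi, asSigned16(v));
  }
  return static_cast<std::uint32_t>(hi - lo + 1);
}

// Table slots needed when the same values are ordered as unsigned.
std::uint32_t unsignedSpan(const std::vector<std::uint16_t> &cases) {
  assert(!cases.empty());
  auto mm = std::minmax_element(cases.begin(), cases.end());
  return static_cast<std::uint32_t>(*mm.second) - *mm.first + 1;
}
```

For the adjacent values 32767 and 32768, signedSpan reports 65536 slots while unsignedSpan reports 2, which is the kind of blow-up a better choice of base avoids.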
-
Shawn Landden authored
and replace with an equivalent countTrailingZeros. GCD is much more expensive than this, with repeated division. This depends on D60823 llvm-svn: 361726
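To illustrate why a trailing-zero count can stand in for a GCD when only a power-of-two common factor is wanted, here is a small sketch. It assumes the goal is the largest shift amount shared by all values; the functions trailingZeros, commonShift and gcdAll are hypothetical helpers, not LLVM APIs.

```cpp
#include <cstdint>
#include <vector>

// Count trailing zero bits; returns 64 for an input of zero.
unsigned trailingZeros(std::uint64_t v) {
  if (v == 0)
    return 64;
  unsigned n = 0;
  while ((v & 1) == 0) {
    v >>= 1;
    ++n;
  }
  return n;
}

// Largest shift that divides every value: OR the values together and
// count the trailing zeros of the result. No division is needed.
unsigned commonShift(const std::vector<std::uint64_t> &values) {
  std::uint64_t bits = 0;
  for (std::uint64_t v : values)
    bits |= v;
  return trailingZeros(bits);
}

// Reference: Euclid's algorithm over all values, with repeated division.
std::uint64_t gcdAll(const std::vector<std::uint64_t> &values) {
  std::uint64_t g = 0;
  for (std::uint64_t v : values) {
    std::uint64_t a = g, b = v;
    while (b != 0) {
      std::uint64_t t = a % b;
      a = b;
      b = t;
    }
    g = a;
  }
  return g;
}
```

For the values 8, 24 and 40 the GCD is 8 and commonShift returns 3, so both describe the same factor. For 6 and 18 the GCD is 6, which is not a power of two, and commonShift returns 1, the power-of-two part of that GCD.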
-
Shawn Landden authored
Also add baseline tests to show effect of later patches. llvm-svn: 361725
-
Shawn Landden authored
This matches countLeadingOnes() and countTrailingOnes(), and APInt's countLeadingZeros() and countTrailingZeros(). (as well as __builtin_clzll()) llvm-svn: 361724
-
Nikita Popov authored
The implementations in ValueTracking and ConstantRange are equally powerful; reuse the one in ConstantRange, which will make this easier to extend. llvm-svn: 361723
-
Nico Weber authored
llvm-svn: 361722
-
Nikita Popov authored
Extract method to compute overflow based on binop and signedness, and then make the result handling code generic. This extends the always-overflow handling to signed muls, but currently has no effect, as we don't compute always overflow for them (thus NFC). llvm-svn: 361721
-
Nikita Popov authored
Instead pass binary op and signedness. The extra enum only makes things more complicated in this case. llvm-svn: 361720
-
David Green authored
This adds a pattern for fma, similar to the float and double patterns. Differential Revision: https://reviews.llvm.org/D62330 llvm-svn: 361719
-
David Green authored
This adds patterns for fp16 round, ceil, etc., the same as the float and double patterns. Differential Revision: https://reviews.llvm.org/D62326 llvm-svn: 361718
-
David Green authored
Promote a number of fp16 math intrinsics to float, so that the relevant float math routines can be used. Copysign is expanded so as to be handled in-place. Differential Revision: https://reviews.llvm.org/D62325 llvm-svn: 361717
-
Simon Pilgrim authored
We were only testing for direct SETCC results - this allows us to peek through AND/OR/XOR combinations of the comparison results as well. There's a missing SEXT(PACKSS) fold that I need to investigate for v8i1 cases before I can enable it there as well. llvm-svn: 361716
-
David Green authored
This adds a pattern for the fabs intrinsic, the same as float and double. Differential Revision: https://reviews.llvm.org/D62324 llvm-svn: 361715
-
David Green authored
This adds a pattern for the sqrt intrinsic, the same as float and double. Differential Revision: https://reviews.llvm.org/D62322 llvm-svn: 361714
-
David Green authored
Promote fp16 frem operations on ARM to floats so they call fmodf. Differential Revision: https://reviews.llvm.org/D62321 llvm-svn: 361713
-
David Green authored
llvm-svn: 361712
-
Fangrui Song authored
While people mostly care about 64-bit, some systems need basic lib32 support. The plan is to make lld capable of linking some applications (see PR40888). llvm-svn: 361711
-
Fangrui Song authored
llvm-svn: 361710
-
Petr Hosek authored
This is a follow-up to r361432 and r361504 which addresses issues introduced by those changes. Specifically, it avoids duplicating file and runtime paths when the effective triple is the same as the canonical one. Furthermore, it fixes the broken multilib setup in the Fuchsia driver and deduplicates some of the code. Differential Revision: https://reviews.llvm.org/D62442 llvm-svn: 361709
-
Duncan P. N. Exon Smith authored
llvm-svn: 361708
-
David Bolvansky authored
Summary: PR41688 Reviewers: spatel, efriedma, craig.topper, hfinkel, reames Reviewed By: hfinkel Subscribers: javed.absar, dmgreen, fhahn, hfinkel, reames, nikic, lebedev.ri, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61409 llvm-svn: 361707
-
- May 25, 2019
-
-
Simon Pilgrim authored
Commonly occurs in sign-extension cases llvm-svn: 361706
-
Robert Widmann authored
Summary: Allow for retrieving an object file corresponding to an architecture-specific slice in a Mach-O universal binary file. Reviewers: whitequark, deadalnix Reviewed By: whitequark Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60378 llvm-svn: 361705
-
Nikita Popov authored
If we have a known non-nan operand, place it in the second operand of fmin/fmax that is returned if either operand is nan. Differential Revision: https://reviews.llvm.org/D62448 llvm-svn: 361704
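The operand-placement idea can be shown with a minimum whose comparison falls through to the second operand. This is a sketch under the assumption that the fmin/fmax in question returns its second operand whenever either input is NaN; minSecondOnNaN is a hypothetical name that models this behavior, not an LLVM or libm function.

```cpp
#include <cmath>
#include <limits>

// Models a minimum that yields its second operand when the comparison is
// unordered: "a < b" is false if either input is NaN, so b is returned.
double minSecondOnNaN(double a, double b) {
  return a < b ? a : b;
}

// Convenience for experiments: a quiet NaN value.
double quietNaN() {
  return std::numeric_limits<double>::quiet_NaN();
}
```

With the known non-NaN value in the second position, minSecondOnNaN(quietNaN(), 1.0) yields 1.0, whereas swapping the operands lets the NaN through as the result.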
-
Nikita Popov authored
Adds support for the uadd.sat family of intrinsics in LVI, based on ConstantRange methods from D60946. Differential Revision: https://reviews.llvm.org/D62447 llvm-svn: 361703
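As a rough illustration of what a range for a saturating unsigned add looks like, the sketch below works on inclusive, non-wrapping 8-bit intervals. The Range struct and both functions are hypothetical and much simpler than ConstantRange, which also handles wrapped and empty ranges. It relies on saturating addition being monotonic in both operands, so the result bounds come from the operand bounds.

```cpp
#include <cstdint>

// Unsigned 8-bit add that clamps at 255 instead of wrapping.
std::uint8_t uaddSat(std::uint8_t a, std::uint8_t b) {
  unsigned sum = static_cast<unsigned>(a) + static_cast<unsigned>(b);
  return static_cast<std::uint8_t>(sum > 255 ? 255 : sum);
}

// Inclusive interval [lo, hi] with lo <= hi (no wrapping).
struct Range {
  std::uint8_t lo;
  std::uint8_t hi;
};

// Because uaddSat never decreases when either operand grows, the result
// interval is obtained by applying it to the lower and upper bounds.
Range uaddSatRange(Range x, Range y) {
  return Range{uaddSat(x.lo, y.lo), uaddSat(x.hi, y.hi)};
}
```

For x in [10, 20] and y in [100, 250], the result interval is [110, 255]: the upper bound saturates while the lower bound does not.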
-
Simon Pilgrim authored
Add X32-SSE common prefix to merge some checks llvm-svn: 361702
-
Sanjay Patel authored
The test diffs show improved vector narrowing for integer min/max opcodes because those were all absent from the list. I'm not sure if we can expose functional diffs for all of the moved/added opcodes though. It seems like we are missing an AVX512 opportunity to use 256-bit ops in place of 512-bit ops on some tests/targets, but I think that can be a follow-up. Preliminary steps to make sure the callers are not misusing these queries: rL361268 rL361547 Differential Revision: https://reviews.llvm.org/D62191 llvm-svn: 361701
-
Nikita Popov authored
llvm-svn: 361700
-
Nikita Popov authored
llvm-svn: 361699
-
Nikita Popov authored
The guaranteed no-wrap region is never empty; it always contains at least zero, so these optimizations don't ever apply. To make this more obviously true, replace the conservative return in makeGNWR with an assertion. llvm-svn: 361698
-
David Bolvansky authored
llvm-svn: 361697
-
Sanjay Patel authored
The test based on PR42010: https://bugs.llvm.org/show_bug.cgi?id=42010 ...may show an inaccuracy for PPC's target defs, but we should not be so aggressive with an assert here. There's no telling what out-of-tree targets look like. llvm-svn: 361696
-
David Bolvansky authored
llvm-svn: 361695
-
Nikita Popov authored
llvm-svn: 361694
-
Nikita Popov authored
In LVI, calculate the range of extractvalue(op.with.overflow(%x, %y), 0) as the range of op(%x, %y). This is mainly useful in conjunction with D60650: If the result of the operation is extracted in a branch guarded against overflow, then the value of %x will be appropriately constrained and the result range of the operation will be calculated taking that into account. Differential Revision: https://reviews.llvm.org/D60656 llvm-svn: 361693
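A source-level picture of the guarded pattern may help. The sketch below assumes a GCC or Clang compiler, since it uses the __builtin_add_overflow builtin; the function names and the constant 100 are illustrative only. On the non-overflow path x can be at most 155, so the sum lies in [100, 255] and the final comparison is always true, which is the kind of fact a range analysis could derive.

```cpp
#include <cstdint>

// Returns -1 if x + 100 overflows 8 bits; otherwise 1 if the sum is at
// least 100 and 0 if it is not.
int classifySum(std::uint8_t x) {
  std::uint8_t sum;
  if (__builtin_add_overflow(x, std::uint8_t{100}, &sum))
    return -1; // overflow path: the sum is not used
  // Guarded path: no wrap occurred, so sum == x + 100 >= 100.
  return sum >= 100 ? 1 : 0;
}

// Exhaustively confirm that the "0" outcome is unreachable.
bool neverZero() {
  for (int x = 0; x <= 255; ++x)
    if (classifySum(static_cast<std::uint8_t>(x)) == 0)
      return false;
  return true;
}
```

Since neverZero() holds for all 256 inputs, the comparison in the guarded branch could be folded to a constant.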
-
Nikita Popov authored
llvm-svn: 361692
-
Craig Topper authored
INC/DEC is really a special case of a more generic issue. We should also turn leas into add reg/reg or add reg/imm regardless of the slow lea flags. This also supports LEA64_32 which has 64 bit input registers and 32 bit output registers. So we need to convert the 64 bit inputs to their 32 bit equivalents to check if they are equal to base reg. One thing to note, the original code preserved the kill flags by adding operands to the new instruction instead of using addReg. But I think tied operands aren't supposed to have the kill flag set. I dropped the kill flags, but I could probably try to preserve them in the add reg/reg case if we think it's important. Not sure which operand it's supposed to go on for the LEA64_32r instruction due to the super reg implicit uses. Though I'm also not sure those are needed since they were probably just created by an INSERT_SUBREG from a 32-bit input. Differential Revision: https://reviews.llvm.org/D61472 llvm-svn: 361691
-
Craig Topper authored
[X86] Add zero idioms to the haswell, broadwell, and skylake schedule models. Add 256-bit fp xor to sandybridge zero idioms. This copies the Sandy Bridge zero idiom support to later CPUs, adding the AVX2 and AVX512F/VL instructions as appropriate. Differential Revision: https://reviews.llvm.org/D62360 llvm-svn: 361690
-
Craig Topper authored
This pre-commits tests for D62360 llvm-svn: 361689
-
Peter Collingbourne authored
Revert r361644, "[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence." Broke sanitizer bots: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/21694/steps/bootstrap%20clang/logs/stdio http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/32478/steps/check-llvm%20asan/logs/stdio llvm-svn: 361688
-