- Jun 03, 2019
-
-
Simon Pilgrim authored
Pre-commit for D62807 - which adds DAG [us]itofp(undef) --> 0 constant fold llvm-svn: 362396
-
Diogo N. Sampaio authored
Summary: - pr42062 When compiling for MinSize, ARMTargetLowering::LowerCall decides to indirect multiple calls to the same function. However, it disregards the limitation that thumb1 indirect calls require the callee to be in a register from r0 to r3 (an LLVM limitation). If all of those registers are used by arguments, the compiler dies with "error: run out of registers during register allocation". This patch tells IsEligibleForTailCallOptimization whether we intend to perform indirect calls, so as to avoid tail call optimization. Reviewers: dmgreen, efriedma Reviewed By: efriedma Subscribers: javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62683 llvm-svn: 362366
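As a rough illustration (a hypothetical minsize reduction, not the actual pr42062 reproducer): the problem shape is a Thumb1 function where all four argument registers r0-r3 are occupied at the call sites of a repeatedly called function, leaving no low register free to hold the callee address for an indirect call.

```cpp
// Hypothetical example compiled at -Oz for a Thumb1 target. Both calls go to
// the same function, so the backend may prefer an indirect call through a
// register, but a/b/c/d already occupy r0-r3 at each call site.
extern "C" int callee(int a, int b, int c, int d);

extern "C" int caller(int a, int b, int c, int d) {
  int x = callee(a, b, c, d);
  int y = callee(b, c, d, a);
  return x + y;
}
```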
-
Sam Parker authored
DAGCombiner was hitting a SimpleType assertion when trying to combine a v3f32 before type legalization. bugzilla: https://bugs.llvm.org/show_bug.cgi?id=41916 Differential Revision: https://reviews.llvm.org/D62734 llvm-svn: 362365
-
Roman Lebedev authored
llvm-svn: 362364
-
Jim Lin authored
Summary: LDWRdPtr would be expanded to ld+ldd, but ldd only accepts Y or Z as the pointer register. So the register class of the pointer operand of LDWRdPtr should be PTRDISPREGS instead of PTRREGS. Reviewers: dylanmckay Reviewed By: dylanmckay Subscribers: dylanmckay, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62300 llvm-svn: 362351
-
Florian Hahn authored
If we hit the limit, we do expand the outstanding tokenfactors. Otherwise, we might drop nodes with users in the unexpanded tokenfactors. This fixes the crashes reported by Jordan Rupprecht. Reviewers: niravd, spatel, craig.topper, rupprecht Reviewed By: niravd Differential Revision: https://reviews.llvm.org/D62633 llvm-svn: 362350
-
Craig Topper authored
Similar to what was done for masked load and gather. llvm-svn: 362342
-
Craig Topper authored
[X86] Add test cases for masked store and masked scatter with an all zeroes mask. Fix bug in ScalarizeMaskedMemIntrin: we need to cast only to Constant instead of ConstantVector to allow ConstantAggregateZero. llvm-svn: 362341
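A minimal sketch of the distinction the fix relies on (illustrative only; the helper name is made up and this is not the code from ScalarizeMaskedMemIntrin): an all-zeroes mask is represented as ConstantAggregateZero, which is a Constant but not a ConstantVector, so element access has to go through the more general Constant interface.

```cpp
#include "llvm/IR/Constants.h"
using namespace llvm;

// Hypothetical helper: is mask element Idx known to be set?
// dyn_cast<ConstantVector> would return null for an all-zeroes mask
// (ConstantAggregateZero); dyn_cast<Constant> covers both representations.
static bool maskElementIsSet(Value *Mask, unsigned Idx) {
  if (auto *C = dyn_cast<Constant>(Mask))
    if (Constant *Elt = C->getAggregateElement(Idx))
      return !Elt->isNullValue();
  return false; // Non-constant mask (or out-of-range index): treat as unknown.
}
```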
-
- Jun 02, 2019
-
-
Craig Topper authored
Similar to what was recently done for gathers in r362015. llvm-svn: 362337
-
Roman Lebedev authored
llvm-svn: 362330
-
Simon Pilgrim authored
Lets us match horizontal op patterns on fast-variable-shuffle targets (Haswell etc.) llvm-svn: 362327
-
Simon Pilgrim authored
Haswell etc. will combine shuffles to an extract_subvector(permd(x)) before isHorizontalBinOp can match it. llvm-svn: 362326
-
Roman Lebedev authored
We are also free to interpret this as 'BZHI'/'BEXTR'. https://rise4fun.com/Alive/dD6 llvm-svn: 362325
-
Simon Pilgrim authored
[DAG] isBitwiseNot / isConstOrConstSplat - add support for build vector undefs + truncation (PR41020) Add (opt-in) support for implicit truncation to isConstOrConstSplat, which allows us to match truncated 'all ones' cases in isBitwiseNot. PR41020 compares against using ISD::isBuildVectorAllOnes() instead, but that predicate silently accepts any UNDEF elements in the build vector, which might not be what we want in isBitwiseNot - so I've added an opt-in 'AllowUndefs' flag that is set to false by default but allows us to enable it in individual cases where it's safe. Differential Revision: https://reviews.llvm.org/D62783 llvm-svn: 362323
-
Roman Lebedev authored
If we look past truncations of X too eagerly (D62786), we may end up with a 64-bit 'BEXTR', even though a 32-bit one would suffice. llvm-svn: 362319
-
Craig Topper authored
Forgot to do the widen forms when I was doing the others. llvm-svn: 362310
-
Craig Topper authored
llvm-svn: 362309
-
Craig Topper authored
The AVX512BW and AVX512VL checks were never used. And AVX512 is the same as AVX on all tests that weren't already split for AVX1 and AVX2. llvm-svn: 362308
-
Craig Topper authored
llvm-svn: 362307
-
- Jun 01, 2019
-
-
Simon Pilgrim authored
llvm-svn: 362303
-
Simon Pilgrim authored
llvm-svn: 362300
-
Simon Atanasyan authored
The `cfcmsa` and `ctcmsa` instructions accept an MSA control register index. The MIPS64 SIMD Architecture defines eight MSA control registers, but the register index for the `cfcmsa` and `ctcmsa` instructions may be any number in the 0..31 range. If the index is greater than 7, `cfcmsa` writes zero to the destination register and `ctcmsa` does nothing [1]. [1] MIPS Architecture for Programmers Volume IV-j: The MIPS64 SIMD Architecture Module https://www.mips.com/?do-download=the-mips64-simd-architecture-module Differential Revision: https://reviews.llvm.org/D62597 llvm-svn: 362299
-
Dylan McKay authored
If we allowed register coalescing on the PTRDISPREGS class, then the register allocator could lock the Z register to some virtual register. Larger instructions requiring a memory access would then fail during the register allocation phase, since there is no register available to hold a pointer if the Y register was already taken for a stack frame. This patch prevents that by keeping the Z register spillable, which it does by not allowing the coalescer to lock it. Original discussion on https://github.com/avr-rust/rust/issues/128. llvm-svn: 362298
-
Roman Lebedev authored
I initially added it so the test would show whether the binop w/ constant is sunk or hoisted. But as can be seen from the 'sub (sub C, %x), %y' test, that actually conceals the issues it is supposed to test. At least two more patterns are unhandled: * 'add (sub C, %x), %y' - D62266 * 'sub (sub C, %x), %y' llvm-svn: 362295
-
Craig Topper authored
llvm-svn: 362288
-
Matt Arsenault authored
Fixes missing test from r293000. llvm-svn: 362275
-
- May 31, 2019
-
-
Puyan Lotfi authored
We don't want to create vregs if there is nothing to use them for. That causes verifier errors. Differential Revision: https://reviews.llvm.org/D62740 llvm-svn: 362247
-
Kevin P. Neal authored
[FPEnv] Added a special UnrollVectorOp method to deal with the chain on StrictFP opcodes This change creates UnrollVectorOp_StrictFP. The purpose of this is to address a failure that consistently occurs when calling StrictFP functions on vectors whose number of elements is 3 + 2n on most platforms, such as PowerPC or SystemZ. The old UnrollVectorOp method does not expect the vector it unrolls to have a chain, so it has an assert that prevents it from running if this is the case. This new StrictFP version of the method deals with the chain while unrolling the vector. With this new function in place during vector widening, llc can run vector-constrained-fp-intrinsics.ll for SystemZ successfully. Submitted by: Drew Wock <drew.wock@sas.com> Reviewed by: Cameron McInally, Kevin P. Neal Approved by: Cameron McInally Differential Revision: https://reviews.llvm.org/D62546 llvm-svn: 362241
-
Guozhi Wei authored
In PPCReduceCRLogicals, after splitting the original MBB into 2, the 2 impacted branches still use the original branch probability. This is unreasonable. Suppose we have the following code, and the probability of each successor is 50%. condc = conda || condb br condc, label %target, label %fallthrough It can be transformed to the following: br conda, label %target, label %newbb newbb: br condb, label %target, label %fallthrough Since each branch has a probability of 50% to each successor, the total probability to %fallthrough is 25% now, and the total probability to %target is 75%. This actually changes the original profiling data. A more reasonable probability of 70% can be set for the false side of each branch instruction, so that the total probability to %fallthrough stays close to 50%. This patch assumes branch targets with two incoming edges have the same edge frequency, computes a new probability for each target, and keeps the total probability to the original targets unchanged. Differential Revision: https://reviews.llvm.org/D62430 llvm-svn: 362237
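Spelling out the arithmetic behind the 70% figure (just restating the numbers from the message above): if each of the two new branches gives probability q to its false side, then

```latex
P(\text{fallthrough}) = q \cdot q = 0.5
  \;\Rightarrow\; q = \sqrt{0.5} \approx 0.707 \approx 70\%
```

so assigning roughly 70% to the false side of each branch yields 0.7 * 0.7 = 0.49, i.e. about 50% to %fallthrough as in the original profile, with %target keeping the remaining ~50%.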
-
Simon Pilgrim authored
llvm-svn: 362230
-
Simon Pilgrim authored
llvm-svn: 362229
-
Petar Avramovic authored
Test different callee operand types and their behavior depending on whether the relocation model is PIC or not. Possible operand types are: register (function pointer), external symbol (used for libcalls, e.g. __udivdi3 or memcpy), and global address. Global address has different handling depending on the relocation model and linkage type; register and external symbol do not. Differential Revision: https://reviews.llvm.org/D62590 llvm-svn: 362212
-
Petar Avramovic authored
Handle position independent code for MIPS32. When the callee is a global address, lowerCall will emit the callee as G_GLOBAL_VALUE and add a target flag if needed. Support $gp in getRegBankFromRegClass(). Select G_GLOBAL_VALUE, specially handling the case when there are target flags attached by lowerCall. Differential Revision: https://reviews.llvm.org/D62589 llvm-svn: 362210
-
Roman Lebedev authored
Just for completeness. llvm-svn: 362208
-
Petar Avramovic authored
Lower calls whose callee is a register for MIPS32. The register should contain the callee function's address. Differential Revision: https://reviews.llvm.org/D62585 llvm-svn: 362204
-
Craig Topper authored
These patterns can incorrectly narrow a volatile load from 128 bits to 64 bits. Similar to PR42079. Switch to using (v4i32 (bitcast (v2i64 (scalar_to_vector (loadi64))))) as the load pattern used in the instructions. This probably still has issues in 32-bit mode, where loadi64 isn't legal. Maybe we should use VZMOVL for widened loads even when we don't need the upper bits as zeroes? llvm-svn: 362203
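As an illustration of the hazard described (a hypothetical snippet, not a test from the patch; assumes an SSE2-capable x86 target): the conversion only consumes the low 64 bits of its source, but since the source load is volatile it must still be performed as a full 16-byte access.

```cpp
#include <immintrin.h>

// Hypothetical example: _mm_cvtepi32_pd uses only the low two i32 elements,
// but because *p is volatile the 128-bit load must not be narrowed to 64 bits.
__m128d convert_low(const volatile __m128i *p) {
  __m128i v = *p;            // full 16-byte volatile load; must stay whole
  return _mm_cvtepi32_pd(v); // converts only the low two 32-bit integers
}
```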
-
Craig Topper authored
llvm-svn: 362202
-
Craig Topper authored
Similar to PR42079 llvm-svn: 362201
-
Craig Topper authored
llvm-svn: 362200
-
Craig Topper authored
DAG combine will usually fold fpextend+load to an fp extload anyway, so the 256 and 512 patterns were probably unnecessary. The 128-bit pattern was special in that it looked for a v4f32 load, but then used it in an instruction that only loads 64 bits. This is bad if the load happens to be volatile. We could probably make the patterns volatile aware, but that's more work for something that's probably rare. The peephole pass might kick in and save us anyway. We might also be able to fix this with some additional DAG combines. This also adds patterns for vselect+extload to enable masked vcvtps2pd to be used. Previously we looked for the unlikely vselect+fpextend+load. llvm-svn: 362199
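As a rough sketch of the vselect+extload shape mentioned at the end (hypothetical source, assuming AVX512VL is available; whether a masked vcvtps2pd is actually selected depends on how the DAG combines the load and the extend):

```cpp
#include <immintrin.h>

// Hypothetical example: a float->double conversion of a 128-bit load whose
// result is merged under a mask, i.e. vselect(mask, fpext(load), passthru).
__m256d masked_convert(__mmask8 k, const float *p, __m256d passthru) {
  __m256d conv = _mm256_cvtps_pd(_mm_loadu_ps(p)); // fpext of a loaded v4f32
  return _mm256_mask_mov_pd(passthru, k, conv);    // merge under mask k
}
```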
-