- Feb 11, 2019
Sam Parker authored
Remove unnecessary offset checks and CHECK-BASE checks, and add some extra -NOT checks and TODO comments. llvm-svn: 353689
-
Carlos Alberto Enciso authored
Check that when SimplifyCFG flattens a 'br', all debug intrinsic instructions in the affected blocks are removed, including any dbg.label that references a label associated with the basic blocks being removed. As the test case involves a CFG transformation, move it to the correct location. Differential Revision: https://reviews.llvm.org/D57444 llvm-svn: 353682
-
Sjoerd Meijer authored
The whole design of generating LDMs/STMs is fragile and unreliable: it depends on rescheduling in the LoadStoreOptimizer, which isn't register-pressure aware, and on register allocation, which isn't aware of LDM/STM generation. This patch adds a (hidden) option to control the total number of instructions that can be reordered. I appreciate this looks only a tiny bit better than a hard-coded constant, but at least it allows easier experimentation with different values for now. Ideally we would calculate this reorder limit based on some heuristics and take register pressure into account. I might be looking into that next. Differential Revision: https://reviews.llvm.org/D57954 llvm-svn: 353678
-
- Feb 10, 2019
Mandeep Singh Grang authored
Differential Revision: https://reviews.llvm.org/D57988 llvm-svn: 353652
-
Nikita Popov authored
Now that we have vector support for [US](ADD|SUB)O we no longer need to scalarize when expanding [US](ADD|SUB)SAT. This matches what the cost model already does. Differential Revision: https://reviews.llvm.org/D57348 llvm-svn: 353651
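As a rough illustration of the expansion described above (a sketch, not the actual TargetLowering code), the C++ below models a signed saturating add on top of an overflow-reporting add, and an unsigned saturating subtract on top of an overflow-reporting subtract. The GCC/Clang builtins __builtin_add_overflow and __builtin_sub_overflow stand in for SADDO/USUBO; in the vector case the same select would be applied lane by lane. The function names saddSat and usubSat are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Signed saturating add (SADDSAT) expanded via an overflow-reporting add
// (SADDO), modelled with the GCC/Clang builtin __builtin_add_overflow.
int32_t saddSat(int32_t a, int32_t b) {
  int32_t sum;
  if (!__builtin_add_overflow(a, b, &sum))
    return sum;  // no overflow: sum holds the exact result
  // A signed add can only overflow when both operands share a sign, so the
  // sign of b tells us which bound to clamp to.
  return b > 0 ? std::numeric_limits<int32_t>::max()
               : std::numeric_limits<int32_t>::min();
}

// Unsigned saturating subtract (USUBSAT) via USUBO: clamp to 0 on borrow.
uint32_t usubSat(uint32_t a, uint32_t b) {
  uint32_t diff;
  return __builtin_sub_overflow(a, b, &diff) ? 0u : diff;
}
```

For example, adding 1 to INT32_MAX with saddSat stays at INT32_MAX instead of wrapping to INT32_MIN.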
-
Simon Pilgrim authored
llvm-svn: 353648
-
Simon Pilgrim authored
Shows missing SimplifyDemandedBits support. llvm-svn: 353647
-
Simon Pilgrim authored
Now that we have SimplifyDemandedBits support for funnel shifts (rL353539), we need to simplify funnel shifts back to bitshifts in cases where either argument has been folded to undef/zero. Differential Revision: https://reviews.llvm.org/D58009 llvm-svn: 353645
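To make these folds concrete, the sketch below gives reference semantics for 32-bit funnel shifts, following the LangRef definition of llvm.fshl/llvm.fshr (concatenate the two operands, shift by the amount modulo the bit width, keep one half), and checks the plain-bitshift equivalences that hold when one argument is zero and the shift amount is a non-zero in-range constant. An undef argument may be treated as zero, so the same folds cover it. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// fshl: high 32 bits of (hi:lo) << (amt % 32).
uint32_t fshl32(uint32_t hi, uint32_t lo, uint32_t amt) {
  uint32_t s = amt % 32;
  return s == 0 ? hi : (hi << s) | (lo >> (32 - s));
}

// fshr: low 32 bits of (hi:lo) >> (amt % 32).
uint32_t fshr32(uint32_t hi, uint32_t lo, uint32_t amt) {
  uint32_t s = amt % 32;
  return s == 0 ? lo : (lo >> s) | (hi << (32 - s));
}

// For every non-zero in-range shift amount C:
//   fshl(X, 0, C) == X << C        fshl(0, X, C) == X >> (32 - C)
//   fshr(0, X, C) == X >> C        fshr(X, 0, C) == X << (32 - C)
void checkFunnelShiftFolds(uint32_t x) {
  for (uint32_t c = 1; c < 32; ++c) {
    assert(fshl32(x, 0, c) == (x << c));
    assert(fshl32(0, x, c) == (x >> (32 - c)));
    assert(fshr32(0, x, c) == (x >> c));
    assert(fshr32(x, 0, c) == (x << (32 - c)));
  }
}
```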
-
Simon Pilgrim authored
I've avoided 'modulo' masks, as SimplifyDemandedBits will handle those in the future, and we just need to check that the shift variable is 'in range'. llvm-svn: 353644
-
Sanjay Patel authored
256-bit horizontal math ops are an x86 monstrosity (and thankfully have not been extended to 512-bit AFAIK). The two 128-bit halves operate on separate halves of the inputs. So if we don't demand anything in the upper half of the result, we can extract the low halves of the inputs, do the math, and then insert that result into a 256-bit output. All of the extract/insert is free (ymm<-->xmm), so we're left with a narrower (cheaper) version of the original op. In the affected tests based on: https://bugs.llvm.org/show_bug.cgi?id=33758 https://bugs.llvm.org/show_bug.cgi?id=38971 ...we see that the h-op narrowing can result in further narrowing of other math via existing generic transforms. I originally drafted this patch as an exact pattern match starting from extract_vector_elt, but I thought we might see diffs starting from extract_subvector too, so I changed it to a more general demanded elements solution. There are no extra existing regression test improvements from that switch though, so we could go back. Differential Revision: https://reviews.llvm.org/D57841 llvm-svn: 353641
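The lane arithmetic behind this narrowing can be sketched as follows. This models the instruction semantics, not the DAG code: in 256-bit VPHADDD each 128-bit half of the result depends only on the matching halves of the two inputs, so when only the low half is demanded, a 128-bit PHADDD on the extracted low halves yields the same elements. The type aliases and function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V4 = std::array<int32_t, 4>;  // one xmm worth of i32 lanes
using V8 = std::array<int32_t, 8>;  // one ymm worth of i32 lanes

// 128-bit PHADDD: pairwise sums of a, then pairwise sums of b.
V4 hadd128(const V4 &a, const V4 &b) {
  return {a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]};
}

// 256-bit VPHADDD: the two 128-bit halves are computed independently,
// so a and b are interleaved per half rather than across the full vector.
V8 hadd256(const V8 &a, const V8 &b) {
  return {a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3],
          a[4] + a[5], a[6] + a[7], b[4] + b[5], b[6] + b[7]};
}

// ymm -> xmm extraction of the low half (free on x86).
V4 lowHalf(const V8 &v) { return {v[0], v[1], v[2], v[3]}; }

// Narrowed form used when only result elements 0..3 are demanded.
V4 narrowedLowResult(const V8 &a, const V8 &b) {
  return hadd128(lowHalf(a), lowHalf(b));
}
```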
-
Simon Pilgrim authored
As suggested on D58009 llvm-svn: 353640
-
Sanjay Patel authored
SimplifySetCC still has much room for improvement, but this should fix the remaining problem examples from: https://bugs.llvm.org/show_bug.cgi?id=40657 The initial fix for this problem was rL353615. llvm-svn: 353639
-
Simon Pilgrim authored
llvm-svn: 353638
-
- Feb 09, 2019
Simon Pilgrim authored
If one of the shifted arguments is undef we should be folding to a regular shift. llvm-svn: 353628
-
Simon Pilgrim authored
As discussed on D57389, this is a first step towards moving the SHLD/SHRD matching code to DAGCombiner using FSHL/FSHR instead. There's a bit of work to do before I can do that, so this just folds to FSHL/FSHR in the existing code (handling the different SHRD/FSHR argument ordering), which fixes the issue we had with i16 shift amounts not being correctly masked. llvm-svn: 353626
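A small model of the argument-ordering difference, assuming the usual x86 definition of SHRD (dest = (dest >> count) | (src << (width - count))) and the LangRef definition of llvm.fshr: SHRD shifts its first operand and pulls in bits from its second, whereas fshr(a, b, c) shifts b and pulls in bits from a, so the operands are swapped when lowering. The sketch also masks the amount to 4 bits for i16, since the hardware masks the count to 5 bits and leaves the result undefined for counts larger than a 16-bit operand. Function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Model of x86 SHRD on 16-bit operands, restricted to counts below 16
// (larger counts are not a plain modulo-16 shift on real hardware).
uint16_t shrd16(uint16_t dest, uint16_t src, unsigned count) {
  assert(count < 16 && "model only covers counts below 16");
  if (count == 0) return dest;
  return static_cast<uint16_t>((dest >> count) | (src << (16 - count)));
}

// llvm.fshr.i16: low 16 bits of (hi:lo) >> (amt % 16).
uint16_t fshr16(uint16_t hi, uint16_t lo, unsigned amt) {
  unsigned s = amt % 16;
  if (s == 0) return lo;
  return static_cast<uint16_t>((lo >> s) | (hi << (16 - s)));
}

// fshr(a, b, amt) as SHRD: b is the register being shifted, a supplies
// the incoming bits, and the amount is explicitly masked to 4 bits.
uint16_t fshr16ViaShrd(uint16_t a, uint16_t b, unsigned amt) {
  return shrd16(/*dest=*/b, /*src=*/a, amt & 15);
}
```

Without the explicit "& 15", an i16 shift amount such as 20 would reach the hardware unmasked instead of behaving like a shift by 4.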
-
Sanjay Patel authored
llvm-svn: 353625
-
Sanjay Patel authored
There's effectively no difference for the cases with variables. We just trade a sub for an add on those. But the case with a subtract from constant would require an extra move instruction on x86, so this looks like a reasonable generic combine. llvm-svn: 353619
-
Sanjay Patel authored
llvm-svn: 353618
-
Simon Pilgrim authored
llvm-svn: 353616
-
Sanjay Patel authored
llvm-svn: 353615
-
Simon Pilgrim authored
D42042 introduced the ability for the ExecutionDomainFixPass to more easily change between BLENDPD/BLENDPS/PBLENDW as the domain requires. With this ability, we can avoid most of the bitcasts/scaling in the DAG that was occurring with X86ISD::BLENDI lowering/combining, blend the vXi32/vXi64 vectors directly, and use isel patterns to lower to the equivalent float vector types. This helps shuffle combining and SimplifyDemandedVectorElts be more aggressive, as we lose track of fewer UNDEF elements than when we go up/down through bitcasts. I've introduced a basic blend(bitcast(x),bitcast(y)) -> bitcast(blend(x,y)) fold; there are more generalizations I can do there (e.g. widening/scaling and handling the tricky v16i16 repeated mask case). The vector-reduce-smin/smax regressions will be fixed in a future improvement to SimplifyDemandedBits to peek through bitcasts and support X86ISD::BLENDV. Differential Revision: https://reviews.llvm.org/D57888 llvm-svn: 353610
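For context on the bitcast/scaling bookkeeping this patch avoids, the sketch below models an immediate blend on std::array lanes and checks that blending 4 x i64 lanes and then bitcasting to 8 x i32 matches blending the bitcast inputs with a mask in which every bit is repeated twice. It illustrates the lane arithmetic under these assumptions only; it is not the X86ISD::BLENDI lowering code, and all names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reinterpret the bytes of one value as another type of the same size
// (a stand-in for a bitcast between vector types).
template <typename To, typename From>
To bitcastTo(const From &from) {
  static_assert(sizeof(To) == sizeof(From), "bitcast requires equal sizes");
  To to;
  std::memcpy(&to, &from, sizeof(To));
  return to;
}

// Immediate blend: bit i of 'mask' selects b[i], otherwise a[i].
template <typename T, std::size_t N>
std::array<T, N> blendImm(const std::array<T, N> &a,
                          const std::array<T, N> &b, unsigned mask) {
  std::array<T, N> r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = ((mask >> i) & 1u) ? b[i] : a[i];
  return r;
}

// Scale a blend mask to narrower lanes: each of the numElts source bits is
// repeated 'scale' times (4 x i64 -> 8 x i32 uses scale = 2).
unsigned scaleBlendMask(unsigned mask, unsigned numElts, unsigned scale) {
  unsigned scaled = 0;
  for (unsigned i = 0; i < numElts; ++i)
    if ((mask >> i) & 1u)
      scaled |= ((1u << scale) - 1u) << (i * scale);
  return scaled;
}
```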
-
Stanislav Mekhanoshin authored
llvm-svn: 353593
-
Jessica Paquette authored
After r353586, we won't fail on the AMDGPU floor pattern that was killing the importer before. llvm-svn: 353589
-
Sanjay Patel authored
llvm-svn: 353580
-
Francis Visoiu Mistrih authored
With a fix after r353563 that adds some more opcodes. llvm-svn: 353579
-
- Feb 08, 2019
Francis Visoiu Mistrih authored
This reverts commit r353553. This breaks CodeGen/AArch64/GlobalISel/legalize-ext-csedebug-output.mir: http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/57963/console llvm-svn: 353575
-
Craig Topper authored
These instructions can generate a stack overflow exception, so technically they read the stack overflow exception mask bit. llvm-svn: 353564
-
Craig Topper authored
This patch accompanies the RFC posted here: http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html This patch adds a new CallBr IR instruction to support asm-goto inline assembly, as implemented by GCC and used by the Linux kernel. This instruction is both a call instruction and a terminator instruction with multiple successors. Only inline assembly usage is supported today. This also adds a new INLINEASM_BR opcode to SelectionDAG and MachineIR to represent an INLINEASM block that is also considered a terminator instruction. There will likely be more bug fixes and optimizations to follow this, but we felt it had reached a point where we would like to switch to an incremental development model. Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii Differential Revision: https://reviews.llvm.org/D53765 llvm-svn: 353563
-
Matt Arsenault authored
llvm-svn: 353559
-
Nemanja Ivanovic authored
The sqrt case is faster and we already do this for the case where the exponent is 0.25. This adds the 0.75 case which is also not sensitive to signed zeros. Patch by Whitney Tsang (Whitney) Differential revision: https://reviews.llvm.org/D57434 llvm-svn: 353557
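The identity behind this transform is x^0.75 = x^0.5 * x^0.25 = sqrt(x) * sqrt(sqrt(x)). The minimal sketch below compares that rewrite with std::pow for ordinary positive inputs. It assumes the relaxed floating-point semantics a compiler needs for such a rewrite (results may differ from pow in the last bits), does not address special values such as -0.0, -inf or NaN, and uses hypothetical function names.

```cpp
#include <cassert>
#include <cmath>

// pow(x, 0.75) rewritten as two square roots and one multiply.
double pow075ViaSqrt(double x) {
  double s = std::sqrt(x);   // x^0.5
  return s * std::sqrt(s);   // x^0.5 * x^0.25
}

// Relative-error comparison against std::pow.
bool closeToPow(double x, double relTol = 1e-12) {
  double expected = std::pow(x, 0.75);
  return std::fabs(pow075ViaSqrt(x) - expected) <=
         relTol * std::fabs(expected);
}
```

Perfect fourth powers make the equivalence easy to see: for x = 16, sqrt gives 4, the second sqrt gives 2, and the product is 8 = 16^0.75.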
-
Aditya Nandakumar authored
https://reviews.llvm.org/D57932 Add some logging + tests to make sure CSEInfo prints debug output. reviewed by: arsenm llvm-svn: 353553
-
Matt Arsenault authored
These are no longer necessary since the R600 tablegen files are split out now. llvm-svn: 353548
-
Simon Pilgrim authored
Replace OR(SHL,SRL) pattern with ISD::FSHR (legalization expands this later if necessary) - this helps with the scale == 0 'undefined' drop-through case that was discussed on D55720. llvm-svn: 353546
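The scale == 0 problem shows up in a sketch of a fixed-point multiply expansion, modelled loosely on llvm.smul.fix for i32 without saturation and with 0 <= scale < 32: split the 64-bit product into halves and combine them with a right funnel shift. The hand-written form (hi << (32 - scale)) | (lo >> scale) would shift by the full bit width when scale is 0, which is undefined behaviour in C++ and likewise out of range for a shift in LLVM IR, while fshr's modulo semantics simply return lo. Names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// llvm.fshr.i32 semantics: low 32 bits of (hi:lo) >> (amt % 32).
uint32_t fshr32(uint32_t hi, uint32_t lo, uint32_t amt) {
  uint32_t s = amt % 32;
  return s == 0 ? lo : (lo >> s) | (hi << (32 - s));
}

// Signed fixed-point multiply, 0 <= scale < 32, result truncated to i32.
int32_t smulFix32(int32_t a, int32_t b, uint32_t scale) {
  int64_t product = static_cast<int64_t>(a) * b;
  uint64_t bits = static_cast<uint64_t>(product);
  uint32_t lo = static_cast<uint32_t>(bits);        // low half of product
  uint32_t hi = static_cast<uint32_t>(bits >> 32);  // high half of product
  // Equivalent to the low 32 bits of (product >> scale), with scale == 0
  // handled without an out-of-range shift.
  return static_cast<int32_t>(fshr32(hi, lo, scale));
}
```

In Q16.16 terms, smulFix32(-(1 << 16), 2 << 16, 16) multiplies -1.0 by 2.0 and yields -(2 << 16), i.e. -2.0.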
-
Simon Pilgrim authored
llvm-svn: 353539
-
Simon Pilgrim authored
llvm-svn: 353534
-
Carl Ritson authored
Summary: Prior to GCN3, s_load_dword offsets are in dwords rather than bytes. Thus the scratch buffer descriptor offset must be adjusted for pre-GCN3 ASICs. Reviewers: nhaehnle, tpr Reviewed By: nhaehnle Subscribers: sheredom, arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D56496 llvm-svn: 353530
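The unit difference can be captured by a small hypothetical helper. It only illustrates the byte-to-dword conversion described in the summary, assumes dword-aligned offsets, and is not the actual AMDGPU code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: encode a byte offset for a scalar dword load.
// Pre-GCN3 targets take the offset in dwords; GCN3 and later in bytes.
uint32_t scalarLoadOffsetImm(uint32_t byteOffset, bool isGCN3OrLater) {
  assert(byteOffset % 4 == 0 && "sketch assumes dword-aligned offsets");
  return isGCN3OrLater ? byteOffset : byteOffset / 4;
}
```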
-
Matt Arsenault authored
clampScalar doesn't do anything for non-power-of-2 in range. There should probably be a combination rule to reduce the number of matching rules. llvm-svn: 353526
-
Matt Arsenault authored
llvm-svn: 353522
-
Petar Avramovic authored
Make the behavior of G_LOAD in widenScalar the same as for G_ZEXTLOAD and G_SEXTLOAD. That is, perform widenScalarDst to the size given by the target and avoid additional checks in common code. Targets can reorder or add additional rules in LegalizeRuleSet for the opcode to achieve the desired behavior. Select an extending load that does not specify the type of extension as a zero-extending load. Select a truncating store that stores the number of bytes indicated by the size in the MachineMemOperand. Differential Revision: https://reviews.llvm.org/D57454 llvm-svn: 353520
-
Matt Arsenault authored
llvm-svn: 353516
-