- Dec 17, 2018
Craig Topper authored
The CC is operand 2, not operand 3. llvm-svn: 349330
- Dec 16, 2018
Simon Pilgrim authored
We had 3 different approaches; now consistently use getTargetConstantBitsFromNode and allow undef elts. llvm-svn: 349319
Craig Topper authored
I'd like to try to move a lot of the flag matching out of EmitTest and push it to isel or isel preprocessing. This is a step towards that. The test-shrink-bug.ll change is an improvement because we are no longer interfering with test shrink handling in isel. The pr34137.ll change is a regression, but the IR came from -O0 and was not reduced by InstCombine. So it contains a lot of redundancies like duplicate loads that made it combine poorly. llvm-svn: 349315
Sanjay Patel authored
This is part of fixing PR39859: https://bugs.llvm.org/show_bug.cgi?id=39859 We have a crippled vector ISA, so we have to invert a typical fold and create min/max here. As discussed in the bug report, we can probably do better by using saturating subtract when it's available, but we should have this improvement for the min/max patterns regardless. Alive proofs: https://rise4fun.com/Alive/zsf https://rise4fun.com/Alive/Qrl Differential Revision: https://reviews.llvm.org/D55515 llvm-svn: 349304
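A small scalar illustration of the identity behind this fold (my example, not the patch's code): unsigned saturating subtract can be rewritten in terms of min/max, which the vector ISA does support.

```cpp
#include <algorithm>
#include <cstdint>

// usub.sat(x, y) == (x >= y ? x - y : 0). Rewritten with min so that no
// compare-and-select is needed: if Y >= X, the subtrahend becomes X itself
// and the result is 0; otherwise it is the ordinary X - Y.
uint16_t usubsat16(uint16_t X, uint16_t Y) {
  return X - std::min(X, Y); // equivalently: std::max(X, Y) - Y
}
```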
- Dec 15, 2018
Simon Pilgrim authored
In preparation for converting to funnel shifts. llvm-svn: 349286
Simon Pilgrim authored
Use consistent rules for when to lower to SHLD/SHRD for slow machines. This fixes a weird issue where the funnel shift gets expanded, but then X86ISelLowering's combineOr sees the optsize attribute and combines back to SHLD/SHRD, now with the modulo amount guard. llvm-svn: 349285
Craig Topper authored
[X86] Rename hasNoSignedComparisonUses to hasNoSignFlagUses. Add the instructions that only modify the OF flag to the waiver list. The only caller of this turns CMP with 0 into TEST. CMP with 0 and TEST both set OF to 0, so we should have no issues with instructions that only use OF. Though I don't think there's any reason we would read just OF after a compare with 0 anyway, so this probably isn't an observable change. llvm-svn: 349223
Craig Topper authored
[X86] Make hasNoCarryFlagUses/hasNoSignedComparisonUses take an SDValue that indicates which result is the flag result. NFCI hasNoCarryFlagUses hardcoded that the flag result is 1 and used that to filter which uses were of interest. hasNoSignedComparisonUses just assumes the only result is flags and checks whether any user of the node is a CopyToReg instruction. After this patch we now do a result number check in both and rely on the caller to provide the result number. This shouldn't change behavior; it was just an odd difference between the two functions that I noticed. llvm-svn: 349222
- Dec 14, 2018
Craig Topper authored
[DAGCombiner][X86] Prevent visitSIGN_EXTEND from returning N when (sext (setcc)) already has the target desired type for the setcc Summary: If the setcc already has the target desired type we can reach the getSetCC/getSExtOrTrunc after the MatchingVecType check with the exact same types as the nodes we started with. This causes the VsetCC to be CSEd to N0, and the getSExtOrTrunc will CSE to N. When we return N, the caller will think that we called CombineTo and did our own worklist management. But that's not what happened. This prevents target hooks from being called for the node. To fix this, I've now returned SDValue if the setcc is already the desired type. But to avoid some regressions in X86 I've had to disable one of the target combines that wasn't being reached before in the case of a (sext (setcc)). If we get vector widening legalization enabled, that entire function will be deleted anyway, so hopefully this is only for the short term. Reviewers: RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55459 llvm-svn: 349137
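A rough sketch of the CSE hazard being fixed, with illustrative names and shape (this is not the actual DAGCombiner code):

```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// If the setcc already has the target's desired type, both nodes built here
// CSE back to the originals, so the result equals N and must not be returned
// as if a combine had happened.
static SDValue rebuildSextSetCC(SelectionDAG &DAG, const SDLoc &DL, EVT VT,
                                EVT SetCCVT, SDValue N, SDValue N00,
                                SDValue N01, ISD::CondCode CC) {
  SDValue VSetCC = DAG.getSetCC(DL, SetCCVT, N00, N01, CC); // may CSE to N0
  SDValue Ext = DAG.getSExtOrTrunc(VSetCC, DL, VT);         // may CSE to N
  if (Ext == N)
    return SDValue(); // bail out rather than fake a successful CombineTo
  return Ext;
}
```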
Craig Topper authored
[X86] Demote EmitTest to a helper function of EmitCmp. Route all callers except EmitCmp through EmitCmp. This requires the two callers to manifest a 0 to make EmitCmp call EmitTest. I'm looking into changing how we combine TEST and flag-setting instructions so that it's not part of lowering, and is instead part of DAG combine or isel. That will mean EmitTest will probably be gutted and maybe disappear entirely. llvm-svn: 349094
- Dec 13, 2018
Mircea Trofin authored
Summary: Macros are expanded on a single line. In case of large expansions, with sufficiently many instructions with memory operands (and when -fdebug-info-for-profiling is requested), we may be unable to generate new base discriminator values - new values overflow (base discriminators may not be larger than 2^12). This CL warns instead of asserting in such a case. A subsequent CL will add APIs to check for overflow before creating new debug info. See https://bugs.llvm.org/show_bug.cgi?id=39890 Reviewers: davidxl, wmi, gbedwell Reviewed By: davidxl Subscribers: aprantl, llvm-commits Differential Revision: https://reviews.llvm.org/D55643 llvm-svn: 349075
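A minimal sketch of the kind of overflow guard being described, assuming the 12-bit limit from the message (the helper name and API here are hypothetical):

```cpp
#include <cstdint>
#include <optional>

// Base discriminators are assumed to occupy 12 bits, so 4095 is the largest
// encodable value; past that, the caller should warn rather than assert.
constexpr uint32_t MaxBaseDiscriminator = (1u << 12) - 1;

std::optional<uint32_t> nextBaseDiscriminator(uint32_t Current) {
  if (Current >= MaxBaseDiscriminator)
    return std::nullopt; // overflow: emit a warning, keep compiling
  return Current + 1;
}
```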
Simon Pilgrim authored
llvm-svn: 349057
Simon Pilgrim authored
There's still a couple of minor SimplifyDemandedElts regressions in some of the shift amount splats that will be fixed in future patches. llvm-svn: 349052
Simon Pilgrim authored
llvm-svn: 349047
Simon Pilgrim authored
Merged the repeated code into a single if(). llvm-svn: 349040
Simon Pilgrim authored
We always expand to shifts anyhow; the test changes are just different scheduling. llvm-svn: 349034
Simon Pilgrim authored
Remove common code from custom lowering (code is still safe if somehow a zero value gets used). llvm-svn: 349028
Simon Pilgrim authored
Move existing rotation expansion code into TargetLowering and set it up for vectors as well. Ideally this would share more of the funnel shift expansion, but we handle the shift amount modulo quite differently at the moment. Begun removing x86 vector rotate custom lowering to use the expansion. llvm-svn: 349025
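The shift-based expansion in scalar form, as a sketch of the idea rather than the TargetLowering code itself:

```cpp
#include <cstdint>

// rotl(x, n) built from two shifts and an OR. The amount is reduced modulo
// the bit width first, and the zero case is peeled off so the right shift
// by 32 (which would be undefined) never executes.
uint32_t rotl32(uint32_t X, unsigned Amt) {
  Amt &= 31;
  if (Amt == 0)
    return X;
  return (X << Amt) | (X >> (32 - Amt));
}
```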
Craig Topper authored
llvm-svn: 349007
- Dec 12, 2018
Craig Topper authored
MULX has somewhat improved register allocation constraints compared to the legacy MUL instruction. Both output registers are encoded instead of fixed to EAX/EDX, but EDX is used as input. It also doesn't touch flags. Unfortunately, the encoding is longer. Preferring it whenever BMI2 is enabled is probably not optimal. Choosing it should somehow be a function of register allocation constraints like converting adds to three address. gcc and icc definitely don't pick MULX by default. Not sure what if any rules they have for using it. Differential Revision: https://reviews.llvm.org/D55565 llvm-svn: 348975
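What MULX computes, in portable terms (an illustration of the instruction's semantics, not backend code):

```cpp
#include <cstdint>

// A 64x64 -> 128-bit unsigned multiply. MULX can write both result halves
// to arbitrary registers (one source operand is implicitly RDX) and, unlike
// the legacy MUL, leaves EFLAGS untouched.
void mulx64(uint64_t A, uint64_t B, uint64_t &Lo, uint64_t &Hi) {
  unsigned __int128 P = (unsigned __int128)A * B; // GCC/Clang extension
  Lo = (uint64_t)P;
  Hi = (uint64_t)(P >> 64);
}
```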
Craig Topper authored
I'm hoping we can just replace SETCC_CARRY with SBB. This is another step towards that. I've explicitly used zero as the input to the setcc to avoid a false dependency that we've had with the SETCC_CARRY. I changed one of the patterns that used NEG to instead use an explicit compare with 0 on the LHS. We needed the zero anyway to avoid the false dependency. The negate would clobber its input register. By using a CMP we can avoid that, which could be useful. Differential Revision: https://reviews.llvm.org/D55414 llvm-svn: 348959
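What SETCC_CARRY (and the SBB that would replace it) materializes, in scalar terms, purely as an illustration:

```cpp
#include <cstdint>

// sbb %eax, %eax computes eax - eax - CF = -CF: all-ones when the carry
// flag is set, zero otherwise.
int32_t carryMask(bool CF) {
  return CF ? -1 : 0;
}
```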
Simon Pilgrim authored
This patch introduces a generic function to determine whether a given vector type is known to be a splat value for the specified demanded elements, recursing up the DAG looking for BUILD_VECTOR or VECTOR_SHUFFLE splat patterns. It also keeps track of the elements that are known to be UNDEF - it returns true if all the demanded elements are UNDEF (as this may be useful under some circumstances), so this needs to be handled by the caller. A wrapper variant is also provided that doesn't take the DemandedElts or UndefElts arguments for cases where we just want to know if the SDValue is a splat or not (with/without UNDEFS). I had hoped to completely remove the X86 local version of this function, but I'm seeing some regressions in shift/rotate codegen that will take a little longer to fix and I hope to get this in sooner so I can continue work on PR38243 which needs more capable splat detection. Differential Revision: https://reviews.llvm.org/D55426 llvm-svn: 348953
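The rough shape of the new interface as described in the message (a sketch; see SelectionDAG.h for the authoritative declarations, with SDValue/APInt from the usual LLVM headers):

```cpp
// Returns true if every demanded element of V is known to be a splat,
// recursing through BUILD_VECTOR / VECTOR_SHUFFLE. UndefElts receives the
// demanded elements known to be UNDEF; note it also returns true when all
// demanded elements are UNDEF, which the caller must handle.
bool isSplatValue(SDValue V, const APInt &DemandedElts, APInt &UndefElts);

// Wrapper for callers that only want a yes/no answer, with or without
// UNDEFs allowed.
bool isSplatValue(SDValue V, bool AllowUndefs = false);
```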
Sanjay Patel authored
This extends the code that handles 16-bit add promotion to form LEA to also allow 8-bit adds. That allows us to combine add ops with register moves and save some instructions. This is another step towards allowing add truncation in generic DAGCombiner (see D54640). Differential Revision: https://reviews.llvm.org/D55494 llvm-svn: 348946
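The transform in spirit (my illustration, not the patch's code): the 8-bit add is performed on the promoted 32-bit operands with an LEA, which can write any destination register and so folds the add and the register move into one instruction.

```cpp
#include <cstdint>

uint8_t add8ViaLea(uint32_t A, uint32_t B) {
  uint32_t Sum = A + B; // candidate for lea (%rdi,%rsi), %eax
  return (uint8_t)Sum;  // only the low 8 bits are used
}
```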
Craig Topper authored
This is similar to the combine we already have for vpmovdw+vpackuswb. llvm-svn: 348910
- Dec 11, 2018
Craig Topper authored
Summary: When doing X86CondBrFolding::analyzeCompare, it can meet a SUB32ri instruction that uses a global address for its operand, as below:

%733:gr32 = SUB32ri %62:gr32(tied-def 0), @img2buf_normal, implicit-def $eflags
JNE_1 %bb.41, implicit $eflags

so the assertion "assert(MI.getOperand(ValueIndex).isImm() && "Expecting Imm operand")" does not hold. Change the assert to an if that makes X86CondBrFolding::analyzeCompare return false, as no compare is found in this case. Patch by Jianping Chen. Reviewers: smaslov, LuoYuanke, liutianle, Jianping Reviewed By: Jianping Subscribers: lebedev.ri, llvm-commits Differential Revision: https://reviews.llvm.org/D54250 llvm-svn: 348853
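A minimal sketch of the fix as described (the helper name is hypothetical; MI and ValueIndex are the identifiers from the message):

```cpp
#include "llvm/CodeGen/MachineInstr.h"
using namespace llvm;

// Previously: assert(MI.getOperand(ValueIndex).isImm() && "Expecting Imm
// operand"). Now the pass simply reports "no analyzable compare" when the
// operand is something else, e.g. a global address.
static bool hasImmCompareOperand(const MachineInstr &MI, unsigned ValueIndex) {
  return MI.getOperand(ValueIndex).isImm();
}
```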
Sanjay Patel authored
As discussed in D55494, we want to extend this to handle 8-bit ops too, but that could be extended further to enable this on 32-bit systems too. llvm-svn: 348851
Sanjay Patel authored
As discussed in: D55494 ...this code has been disabled/dead for a long time (the code references Athlon and Pentium 4), and there's almost no chance that it will be used given the last decade of uarch evolution. Also, in SDAG we promote 16-bit ops to 32-bit, so there's almost no way to test this code any more. llvm-svn: 348845
- Dec 10, 2018
Amara Emerson authored
This patch restricts the capability of G_MERGE_VALUES, and uses the new G_BUILD_VECTOR and G_CONCAT_VECTORS opcodes instead in the appropriate places. This patch also includes AArch64 support for selecting G_BUILD_VECTOR of <4 x s32> and <2 x s64> vectors. Differential Revisions: https://reviews.llvm.org/D53629 llvm-svn: 348788
Sanjay Patel authored
This should really be generalized to allow increment and/or we should replace it by using ISD::matchUnaryPredicate(). See D55515 for context. llvm-svn: 348776
Cameron McInally authored
Should be "Sae" for "Suppress All Exceptions". NFC llvm-svn: 348763
Nikita Popov authored
Fixes https://bugs.llvm.org/show_bug.cgi?id=39926. The size of the first copy was computed as std::abs(std::abs(LdDisp2) - std::abs(LdDisp1)), which results in skipped bytes if the signs of LdDisp2 and LdDisp1 differ. As far as I can see, this should just be LdDisp2 - LdDisp1. The case where LdDisp1 > LdDisp2 is already handled in the code above, in which case LdDisp2 is set to LdDisp1 and this subtraction will evaluate to Size1 = 0, which is the correct value to skip an overlapping copy. Differential Revision: https://reviews.llvm.org/D55485 llvm-svn: 348750
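A worked example of why the old formula skipped bytes (displacement values invented for illustration):

```cpp
#include <cstdlib>

void sizeFormulaExample() {
  int LdDisp1 = -8, LdDisp2 = 4; // loads start 12 bytes apart
  int OldSize = std::abs(std::abs(LdDisp2) - std::abs(LdDisp1)); // 4: wrong
  int NewSize = LdDisp2 - LdDisp1;                               // 12: right
  (void)OldSize;
  (void)NewSize;
}
```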
Craig Topper authored
Both intrinsics do the exact same thing so we really only need one. Earlier in the 8.0 cycle we changed the signature of this intrinsic without renaming it. But it looks difficult to get the autoupgrade code to allow me to merge the intrinsics and change the signature at the same time. So I've renamed the intrinsic slightly for the new merged intrinsic. I'm skipping autoupgrading from the previous new to 8.0 signature. I've also renamed the subborrow for consistency. llvm-svn: 348737
- Dec 09, 2018
Craig Topper authored
[X86] If the carry input to an addcarry/subborrow intrinsic is known to be 0, emit a flag setting ADD/SUB instead of ADC/SBB. Previously we had to take the carry in and add -1 to it to set the carry flag so we could use it with ADC/SBB. But if we know it's 0 then we don't need to bother. This should go a long way towards fixing PR24545. llvm-svn: 348727
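The semantics being exploited, in portable terms (an illustration, not the lowering code): ADC computes A + B + CF, so a known-zero carry-in reduces it to a plain flag-setting ADD.

```cpp
#include <cstdint>

uint8_t addCarry64(uint8_t CarryIn, uint64_t A, uint64_t B, uint64_t &Sum) {
  unsigned __int128 Wide =
      (unsigned __int128)A + B + CarryIn; // CarryIn == 0 => plain ADD
  Sum = (uint64_t)Wide;
  return (uint8_t)(Wide >> 64); // carry-out
}
```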
Nico Weber authored
The dependency was added in r213995 in response to r213986 which did make X86/Utils depend on IR, but r256680 later removed that dependency again. llvm-svn: 348724
Sanjay Patel authored
The existing code tries to handle an undef operand while transforming an add to an LEA, but it's incomplete, because we will crash on the i16 test with the debug output shown below. It's better to just give up instead. Really, GlobalISel should have folded these before we could get into trouble.

# Machine code for function add_undef_i16: NoPHIs, TracksLiveness, Legalized, RegBankSelected, Selected

bb.0 (%ir-block.0):
  liveins: $edi
  %1:gr32 = COPY killed $edi
  %0:gr16 = COPY %1.sub_16bit:gr32
  %5:gr64_nosp = IMPLICIT_DEF
  %5.sub_16bit:gr64_nosp = COPY %0:gr16
  %6:gr64_nosp = IMPLICIT_DEF
  %6.sub_16bit:gr64_nosp = COPY %2:gr16
  %4:gr32 = LEA64_32r killed %5:gr64_nosp, 1, killed %6:gr64_nosp, 0, $noreg
  %3:gr16 = COPY killed %4.sub_16bit:gr32
  $ax = COPY killed %3:gr16
  RET 0, implicit killed $ax

# End machine code for function add_undef_i16.

*** Bad machine code: Reading virtual register without a def ***
- function:    add_undef_i16
- basic block: %bb.0 (0x7fe6cd83d940)
- instruction: %6.sub_16bit:gr64_nosp = COPY %2:gr16
- operand 1:   %2:gr16
LLVM ERROR: Found 1 machine code errors.

Differential Revision: https://reviews.llvm.org/D54710 llvm-svn: 348722
Simon Pilgrim authored
Extension to rL348617: it turns out llvm-exegesis doesn't need to match the perf counter name against a scheduler model resource name, so I've added a few more counters that I could find in the libpfm4 source code (and fixed a typo in the knl/knm retired_uops counter, which uses 'all' instead of 'any'). llvm-svn: 348721
- Dec 07, 2018
Simon Pilgrim authored
llvm-svn: 348626
Craig Topper authored
Make X86CondBrFoldingPass runnable with the --run-pass option, so we can test a wrong assertion in the analyzeCompare function for SUB32ri when its operand is not an immediate. Patch by Jianping Chen Differential Revision: https://reviews.llvm.org/D55412 llvm-svn: 348620
Simon Pilgrim authored
This patch attempts to improve pfm perf counter coverage for all the x86 CPUs that libpfm4 supports. Intel/AMD CPU families tend to share names for cycle/uops counters so even if they don't have a scheduler model yet they can at least use the default values (checked against the libpfm4 source code). The remaining CPUs (where their port/pipe resource counters are known) I've tried to add to the existing model mappings. These are untested but don't represent a regression to current llvm-exegesis behaviour for these CPUs. Differential Revision: https://reviews.llvm.org/D55432 llvm-svn: 348617
David Green authored
Adds fatal errors for any target that does not support the Tiny or Kernel codemodels by rejigging the getEffectiveCodeModel calls. Differential Revision: https://reviews.llvm.org/D50141 llvm-svn: 348585