  1. Dec 14, 2018
    • [DAGCombiner][X86] Prevent visitSIGN_EXTEND from returning N when (sext (setcc)) already has the target desired type for the setcc · 257ce387
      Craig Topper authored
      
      Summary:
      If the setcc already has the target desired type we can reach the getSetCC/getSExtOrTrunc after the MatchingVecType check with the exact same types as the nodes we started with. This causes those causes VsetCC to be CSEd to N0 and the getSExtOrTrunc will CSE to N. When we return N, the caller will think that meant we called CombineTo and did our own worklist management. But that's not what happened. This prevents target hooks from being called for the node.
      
      To fix this, I've now returned SDValue if the setcc is already the desired type. But to avoid some regressions in X86 I've had to disable one of the target combines that wasn't being reached before in the case of a (sext (setcc)). If we get vector widening legalization enabled that entire function will be deleted anyway so hopefully this is only for the short term.
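      Roughly, the fixed path looks like the sketch below (simplified; names as in DAGCombiner::visitSIGN_EXTEND, surrounding checks elided):
      
          // Sketch: bail out if the setcc already has the target's desired
          // type, instead of rebuilding nodes that CSE back to N0 and N.
          EVT SVT = getSetCCResultType(N00VT); // N00VT = setcc operand type
          if (SVT == N0.getValueType())
            return SDValue();
          SDValue VsetCC =
              DAG.getSetCC(SDLoc(N), SVT, N0.getOperand(0), N0.getOperand(1),
                           cast<CondCodeSDNode>(N0.getOperand(2))->get());
          return DAG.getSExtOrTrunc(VsetCC, SDLoc(N), VT);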
      
      Reviewers: RKSimon, spatel
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D55459
      
      llvm-svn: 349137
    • [X86] Demote EmitTest to a helper function of EmitCmp. Route all callers except EmitCmp through EmitCmp. · 178abc59
      Craig Topper authored
      
      This requires the two callers to manifest a 0 to make EmitCmp call EmitTest.
      
      I'm looking into changing how we combine TEST and flag-setting instructions so that it happens in DAG combine or isel rather than as part of lowering. That will mean EmitTest will probably become gutted and maybe disappear entirely.
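      The new shape, roughly (a simplified sketch; the real functions take extra subtarget parameters):
      
          // Sketch: EmitTest is now only reachable through EmitCmp.
          static SDValue EmitCmp(SDValue Op0, SDValue Op1, unsigned X86CC,
                                 const SDLoc &dl, SelectionDAG &DAG) {
            // Callers that previously called EmitTest directly now pass an
            // explicit 0 as Op1; EmitCmp routes compare-with-zero to the helper.
            if (isNullConstant(Op1))
              return EmitTest(Op0, X86CC, dl, DAG); // emits TEST Op0, Op0
            return DAG.getNode(X86ISD::CMP, dl, MVT::i32, Op0, Op1);
          }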
      
      llvm-svn: 349094
  2. Dec 12, 2018
    • [X86] Emit SBB instead of SETCC_CARRY from LowerSELECT. Break false dependency on the SBB input. · 4937adf7
      Craig Topper authored
      I'm hoping we can just replace SETCC_CARRY with SBB. This is another step towards that.
      
      I've explicitly used zero as the input to the setcc to avoid the false dependency we've had with SETCC_CARRY. I changed one of the patterns that used NEG to instead use an explicit compare with 0 on the LHS; we needed the zero anyway to avoid the false dependency, and the negate would clobber its input register. By using a CMP we can avoid that, which could be useful.
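      The node construction is roughly (a sketch; types and locals are illustrative):
      
          // Sketch: materialize 0 - CF with SBB, feeding it explicit zeros so
          // the result has no false dependency on a stale register value.
          SDValue Zero = DAG.getConstant(0, DL, VT);
          // CMP with 0 on the LHS sets CF without clobbering an input the way
          // NEG would (NEG overwrites its source register).
          SDValue Cmp = DAG.getNode(X86ISD::CMP, DL, MVT::i32, Zero, Op);
          SDValue Sbb = DAG.getNode(X86ISD::SBB, DL,
                                    DAG.getVTList(VT, MVT::i32), Zero, Zero, Cmp);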
      
      Differential Revision: https://reviews.llvm.org/D55414
      
      llvm-svn: 348959
    • [SelectionDAG] Add a generic isSplatValue function · eb508f8c
      Simon Pilgrim authored
      This patch introduces a generic function to determine whether a given vector type is known to be a splat value for the specified demanded elements, recursing up the DAG looking for BUILD_VECTOR or VECTOR_SHUFFLE splat patterns.
      
      It also keeps track of the elements that are known to be UNDEF. It returns true if all the demanded elements are UNDEF (as this may be useful in some circumstances), so that case needs to be handled by the caller.
      
      A wrapper variant is also provided that doesn't take the DemandedElts or UndefElts arguments for cases where we just want to know if the SDValue is a splat or not (with/without UNDEFS).
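      The wrapper is roughly shaped like this (a sketch, not the exact code):
      
          // Sketch: demand all elements; unless AllowUndefs is set, reject
          // splats in which any demanded element is UNDEF.
          bool SelectionDAG::isSplatValue(SDValue V, bool AllowUndefs) {
            unsigned NumElts = V.getValueType().getVectorNumElements();
            APInt UndefElts;
            APInt DemandedElts = APInt::getAllOnesValue(NumElts);
            return isSplatValue(V, DemandedElts, UndefElts) &&
                   (AllowUndefs || UndefElts.isNullValue());
          }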
      
      I had hoped to completely remove the X86 local version of this function, but I'm seeing some regressions in shift/rotate codegen that will take a little longer to fix, so I'd like to get this in sooner and continue work on PR38243, which needs more capable splat detection.
      
      Differential Revision: https://reviews.llvm.org/D55426
      
      llvm-svn: 348953
    • [X86] Combine vpmovdw+vpacksswb into vpmovdb. · 1fe46668
      Craig Topper authored
      This is similar to the combine we already have for vpmovdw+vpackuswb.
      
      llvm-svn: 348910
  3. Dec 10, 2018
    • [x86] fix formatting; NFC · 134f56e7
      Sanjay Patel authored
      This should really be generalized to allow increment and/or
      we should replace it by using ISD::matchUnaryPredicate().
      See D55515 for context.
      
      llvm-svn: 348776
  4. Dec 09, 2018
    • [X86] If the carry input to an addcarry/subborrow intrinsic is known to be 0, emit a flag setting ADD/SUB instead of ADC/SBB. · 2b09d17d
      Craig Topper authored
      
      Previously we had to take the carry in and add -1 to it to set the carry flag so we could use it with ADC/SBB. But if we know it's 0, then we don't need to bother.
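      Conceptually (a simplified sketch of the lowering; locals are illustrative):
      
          // Sketch: with a known-zero carry-in, skip materializing CF.
          unsigned Opc = IsAdd ? X86ISD::ADD : X86ISD::SUB; // flag-setting nodes
          if (isNullConstant(CarryIn))
            return DAG.getNode(Opc, DL, VTs, LHS, RHS);
          // Otherwise fall back to the old path: add -1 to the carry-in to
          // set CF, then emit X86ISD::ADC / X86ISD::SBB.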
      
      This should go a long way towards fixing PR24545.
      
      llvm-svn: 348727
  5. Dec 05, 2018
    • [X86][SSE] Begun adding modulo rotate support to LowerRotate · 32483668
      Simon Pilgrim authored
      Prep work for PR38243 - mainly adding comments on where we need to add modulo support (doing so at the moment causes massive codegen regressions).
      
      I've also consistently added support for modulo folding for uniform constants (although at the moment we have no way to trigger this) and removed the old assertions.
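      The uniform-constant fold is roughly (a sketch):
      
          // Sketch: a uniform constant rotate amount can always be reduced
          // modulo the element width in bits.
          if (ConstantSDNode *CstSplat = isConstOrConstSplat(Amt)) {
            uint64_t RotAmt = CstSplat->getAPIntValue().urem(EltSizeInBits);
            Amt = DAG.getConstant(RotAmt, DL, Amt.getValueType());
          }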
      
      llvm-svn: 348366
    • [SelectionDAG] Initial support for FSHL/FSHR funnel shift opcodes (PR39467) · 180639af
      Simon Pilgrim authored
      This is an initial patch to add a minimum level of support for funnel shifts to the SelectionDAG and to begin wiring it up to the X86 SHLD/SHRD instructions.
      
      Some partial legalization code has been added to handle the 'SlowSHLD' case, where we want to expand instead, and I've added a few DAG combines so we don't get regressions from the existing DAG builder expansion code.
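      For reference, the generic expansion being matched is roughly (a sketch; locals like BitWidthC, ZeroC and CCVT are illustrative):
      
          // Sketch: fshl(X, Y, Z) when SHLD is slow or unavailable.
          //   fshl(X, Y, Z) -> (X << (Z % BW)) | (Y >> (BW - Z % BW)),
          // with a select returning X when Z % BW == 0, since the SRL amount
          // would then equal BW, which is undefined.
          SDValue ShAmt = DAG.getNode(ISD::UREM, DL, VT, Z, BitWidthC);
          SDValue InvShAmt = DAG.getNode(ISD::SUB, DL, VT, BitWidthC, ShAmt);
          SDValue Hi = DAG.getNode(ISD::SHL, DL, VT, X, ShAmt);
          SDValue Lo = DAG.getNode(ISD::SRL, DL, VT, Y, InvShAmt);
          SDValue Or = DAG.getNode(ISD::OR, DL, VT, Hi, Lo);
          SDValue IsZero = DAG.getSetCC(DL, CCVT, ShAmt, ZeroC, ISD::SETEQ);
          SDValue Res = DAG.getSelect(DL, VT, IsZero, X, Or);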
      
      Differential Revision: https://reviews.llvm.org/D54698
      
      llvm-svn: 348353
  6. Dec 03, 2018
    • [DAGCombiner] narrow truncated vector binops when legal · d24f6347
      Sanjay Patel authored
      This is the smallest vector enhancement I could find to D54640.
      Here, we only allow narrowing to legal vector ops because we'll see
      regressions without that. All of the test diffs are wins from what I can tell.
      With AVX/AVX512, we can shrink ymm/zmm ops to xmm.
      
      x86 vector multiplies are the problem case that we're avoiding due to the
      patchwork ISA, and it's not clear to me if we can dance around those
      regressions using TLI hooks or if we need preliminary patches to plug those
      holes.
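      The transform itself, roughly (a sketch; names are illustrative):
      
          // Sketch: trunc (binop X, Y) --> binop (trunc X), (trunc Y), but
          // only when the binop is legal at the narrow vector type.
          unsigned SrcOpc = Src.getOpcode();
          if (TLI.isOperationLegal(SrcOpc, VT)) {
            SDValue NarrowL = DAG.getNode(ISD::TRUNCATE, DL, VT, Src.getOperand(0));
            SDValue NarrowR = DAG.getNode(ISD::TRUNCATE, DL, VT, Src.getOperand(1));
            return DAG.getNode(SrcOpc, DL, VT, NarrowL, NarrowR);
          }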
      
      Differential Revision: https://reviews.llvm.org/D55126
      
      llvm-svn: 348195
    • [X86] Teach LowerMUL/LowerMULH for vXi8 to unpack constant RHS. · 5440b63f
      Craig Topper authored
      Summary:
      We need to unpackl and unpackh the operands to use two vXi16 multiplies. Previously it looks like the low unpack would get constant folded at least in the 128-bit case after shuffle lowering turned the unpackl into ZERO_EXTEND_VECTOR_INREG and X86 custom DAG combined it. The same doesn't happen for the high half. So we'd load a constant and then shuffle it. But the low half would just be loaded and used by the multiply directly.
      
      After this patch we now end up with a constant pool entry for the low and high unpacks separately with no shuffle operations.
      
      This is a step towards removing custom constant folding for ZERO_EXTEND_VECTOR_INREG/SIGN_EXTEND_VECTOR_INREG in the X86 backend.
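      The unpacking is roughly the following for the unsigned case (a sketch; getUnpackl/getUnpackh are the local X86ISelLowering helpers, and the signed path unpacks differently):
      
          // Sketch: extend each operand with real unpack nodes so a constant
          // RHS constant-folds per half, leaving no shuffle at runtime.
          SDValue Zero = DAG.getConstant(0, dl, VT);
          SDValue ALo = getUnpackl(DAG, dl, VT, A, Zero); // zext low half
          SDValue AHi = getUnpackh(DAG, dl, VT, A, Zero); // zext high half
          SDValue BLo = getUnpackl(DAG, dl, VT, B, Zero); // folds when B is constant
          SDValue BHi = getUnpackh(DAG, dl, VT, B, Zero); // ditto for the high half
          // ... multiply the halves as vXi16 and pack the results back.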
      
      Reviewers: RKSimon, spatel
      
      Reviewed By: RKSimon
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D55165
      
      llvm-svn: 348159
    • [X86] Add DAG combine to combine a v8i32->v8i16 truncate with a packuswb that truncates v8i16->v8i8. · e35b01f8
      Craig Topper authored
      
      Summary:
      Under -x86-experimental-vector-widening-legalization, fp_to_uint/fp_to_sint results with a vector type smaller than 128 bits are custom type legalized by promoting the result to a 128-bit vector: the elements are promoted, an assertzext/assertsext is inserted, and the result is truncated back to the original type. The truncate will be further legalized to a pack shuffle. In the case of a v8i8 result type, we'll end up with a v8i16 fp_to_sint. This will need to be further legalized during vector op legalization by promoting to v8i32 and then truncating again. Under avx2 this produces good code with two pack instructions, but under avx512 this will result in a truncate instruction and a packuswb instruction. We should be able to get away with a single truncate instruction.
      
      The other option is to promote all the way to vXi32 result type during the first type legalization. But in some experimentation that seemed to require more work to produce good code for other configurations.
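      The combine is roughly shaped like this (a sketch with the exact predicate checks trimmed):
      
          // Sketch: packuswb of a v8i16 that was truncated from v8i32, with
          // the upper bytes known zero, is just a v8i32->v8i8 truncate.
          if (N->getOpcode() == X86ISD::PACKUS && VT == MVT::v16i8 &&
              N0.getOpcode() == ISD::TRUNCATE && N1.isUndef() &&
              N0.getOperand(0).getValueType() == MVT::v8i32 &&
              DAG.MaskedValueIsZero(N0, APInt::getHighBitsSet(16, 8)))
            return DAG.getNode(X86ISD::VTRUNC, SDLoc(N), VT, N0.getOperand(0));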
      
      Reviewers: RKSimon, spatel
      
      Reviewed By: RKSimon
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D54836
      
      llvm-svn: 348158
  7. Dec 01, 2018
    • [X86] Don't use zero_extend_vector_inreg for mulhu lowering with sse 4.1 · f4b13927
      Craig Topper authored
      Summary: With sse4.1 we use two zero_extend_vector_inreg and a pshufd to expand the v16i8 input into two v8i16 vectors for the multiply. That's 3 shuffles to extend one operand. The other operand is usually constant, as this is mostly used by the divide-by-constant optimization. Pre-sse4.1 we use a punpckhbw and a punpcklbw with a zero vector. That's two shuffles, an xor, and a copy due to tied register constraints, which seems maybe better than the 3 shuffles. With AVX we avoid the copy, so that's obviously better.
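      The two strategies, roughly (a sketch; getUnpackl/getUnpackh are the local X86ISelLowering helpers):
      
          // Pre-sse4.1 style, now used for sse4.1 too: two shuffles plus an
          // xor for the zero vector (and a copy from tied-register constraints).
          SDValue Zero = DAG.getConstant(0, dl, MVT::v16i8);      // pxor
          SDValue Lo = getUnpackl(DAG, dl, MVT::v16i8, Op, Zero); // punpcklbw
          SDValue Hi = getUnpackh(DAG, dl, MVT::v16i8, Op, Zero); // punpckhbw
          // vs. the rejected sse4.1 path: two zero_extend_vector_inreg nodes
          // plus a pshufd to reach the high half, i.e. three shuffles.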
      
      Reviewers: spatel, RKSimon
      
      Reviewed By: RKSimon
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D55138
      
      llvm-svn: 348079