  1. Nov 19, 2018
    • Craig Topper's avatar
      [X86] Use compare with 0 to fill an element with sign bits when sign extending to v2i64 pre-sse4.1 · 36168910
      Craig Topper authored
      Previously we used an arithmetic shift right by 31, but that requires a copy to preserve the input. So we might as well materialize a zero and compare to it since the comparison will overwrite the register that contains the zeros. This should be one byte shorter.
      
      llvm-svn: 347181
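The equivalence this change relies on can be checked with a small lane-wise model. This is only a sketch in portable C++, not the lowering code itself; `cmpGt`, `sra31`, and `signFillsAgree` are hypothetical names introduced here, and the vector is modelled as a plain array.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V4i32 = std::array<int32_t, 4>;

// Lane-wise model of a signed greater-than compare (pcmpgtd-style):
// each lane becomes all ones (-1) when a > b, otherwise 0.
V4i32 cmpGt(const V4i32 &a, const V4i32 &b) {
  V4i32 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = a[i] > b[i] ? -1 : 0;
  return r;
}

// Lane-wise model of an arithmetic shift right by 31 (psrad-style).
// Right-shifting a negative value is arithmetic on mainstream compilers
// and is guaranteed to be so from C++20.
V4i32 sra31(const V4i32 &a) {
  V4i32 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = a[i] >> 31;
  return r;
}

// True when both ways of producing the sign-bit fill agree for x.
bool signFillsAgree(const V4i32 &x) {
  const V4i32 zero{};  // models the materialized zero register
  return cmpGt(zero, x) == sra31(x);
}
```

In the model, `0 > x` is true exactly when the sign bit of `x` is set, so both routes yield the same fill value. The register-copy saving described above is a property of the two-operand instruction encoding and is not visible at this level.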
    • Craig Topper's avatar
      [X86] Remove most of the SEXTLOAD Custom setOperationAction calls under -x86-experimental-vector-widening-legalization. · 053f1eea
      Craig Topper authored
      
      Leave just the v4i8->v4i64 and v8i8->v8i64, but only enable them on pre-sse4.1 targets when 64-bit mode is enabled. In those cases we end up creating sext loads that get scalarized to code that looks better than what we get from loading into a vector register and doing a multiple step sign extend using unpacks and shifts.
      
      llvm-svn: 347180
  2. Nov 18, 2018
    • Craig Topper's avatar
      [X86] Add custom type legalization for extending v4i8/v4i16->v4i64. · 0468c860
      Craig Topper authored
      Pre-SSE4.1 sext_invec for v2i64 is complicated because we don't have a v2i64 sra instruction. So instead we sign extend to i32 using unpack and sra, then copy the elements and do a v4i32 sra to fill with sign bits, then interleave the i32 sign extend and the sign bits. So really we're doing two sign extends but only using half of the v4i32 intermediate result.
      
      When the result is more than 128 bits, default type legalization would prefer to split the destination type all the way down to v2i64 with shuffles followed by v16i8/v8i16->v2i64 sext_inreg operations. This results in more instructions than necessary because we are only utilizing the lower 2 elements of the v4i32 intermediate result. Instead we can custom split a v4i8/v4i16->v4i64 sign_extend. Then we can sign extend v4i8/v4i16->v4i32 invec, producing a full v4i32 result. Create the sign bit vector as a v4i32, then split and interleave with the sign bits using punpckldq and punpckhdq.
      
      llvm-svn: 347176
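The interleave step can be sketched with a portable lane model. The helper names below (`signFill`, `unpackLo`, `unpackHi`, `joinLanes`, `signExtendToV4i64`) are hypothetical, and the model assumes little-endian lane order, where the low 32-bit lane of each pair forms the low half of the 64-bit element.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V4i32 = std::array<int32_t, 4>;
using V4i64 = std::array<int64_t, 4>;

// All-ones lanes where x is negative: the v4i32 sign-bit vector.
V4i32 signFill(const V4i32 &x) {
  V4i32 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = x[i] < 0 ? -1 : 0;
  return r;
}

// punpckldq-style interleave of the low two lanes: {a0, b0, a1, b1}.
V4i32 unpackLo(const V4i32 &a, const V4i32 &b) {
  return V4i32{{a[0], b[0], a[1], b[1]}};
}

// punpckhdq-style interleave of the high two lanes: {a2, b2, a3, b3}.
V4i32 unpackHi(const V4i32 &a, const V4i32 &b) {
  return V4i32{{a[2], b[2], a[3], b[3]}};
}

// View two adjacent 32-bit lanes as one little-endian 64-bit lane.
int64_t joinLanes(int32_t lo, int32_t hi) {
  const uint64_t bits =
      (static_cast<uint64_t>(static_cast<uint32_t>(hi)) << 32) |
      static_cast<uint32_t>(lo);
  return static_cast<int64_t>(bits);
}

// v4i32 -> v4i64 sign extend: one sign vector, two interleaves.
V4i64 signExtendToV4i64(const V4i32 &x) {
  const V4i32 sign = signFill(x);
  const V4i32 lo = unpackLo(x, sign);  // becomes the low v2i64 half
  const V4i32 hi = unpackHi(x, sign);  // becomes the high v2i64 half
  return V4i64{{joinLanes(lo[0], lo[1]), joinLanes(lo[2], lo[3]),
                joinLanes(hi[0], hi[1]), joinLanes(hi[2], hi[3])}};
}
```

All four lanes of the sign vector are consumed here, which is the saving over the split-first approach that only uses the lower two.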
    • Simon Pilgrim's avatar
      [X86][SSE] Add SimplifyDemandedVectorElts support for SSE splat-vector-shifts. · b31bdbd2
      Simon Pilgrim authored
      SSE vector shifts only use the bottom 64-bits of the shift amount vector.
      
      llvm-svn: 347173
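A minimal model of why the upper elements of the shift amount are not demanded, assuming pslld-style semantics (the count is read from the low 64 bits, and counts above 31 produce zero). `shiftLeftByVectorAmount` is a hypothetical name used only for this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V4u32 = std::array<uint32_t, 4>;
using V2u64 = std::array<uint64_t, 2>;

// Model of an SSE shift-by-vector on 32-bit lanes: every lane is shifted
// by the same count, taken from the low 64 bits of the amount vector.
V4u32 shiftLeftByVectorAmount(const V4u32 &v, const V2u64 &amount) {
  const uint64_t count = amount[0];  // amount[1] is never read
  V4u32 r{};                         // zero-initialized
  if (count > 31)
    return r;                        // oversized counts clear every lane
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = v[i] << count;
  return r;
}
```

Because `amount[1]` never influences the result, a demanded-elements analysis can treat the upper half of the amount operand as free to simplify.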
    • Craig Topper's avatar
      [X86] Disable combineToExtendVectorInReg under -x86-experimental-vector-widening-legalization. Add custom type legalization for extends. · 11d50948
      Craig Topper authored
      
      If we widen illegal types instead of promoting, we should be able to rely on the type legalizer to create the vector_inreg operations for us with some caveats.
      
      This patch disables combineToExtendVectorInReg when we are using widening.
      
      I've enabled custom legalization for v8i8->v8i64 extends under avx512f since the type legalizer would want to create a vector_inreg with a v64i8 input type which isn't legal without avx512bw. So we go to v16i8 with custom code using the relaxation of rules we get from D54346.
      
      I've also enabled custom legalization of v8i64 and v16i32 operations with AVX. When the input type is 128 bits, the default splitting legalization would extend first 128->256, then split that into two 128-bit pieces, extend each half to 256, and then concat the results. The custom legalization I've added instead uses a 128->256 bit vector_inreg extend that only reads the lower 64 bits for the low half of the split. It then shuffles the high 64 bits to the low 64 bits and does another vector_inreg extend.
      
      llvm-svn: 347172
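The custom split for a 128-bit input can be sketched as follows, using v16i8 -> v16i32 sign extension as the example. This is an illustrative lane model only; `signExtendLowHalf`, `moveHighToLow`, `signExtendSplit`, and `splitExtendMatchesDirect` are hypothetical names, not functions from the patch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V16i8 = std::array<int8_t, 16>;
using V8i32 = std::array<int32_t, 8>;
using V16i32 = std::array<int32_t, 16>;

// Model of a 128->256-bit in-register sign extend: only the low 64 bits
// (eight bytes) of the input are read.
V8i32 signExtendLowHalf(const V16i8 &v) {
  V8i32 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = v[i];
  return r;
}

// Model of the shuffle that moves the high 64 bits into the low 64 bits.
// The upper half of the result is a don't-care; it is zeroed here.
V16i8 moveHighToLow(const V16i8 &v) {
  V16i8 r{};
  for (std::size_t i = 0; i < 8; ++i)
    r[i] = v[i + 8];
  return r;
}

// Two in-register extends plus one shuffle, concatenated into v16i32.
V16i32 signExtendSplit(const V16i8 &v) {
  const V8i32 lo = signExtendLowHalf(v);
  const V8i32 hi = signExtendLowHalf(moveHighToLow(v));
  V16i32 r{};
  for (std::size_t i = 0; i < 8; ++i) {
    r[i] = lo[i];
    r[i + 8] = hi[i];
  }
  return r;
}

// True when the split form matches a direct element-by-element extend.
bool splitExtendMatchesDirect(const V16i8 &v) {
  const V16i32 r = signExtendSplit(v);
  for (std::size_t i = 0; i < r.size(); ++i)
    if (r[i] != static_cast<int32_t>(v[i]))
      return false;
  return true;
}
```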
    • Craig Topper's avatar
      [X86] Lower v16i16->v16i8 truncate using an 'and' with 255, an extract_subvector, and a packuswb instruction. · bc8148f7
      Craig Topper authored
      
      Summary: This is an improvement over the two pshufbs and punpcklqdq we'd get otherwise.
      
      Reviewers: RKSimon, spatel
      
      Reviewed By: RKSimon
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D54671
      
      llvm-svn: 347171
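A sketch of why the 'and' with 255 is needed before the pack, assuming packuswb-style unsigned saturation of signed 16-bit lanes. The helper names (`saturateToU8`, `packUnsignedSaturate`, `truncateViaAndPack`) are hypothetical and the vectors are modelled as plain arrays.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V16i16 = std::array<int16_t, 16>;
using V8i16 = std::array<int16_t, 8>;
using V16u8 = std::array<uint8_t, 16>;

// packuswb-style conversion of one lane: signed 16-bit to unsigned 8-bit
// with saturation.
uint8_t saturateToU8(int16_t x) {
  if (x < 0)
    return 0;
  if (x > 255)
    return 255;
  return static_cast<uint8_t>(x);
}

// Pack two 8-lane halves into sixteen bytes, low half first.
V16u8 packUnsignedSaturate(const V8i16 &lo, const V8i16 &hi) {
  V16u8 r{};
  for (std::size_t i = 0; i < 8; ++i) {
    r[i] = saturateToU8(lo[i]);
    r[i + 8] = saturateToU8(hi[i]);
  }
  return r;
}

// Truncate sixteen 16-bit lanes to bytes: mask, split in half, pack.
V16u8 truncateViaAndPack(const V16i16 &v) {
  V8i16 lo{}, hi{};
  for (std::size_t i = 0; i < 8; ++i) {
    // After masking, every lane is in [0, 255], so the pack never saturates.
    lo[i] = static_cast<int16_t>(v[i] & 0xFF);
    hi[i] = static_cast<int16_t>(v[i + 8] & 0xFF);
  }
  return packUnsignedSaturate(lo, hi);
}
```

Without the mask, a lane such as 256 would saturate to 255 instead of truncating to 0.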
    • Sanjay Patel's avatar
      [DAG] add undef simplifications for select nodes · 8c0cd77b
      Sanjay Patel authored
      Sadly, this duplicates (twice) the logic from InstSimplify. There
      might be some way to at least share the DAG versions of the code, 
      but copying the folds seems to be the standard method to ensure 
      that we don't miss these folds. 
      
      Unlike in IR, we don't run DAGCombiner to fixpoint, so there's no 
      way to ensure that we do these kinds of simplifications unless the 
      code is repeated at node creation time and during combines.
      
      There were other tests that would become worthless with this
      improvement that I changed as pre-commits:
      rL347161
      rL347164
      rL347165
      rL347166
      rL347167
      
      I'm not sure how to salvage the remaining tests (diffs in this patch).
      So the x86 tests verify that the new code is working as intended.
      The AMDGPU test is actually similar to my motivating case: we have
      some undef value that has survived to machine IR in an x86 test, and 
      then it gets folded in some weird way, or we crash if we don't transfer
      the undef flag. But we would have been better off never getting to that
      point by doing these simplifications.
      
      This will lead back to PR32023 someday...
      https://bugs.llvm.org/show_bug.cgi?id=32023
      
      llvm-svn: 347170
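The general shape of undef folds for a select can be illustrated with a toy model. This is not the SelectionDAG code: `Operand` and `foldSelectOfUndef` are hypothetical, and the choice of arm for an undef condition is an assumption made for illustration, since either arm is a valid result.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for a select operand: either undef or a named value.
struct Operand {
  bool isUndef;
  int id;  // identity of a non-undef value; ignored when isUndef is true
};

// Returns the operand the select simplifies to, or std::nullopt when none
// of the undef folds apply.
std::optional<Operand> foldSelectOfUndef(const Operand &cond,
                                         const Operand &trueVal,
                                         const Operand &falseVal) {
  if (cond.isUndef)
    return trueVal;   // either arm is valid; the true arm is picked here
  if (trueVal.isUndef)
    return falseVal;  // select C, undef, F -> F
  if (falseVal.isUndef)
    return trueVal;   // select C, T, undef -> T
  return std::nullopt;
}
```

Running the same helper both when a node is created and again during combines is what guarantees the fold is not missed when the combiner does not iterate to a fixpoint.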
    • Simon Pilgrim's avatar
      Remove unused variable. NFCI. · ec808cf5
      Simon Pilgrim authored
      llvm-svn: 347169
    • Simon Pilgrim's avatar
      [X86][SSE] Split IsSplatValue into GetSplatValue and IsSplatVector · 50828c75
      Simon Pilgrim authored
      Refactor towards making this recursive (necessary for PR38243 rotation splat detection).
      IsSplatVector returns the original vector source of the splat and the splat index.
      GetSplatValue returns the scalar splatted value as an extraction from IsSplatVector.
      
      llvm-svn: 347168
    • Simon Pilgrim's avatar
      [X86][SSE] Relax IsSplatValue - remove the 'variable shift' limit on subtracts. · fec9f865
      Simon Pilgrim authored
      This means we use the per-lane shifts less often when we can cheaply use the older splat-variable shifts.
      
      llvm-svn: 347162
    • Sanjay Patel's avatar
      [SelectionDAG] simplify code; NFC · 42c22a1f
      Sanjay Patel authored
      llvm-svn: 347160
    • Simon Pilgrim's avatar
      [X86][SSE] Use raw shuffle mask decode in SimplifyDemandedVectorEltsForTargetNode (PR39549) · cc1f5d24
      Simon Pilgrim authored
      We were using the 'normalized' shuffle mask from resolveTargetShuffleInputs, which replaces zero/undef inputs with sentinel values. For SimplifyDemandedVectorElts we need the raw mask so we can correctly demand those 'zero' inputs that got normalized away. This requires an extra bit of logic to locally normalize undef inputs.
      
      llvm-svn: 347158
    • Heejin Ahn's avatar
      [WebAssembly] Add null streamer support · e0f8b9bf
      Heejin Ahn authored
      Summary: Now `llc -filetype=null` works.
      
      Reviewers: eush
      
      Subscribers: dschuff, jgravelle-google, sbc100, sunfish, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D54660
      
      llvm-svn: 347155
    • Craig Topper's avatar
      [X86] Add -x86-experimental-vector-widening-legalization check to combineSelect and combineSetCC to cover vXi16/vXi8 promotion without BWI. · cd94a7c2
      Craig Topper authored
      
      I don't yet have any test cases for this, but it's the right thing to do based on log file inspection.
      
      llvm-svn: 347151
    • Craig Topper's avatar
      [X86] Rename WidenMaskArithmetic->PromoteMaskArithmetic since we usually use widen to refer to adding elements, not making elements larger. NFC · b03f80a2
      Craig Topper authored
      
      llvm-svn: 347150
    • Craig Topper's avatar
      [X86] Don't use a pmaddwd for vXi32 multiply if the inputs are zero extends from i8 or smaller without SSE4.1. Prefer to shrink the mul instead. · f56a5751
      Craig Topper authored
      
      The zero extend will require two stages of unpacks to implement. So it's better to shrink the multiply using pmullw and then extend that result back to v4i32 using a single unpack.
      
      llvm-svn: 347149
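Shrinking the multiply is safe because the product of two zero-extended 8-bit values always fits in 16 bits (255 * 255 = 65025). The sketch below checks this exhaustively with a scalar model of one pmullw lane; `mulLow16` and `shrunkMulIsExact` are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstdint>

// One lane of a pmullw-style multiply: the low 16 bits of a 16x16 product.
uint16_t mulLow16(uint16_t a, uint16_t b) {
  // Widen first so the multiplication happens in unsigned arithmetic.
  return static_cast<uint16_t>(static_cast<uint32_t>(a) * b);
}

// True when, for every pair of 8-bit inputs, multiplying in 16 bits and
// then zero extending gives the same value as a full 32-bit multiply.
bool shrunkMulIsExact() {
  for (uint32_t a = 0; a <= 255; ++a)
    for (uint32_t b = 0; b <= 255; ++b) {
      const uint32_t narrow = mulLow16(static_cast<uint16_t>(a),
                                       static_cast<uint16_t>(b));
      if (narrow != a * b)
        return false;
    }
  return true;
}
```

The same shortcut is not exact for wider inputs: in this model, `mulLow16(256, 256)` wraps to 0.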
    • Vedant Kumar's avatar
      [CorrelatedValuePropagation] Preserve debug locations (PR38178) · 35f504c1
      Vedant Kumar authored
      Fix all of the missing debug location errors in CVP found by debugify.
      
      This includes the missing-location-after-udiv-truncation case described
      in llvm.org/PR38178.
      
      llvm-svn: 347147
  3. Nov 17, 2018
  4. Nov 16, 2018