  1. Jul 05, 2016
    • [PowerPC] - Legalize vector types by widening instead of integer promotion · 44513e54
      Nemanja Ivanovic authored
      This patch corresponds to review:
      http://reviews.llvm.org/D20443
      
      It changes the legalization strategy for illegal vector types from integer
      promotion to widening. This applies only to vectors whose element width is
      a multiple of a byte, since we have hardware support for vectors with
      1, 2, 4, 8 and 16 byte elements.
      Integer promotion for vectors is quite expensive on PPC due to the sequence
      of breaking apart the vector, extending the elements and reconstituting the
      vector; two of these operations are expensive.
      This patch yields performance improvements ranging from minor to major on
      most benchmarks. There are very few benchmarks whose performance regresses;
      those regressions can be handled in a subsequent patch with a DAG combine
      (similar to how this patch handles int -> fp conversions of illegal vector
      types).
      
      llvm-svn: 274535
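
      As a hedged aside (an editor's toy model in C++, not LLVM's actual
      legalizer; all names below are invented), the difference between the two
      strategies for an illegal <4 x i16> looks like this:

          #include <cstdint>

          struct V4i16 { int16_t e[4]; };
          struct V4i32 { int32_t e[4]; };
          struct V8i16 { int16_t e[8]; };

          // Promotion: break the vector apart, extend each element to the
          // next legal integer type, and rebuild a <4 x i32>.
          V4i32 promote(V4i16 v) {
            V4i32 r{};
            for (int i = 0; i < 4; ++i) r.e[i] = v.e[i];
            return r;
          }

          // Widening: keep the element type and pad out to a legal <8 x i16>;
          // the original lanes are untouched, so no unpack/repack is needed.
          V8i16 widen(V4i16 v) {
            V8i16 r{};
            for (int i = 0; i < 4; ++i) r.e[i] = v.e[i];
            return r;
          }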
  2. Mar 28, 2016
    • [Power9] Implement new vsx instructions: insert, extract, test data class, min/max, reverse, permute, splat · 80722719
      Chuang-Yu Cheng authored
      This change implements the following vsx instructions:
      
      - Scalar Insert/Extract
          xsiexpdp xsiexpqp xsxexpdp xsxsigdp xsxexpqp xsxsigqp
      
      - Vector Insert/Extract
          xviexpdp xviexpsp xvxexpdp xvxexpsp xvxsigdp xvxsigsp
          xxextractuw xxinsertw
      
      - Scalar/Vector Test Data Class
          xststdcdp xststdcsp xststdcqp
          xvtstdcdp xvtstdcsp
      
      - Maximum/Minimum
          xsmaxcdp xsmaxjdp
          xsmincdp xsminjdp
      
      - Vector Byte-Reverse/Permute/Splat
          xxbrd xxbrh xxbrq xxbrw
          xxperm xxpermr
          xxspltib
      
      30 instructions
      
      Thanks to Nemanja for the invaluable discussion, and thanks to Kit for his great help!
      Reviewers: hal, nemanja, kbarton, tjablin, amehsan
      
      http://reviews.llvm.org/D16842
      
      llvm-svn: 264567
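
      As a hedged illustration (editor's sketch, not part of the commit;
      assumes GCC/Clang inline asm and a POWER9 target), one of the new
      byte-reverse instructions can be exercised like so:

          #include <altivec.h>

          // Byte-reverse each word element using the new xxbrw instruction.
          // The "wa" constraint and %x modifier select a full VSX register.
          __vector unsigned int byte_reverse_words(__vector unsigned int v) {
            __vector unsigned int r;
            __asm__("xxbrw %x0, %x1" : "=wa"(r) : "wa"(v));
            return r;
          }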
    • [Power9] Implement new vsx instructions: quad-precision move, fp-arithmetic · 56638489
      Chuang-Yu Cheng authored
      This change implements the following vsx instructions:
      
      - quad-precision move
          xscpsgnqp, xsabsqp, xsnegqp, xsnabsqp
      
      - quad-precision fp-arithmetic
          xsaddqp(o) xsdivqp(o) xsmulqp(o) xssqrtqp(o) xssubqp(o)
          xsmaddqp(o) xsmsubqp(o) xsnmaddqp(o) xsnmsubqp(o)
      
      22 instructions
      
      Thanks Nemanja and Kit for careful review and invaluable discussion!
      Reviewers: hal, nemanja, kbarton, tjablin, amehsan
      
      http://reviews.llvm.org/D16110
      
      llvm-svn: 264565
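
      As a hedged sketch (editor's example, not from the commit; assumes GCC
      or Clang with -mcpu=power9 -mfloat128 on a ppc64le target), ordinary
      __float128 arithmetic is the kind of code these instructions serve:

          // Expected to select xsaddqp rather than a software libcall.
          __float128 qadd(__float128 a, __float128 b) { return a + b; }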
  3. Feb 26, 2016
    • [Power9] Implement new vsx instructions: compare and conversion · 93612ec5
      Kit Barton authored
      This change implements the following vsx instructions:
      
      - Quad/Double-Precision Compare
          xscmpoqp xscmpuqp
          xscmpexpdp xscmpexpqp
          xscmpeqdp xscmpgedp xscmpgtdp xscmpnedp
          xvcmpnedp(.) xvcmpnesp(.)

      - Quad-Precision Floating-Point Conversion
          xscvqpdp(o) xscvdpqp
          xscvqpsdz xscvqpswz xscvqpudz xscvqpuwz xscvsdqp xscvudqp
          xscvdphp xscvhpdp xvcvhpsp xvcvsphp
          xsrqpi xsrqpix xsrqpxp

      28 instructions
      
      Phabricator: http://reviews.llvm.org/D16709
      llvm-svn: 262068
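
      As a hedged sketch (editor's example; assumes GCC/Clang with
      -mcpu=power9 -mfloat128), plain C++ conversions are the natural way to
      reach these instructions:

          // Narrowing quad to double is expected to select xscvqpdp.
          double narrow(__float128 q) { return (double)q; }

          // Quad to signed 64-bit with truncation is expected to select
          // xscvqpsdz.
          long long to_int64(__float128 q) { return (long long)q; }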
  4. Dec 11, 2015
    • Start replacing vector_extract/vector_insert with extractelt/insertelt · fbd9bbfd
      Matt Arsenault authored
      These are redundant pairs of nodes defined for
      INSERT_VECTOR_ELT/EXTRACT_VECTOR_ELT.
      insertelement/extractelement are slightly closer to the corresponding
      C++ node names, and have stricter type checking, so prefer them.
      
      Update targets to only use these nodes where it is trivial to do so.
      AArch64, ARM, and Mips all have various type errors on simple replacement,
      so they will need work to fix.
      
      Example from AArch64:
      
      def : Pat<(sext_inreg (vector_extract (v16i8 V128:$Rn), VectorIndexB:$idx), i8),
                (i32 (SMOVvi8to32 V128:$Rn, VectorIndexB:$idx))>;
      
      This pattern is effectively trying to do sext_inreg i8, i8.
      
      llvm-svn: 255359
  5. Oct 09, 2015
    • Vector element extraction without stack operations on Power 8 · d3896573
      Nemanja Ivanovic authored
      This patch corresponds to review:
      http://reviews.llvm.org/D12032
      
      This patch builds on the patch that provided scalar to vector conversions
      without stack operations (D11471).
      Included in this patch:
      
          - Vector element extraction for all vector types with constant element number
          - Vector element extraction for v16i8 and v8i16 with variable element number
          - Removal of some redundant COPY_TO_REGCLASS operations that ended up
            unnecessarily moving values around between registers
      
      Not included in this patch (will be in upcoming patch):
      
          - Vector element extraction for v4i32, v4f32, v2i64 and v2f64 with
            variable element number
          - Vector element insertion for variable/constant element number
      
      Testing is provided for all extractions. The extractions that are not
      implemented yet are just placeholders.
      
      llvm-svn: 249822
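
      As a hedged sketch (editor's example; assumes a POWER8 target with
      -mcpu=pwr8 and altivec.h), both forms of extraction covered here can be
      exercised from C++:

          #include <altivec.h>

          // Constant element number: covered for all vector types.
          signed char extract_const(__vector signed char v) {
            return vec_extract(v, 5);
          }

          // Variable element number: covered for v16i8 (and v8i16) so far.
          signed char extract_var(__vector signed char v, int i) {
            return vec_extract(v, i);
          }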
  6. Aug 31, 2015
    • [PowerPC] Fixup SELECT_CC (and SETCC) patterns with i1 comparison operands · a2cdbce6
      Hal Finkel authored
      There were really two problems here. The first was that we had the truth tables
      for signed i1 comparisons backward. I imagine these are not very common, but if
      you have:
        setcc i1 x, y, LT
      this has the '0 1' and the '1 0' results flipped compared to:
        setcc i1 x, y, ULT
      because, in the signed case, '1 0' is really '-1 0', and the answer is not the
      same as in the unsigned case.
      
      The second problem was that we had no patterns (at all) for the select_cc
      nodes produced by unsigned comparisons of i1 operands. This was the specific
      cause of PR24552. These had to be added (and a missing Altivec promotion added
      as well) to make sure these function for all types. I've added a bunch more
      test cases for these patterns, and there are a few FIXMEs in the test case
      regarding code-quality.
      
      Fixes PR24552.
      
      llvm-svn: 246400
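
      As a hedged illustration (editor's example in plain C++), the flipped
      truth table falls out of reading a 1-bit value as 0/-1 instead of 0/1:

          #include <cstdio>

          int main() {
            for (int x = 0; x <= 1; ++x)
              for (int y = 0; y <= 1; ++y) {
                int sx = x ? -1 : 0, sy = y ? -1 : 0;  // signed view of i1
                printf("x=%d y=%d  LT(signed)=%d  ULT(unsigned)=%d\n",
                       x, y, sx < sy, x < y);
              }
            return 0;
          }

      For x=1, y=0 the signed compare is true (-1 < 0) while the unsigned
      compare is false (1 < 0): exactly the '1 0' entry that was flipped.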
  7. Aug 13, 2015
    • Scalar to vector conversions using direct moves · 1c39ca65
      Nemanja Ivanovic authored
      This patch corresponds to review:
      http://reviews.llvm.org/D11471
      
      It improves the code generated for converting a scalar to a vector value. With
      direct moves from GPRs to VSRs, we no longer require expensive stack operations
      for this. Subsequent patches will handle the reverse case and more general
      operations between vectors and their scalar elements.
      
      llvm-svn: 244921
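
      As a hedged sketch (editor's example; assumes a POWER8 target with
      altivec.h), a simple scalar-to-vector conversion that should now avoid
      the stack:

          #include <altivec.h>

          // With direct moves, the GPR value can be transferred straight
          // into a VSR instead of being stored and reloaded via memory.
          __vector signed int splat_scalar(signed int x) {
            return vec_splats(x);
          }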
  8. Apr 27, 2015
    • [PPC64LE] Remove unnecessary swaps from lane-insensitive vector computations · fe723b9a
      Bill Schmidt authored
      This patch adds a new SSA MI pass that runs on little-endian PPC64
      code with VSX enabled. Loads and stores of 4x32 and 2x64 vectors
      without alignment constraints are accomplished for little-endian using
      lxvd2x/xxswapd and xxswapd/stxvd2x. The existence of the additional
      xxswapd instructions hurts performance in comparison with big-endian
      code, but they are necessary in the general case to support correct
      semantics.
      
      However, the general case does not apply to most vector code. Many
      vector instructions are lane-insensitive; they do not "care" which
      lanes the parallel computations are performed within, provided that
      the resulting data is stored into the correct locations. Thus this
      pass looks for computations that perform only lane-insensitive
      operations, and removes the unnecessary swaps from loads and stores in
      such computations.
      
      Future improvements will allow computations using certain
      lane-sensitive operations to also be optimized in this manner, by
      modifying the lane-sensitive operations to account for the permuted
      order of the lanes. However, this patch only adds the infrastructure
      to permit this; no lane-sensitive operations are optimized at this
      time.
      
      This code is heavily exercised by the various vectorizing applications
      in the projects/test-suite tree. For the time being, I have only added
      one simple test case to demonstrate what the pass is doing. Although
      it is quite simple, it provides coverage for much of the code,
      including the special case handling of copies and subreg-to-reg
      operations feeding the swaps. I plan to add additional tests in the
      future as I fill in more of the "special handling" code.
      
      Two existing tests were affected, because they expected the swaps to
      be present, but they are now removed.
      
      llvm-svn: 235910
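
      As a hedged illustration (editor's example), the kind of loop this pass
      targets; every operation is lane-insensitive, so the lxvd2x/xxswapd and
      xxswapd/stxvd2x pairs the vectorized code would otherwise carry can be
      elided:

          // Element-wise add: it does not matter which lane each element
          // occupies while the addition happens, only where it is stored.
          void vadd(float *a, const float *b, const float *c, int n) {
            for (int i = 0; i < n; ++i)
              a[i] = b[i] + c[i];
          }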
  9. Mar 12, 2015
    • [PowerPC] Remove canFoldAsLoad from instruction definitions · 6a778fb7
      Hal Finkel authored
      The PowerPC backend had a number of loads that were marked as canFoldAsLoad
      (and I'm partially at fault here for copying around the relevant line of
      TableGen definitions without really looking at what it meant). This is not
      right; PPC (non-memory) instructions don't support direct memory operands, and
      so there is nothing into which a 'foldable' load could be folded.
      
      Noticed by inspection, no test case.
      
      The one thing we might lose by doing this is the ability to fold some loads into
      stackmap/patchpoint pseudo-instructions. However, this was untested, and would
      not obviously have worked for extending loads, and I'd rather re-add support
      for that once it can be tested.
      
      llvm-svn: 231982
  10. Feb 01, 2015
    • [PowerPC] VSX stores don't also read · e3d2b20c
      Hal Finkel authored
      The VSX store instructions were also picking up an implicit "may read" from the
      default pattern, which was an intrinsic (and we don't currently have a way of
      specifying write-only intrinsics).
      
      This was causing MI verification to fail for VSX spill restores.
      
      llvm-svn: 227759
  11. Dec 09, 2014
    • [PowerPC 2/4] Little-endian adjustments for VSX insert/extract operations · 10f6eb91
      Bill Schmidt authored
      For little endian, we need to make some straightforward adjustments in
      the code expansions for scalar_to_vector and vector_extract of v2f64.
      First, scalar_to_vector must place the scalar into vector element
      zero.  However, our implementation of SUBREG_TO_REG will place it into
      big-endian vector element zero (high-order bits), and for little
      endian we need it in the low-order bits.  The LE implementation splats
      the high-order doubleword into the low-order doubleword.
      
      Second, the meaning of (vector_extract x, 0) and (vector_extract x, 1)
      must be reversed for similar reasons.
      
      A new test is added that tests code generation for insertelement and
      extractelement for both element 0 and element 1.  It is disabled in
      this patch but enabled in patch 4/4, for reasons stated in the test.
      
      llvm-svn: 223788
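
      As a hedged toy model (editor's example in plain C++) of the index
      mirroring described above for v2f64:

          #include <cstdio>

          int main() {
            double reg[2] = {1.0, 2.0};  // BE-numbered elements 0 and 1
            // LE element i lives in BE element 1 - i, so the meaning of
            // (vector_extract x, 0) and (vector_extract x, 1) swaps.
            for (unsigned le = 0; le < 2; ++le)
              printf("LE index %u -> BE element %u -> %g\n",
                     le, 1 - le, reg[1 - le]);
            return 0;
          }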
    • [PowerPC 1/4] Little-endian adjustments for VSX loads/stores · fae5d715
      Bill Schmidt authored
      This patch addresses the inherent big-endian bias in the lxvd2x,
      lxvw4x, stxvd2x, and stxvw4x instructions.  These instructions load
      vector elements into registers left-to-right (with the first element
      loaded into the high-order bits of the register), regardless of the
      endian setting of the processor.  However, these are the only
      vector memory instructions that permit unaligned storage accesses, so
      we want to use them for little-endian.
      
      To make this work, a lxvd2x or lxvw4x is replaced with an lxvd2x
      followed by an xxswapd, which swaps the doublewords.  This works for
      lxvw4x as well as lxvd2x, because for lxvw4x on an LE system the
      vector elements are in LE order (right-to-left) within each
      doubleword.  (Thus after an lxvd2x of a <4 x float> the elements will
      appear as 1, 0, 3, 2.  Following the swap, they will appear as 3, 2,
      1, 0, as desired.)  For stores, an stxvd2x or stxvw4x is replaced
      with an stxvd2x preceded by an xxswapd.
      
      Introduction of extra swap instructions provides correctness, but
      obviously is not ideal from a performance perspective.  Future patches
      will address this with optimizations to remove most of the introduced
      swaps, which have proven effective in other implementations.
      
      The introduction of the swaps is performed during lowering of LOAD,
      STORE, INTRINSIC_W_CHAIN, and INTRINSIC_VOID operations.  The latter
      are used to translate intrinsics that specify the VSX loads and stores
      directly into equivalent sequences for little endian.  Thus code that
      uses vec_vsx_ld and vec_vsx_st does not have to be modified to be
      ported from BE to LE.
      
      We introduce new PPCISD opcodes for LXVD2X, STXVD2X, and XXSWAPD for
      use during this lowering step.  In PPCInstrVSX.td, we add new SDType
      and SDNode definitions for these (PPClxvd2x, PPCstxvd2x, PPCxxswapd).
      These are recognized during instruction selection and mapped to the
      correct instructions.
      
      Several tests that were written to use -mcpu=pwr7 or pwr8 are modified
      to disable VSX on LE variants because code generation changes with
      this and subsequent patches in this set.  I chose to include all of
      these in the first patch rather than try to rigorously sort out which tests
      were broken by one or another of the patches.  Sorry about that.
      
      The new test vsx-ldst-builtin-le.ll, and the changes to vsx-ldst.ll,
      are disabled until LE support is enabled because of breakages that
      occur as noted in those tests.  They are re-enabled in patch 4/4.
      
      llvm-svn: 223783
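
      As a hedged toy model (editor's example in plain C++) of the element
      reordering described above for a <4 x float>:

          #include <cstdio>

          int main() {
            // BE-numbered element order right after lxvd2x on LE memory:
            int after_load[4] = {1, 0, 3, 2};
            int after_swap[4];
            // xxswapd exchanges the two doublewords (element pairs).
            for (int i = 0; i < 4; ++i)
              after_swap[i] = after_load[(i + 2) % 4];
            for (int i = 0; i < 4; ++i)
              printf("%d ", after_swap[i]);  // prints: 3 2 1 0
            printf("\n");
            return 0;
          }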
  12. Nov 12, 2014
    • [PowerPC] Add vec_vsx_ld and vec_vsx_st intrinsics · 72954784
      Bill Schmidt authored
      This patch enables the vec_vsx_ld and vec_vsx_st intrinsics for
      PowerPC, which provide programmer access to the lxvd2x, lxvw4x,
      stxvd2x, and stxvw4x instructions.
      
      New LLVM intrinsics are provided to represent these four instructions
      in IntrinsicsPowerPC.td.  These are patterned after the similar
      intrinsics for lvx and stvx (Altivec).  In PPCInstrVSX.td, these
      intrinsics are tied to the code gen patterns, with additional patterns
      to allow plain vanilla loads and stores to still generate these
      instructions.
      
      At -O1 and higher the intrinsics are immediately converted to loads
      and stores in InstCombineCalls.cpp.  This will open up more
      optimization opportunities while still allowing the correct
      instructions to be generated.  (Similar code exists for aligned
      Altivec loads and stores.)
      
      The new intrinsics are added to the code that checks for consecutive
      loads and stores in PPCISelLowering.cpp, as well as to
      PPCTargetLowering::getTgtMemIntrinsic().
      
      There's a new test to verify the correct instructions are generated.
      The loads and stores tend to be reordered, so the test just counts
      their number.  It runs at -O2, as it's not very effective to test this
      at -O0, when many unnecessary loads and stores are generated.
      
      I ended up having to modify vsx-fma-m.ll.  It turns out this test case
      is slightly unreliable, but I don't know a good way to prevent
      problems with it.  The xvmaddmdp instructions read and write the same
      register, which is one of the multiplicands.  Commutativity allows
      either to be chosen.  If the FMAs are reordered differently than
      expected by the test, the register assignment can be different as a
      result.  Hopefully this doesn't change often.
      
      There is a companion patch for Clang.
      
      llvm-svn: 221767
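
      As a hedged usage sketch (editor's example; assumes a VSX-enabled POWER
      target with altivec.h):

          #include <altivec.h>

          // Unaligned vector load: lxvd2x (plus a swap on little endian).
          __vector double load_unaligned(const double *p) {
            return vec_vsx_ld(0, p);
          }

          // Unaligned vector store: stxvd2x (swapping first on LE).
          void store_unaligned(__vector double v, double *p) {
            vec_vsx_st(v, 0, p);
          }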
  13. Oct 31, 2014
    • [PowerPC] Initial VSX intrinsic support, with min/max for vector double · 1ca69fa6
      Bill Schmidt authored
      Now that we have initial support for VSX, we can begin adding
      intrinsics for programmer access to VSX instructions.  This patch adds
      basic support for VSX intrinsics in general, and tests it by
      implementing intrinsics for minimum and maximum for the vector double
      data type.
      
      The LLVM portion of this is quite straightforward.  There is a
      companion patch for Clang.
      
      llvm-svn: 220988
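
      As a hedged sketch (editor's example; assumes a VSX-enabled target with
      altivec.h and the companion Clang support):

          #include <altivec.h>

          // Vector-double max, expected to select xvmaxdp under VSX.
          __vector double vmax(__vector double a, __vector double b) {
            return vec_max(a, b);
          }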
  14. Oct 22, 2014
    • [PATCH] Support select-cc for VSFRC when VSX is enabled · 9c54bbd7
      Bill Schmidt authored
      A previous patch enabled SELECT_VSRC and SELECT_CC_VSRC for VSX to
      handle <2 x double> cases.  This patch adds SELECT_VSFRC and
      SELECT_CC_VSFRC to allow use of all 64 vector-scalar registers for the
      f64 type when VSX is enabled.  The changes are analogous to those in
      the previous patch.  I've added a new variant to vsx.ll to test the
      code generation.
      
      (I also cleaned up a little formatting in PPCInstrVSX.td from the
      previous patch.)
      
      llvm-svn: 220395
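
      As a hedged illustration (editor's example), a scalar f64 select of the
      sort that lowers through SELECT_CC; with VSX enabled the value can live
      in any of the 64 vector-scalar registers, which is what SELECT_VSFRC and
      SELECT_CC_VSFRC permit:

          // Compiled for a VSX target, this select should no longer be
          // restricted to the 32 classic floating-point registers.
          double fsel(double a, double b, double x, double y) {
            return a < b ? x : y;
          }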
    • [PowerPC] Support select-cc for VSX · 61e65233
      Bill Schmidt authored
      The tests test/CodeGen/Generic/select-cc.ll and
      test/CodeGen/PowerPC/select-cc.ll both fail with VSX enabled.  The
      problem is that the lowering logic for the SELECT and SELECT_CC
      operations doesn't currently support the VSX registers.  This patch
      fixes that.
      
      In lib/Target/PowerPC/PPCInstrInfo.td, we have pseudos to handle this
      for other register classes.  Similar pseudos are added in
      PPCInstrVSX.td (they must be there, because the "vsrc" register class
      definition appears there) for the VSRC register class.  The
      SELECT_VSRC pseudo is then used in pattern matching for SELECT_CC.
      
      The rest of the patch just adds logic for SELECT_VSRC wherever similar
      logic appears for SELECT_VRRC.
      
      There are no new test cases because the existing tests above test
      this, along with a variant in test/CodeGen/PowerPC/vsx.ll.
      
      After discussion with Hal, a future patch will add similar _VSFRC
      variants to override f64 type handling (currently using F8RC).
      
      llvm-svn: 220385