  2. Mar 28, 2016
      [Power9] Implement new vsx instructions: insert, extract, test data class,... · 80722719
      Chuang-Yu Cheng authored
      [Power9] Implement new vsx instructions: insert, extract, test data class, min/max, reverse, permute, splat
      
      This change implements the following vsx instructions:
      
      - Scalar Insert/Extract
          xsiexpdp xsiexpqp xsxexpdp xsxsigdp xsxexpqp xsxsigqp
      
      - Vector Insert/Extract
          xviexpdp xviexpsp xvxexpdp xvxexpsp xvxsigdp xvxsigsp
          xxextractuw xxinsertw
      
      - Scalar/Vector Test Data Class
          xststdcdp xststdcsp xststdcqp
          xvtstdcdp xvtstdcsp
      
      - Maximum/Minimum
          xsmaxcdp xsmaxjdp
          xsmincdp xsminjdp
      
      - Vector Byte-Reverse/Permute/Splat
          xxbrd xxbrh xxbrq xxbrw
          xxperm xxpermr
          xxspltib
      
      30 instructions
      
       Thanks to Nemanja for the invaluable discussion, and to Kit for his great help!
      Reviewers: hal, nemanja, kbarton, tjablin, amehsan
      
      http://reviews.llvm.org/D16842
      
      llvm-svn: 264567
      [Power9] Implement new vsx instructions: quad-precision move, fp-arithmetic · 56638489
      Chuang-Yu Cheng authored
      This change implements the following vsx instructions:
      
      - quad-precision move
          xscpsgnqp, xsabsqp, xsnegqp, xsnabsqp
      
      - quad-precision fp-arithmetic
          xsaddqp(o) xsdivqp(o) xsmulqp(o) xssqrtqp(o) xssubqp(o)
          xsmaddqp(o) xsmsubqp(o) xsnmaddqp(o) xsnmsubqp(o)
      
      22 instructions
      
      Thanks Nemanja and Kit for careful review and invaluable discussion!
      Reviewers: hal, nemanja, kbarton, tjablin, amehsan
      
      http://reviews.llvm.org/D16110
      
      llvm-svn: 264565
  4. Feb 26, 2016
      [Power9] Implement new vsx instructions: compare and conversion · 93612ec5
      Kit Barton authored
      This change implements the following vsx instructions:
      
      Quad/Double-Precision Compare:
      xscmpoqp xscmpuqp
      xscmpexpdp xscmpexpqp
      xscmpeqdp xscmpgedp xscmpgtdp xscmpnedp
      xvcmpnedp(.) xvcmpnesp(.)
      Quad-Precision Floating-Point Conversion
      xscvqpdp(o) xscvdpqp
      xscvqpsdz xscvqpswz xscvqpudz xscvqpuwz xscvsdqp xscvudqp
      xscvdphp xscvhpdp xvcvhpsp xvcvsphp
      xsrqpi xsrqpix xsrqpxp
      28 instructions
      
      Phabricator: http://reviews.llvm.org/D16709
      llvm-svn: 262068
  6. Dec 11, 2015
      Start replacing vector_extract/vector_insert with extractelt/insertelt · fbd9bbfd
      Matt Arsenault authored
      These are redundant pairs of nodes defined for
      INSERT_VECTOR_ELEMENT/EXTRACT_VECTOR_ELEMENT.
      insertelement/extractelement are slightly closer to the corresponding
      C++ node names and have stricter type checking, so prefer them.
      
      Update targets to only use these nodes where it is trivial to do so.
      AArch64, ARM, and Mips all have various type errors on simple replacement,
      so they will need work to fix.
      
      Example from AArch64:
      
      def : Pat<(sext_inreg (vector_extract (v16i8 V128:$Rn), VectorIndexB:$idx), i8),
                (i32 (SMOVvi8to32 V128:$Rn, VectorIndexB:$idx))>;
      
      Which is trying to do sext_inreg i8, i8.
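
      The degeneracy is easy to see with a small model of SIGN_EXTEND_INREG's
      semantics (Python here purely for illustration; this is not LLVM code):

```python
# A small model of ISD::SIGN_EXTEND_INREG semantics: sign-extend the
# low `from_bits` of a value into a full signed integer.
def sext_inreg(value, from_bits):
    v = value & ((1 << from_bits) - 1)
    return v - (1 << from_bits) if v >= 1 << (from_bits - 1) else v

# With the old i32-typed vector_extract, sext_inreg of an i8 payload
# inside a wider value is meaningful:
assert sext_inreg(0x80, 8) == -128

# With strict i8 typing, the node becomes "extend i8 within i8" -- an
# identity on every i8 value, i.e. a degenerate node that the stricter
# type checking of extractelement now exposes:
assert all(sext_inreg(b, 8) == (b if b < 128 else b - 256) for b in range(256))
```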
      
      llvm-svn: 255359
  8. Oct 09, 2015
      Vector element extraction without stack operations on Power 8 · d3896573
      Nemanja Ivanovic authored
      This patch corresponds to review:
      http://reviews.llvm.org/D12032
      
      This patch builds onto the patch that provided scalar to vector conversions
      without stack operations (D11471).
      Included in this patch:
      
          - Vector element extraction for all vector types with constant element number
          - Vector element extraction for v16i8 and v8i16 with variable element number
          - Removal of some unnecessary COPY_TO_REGCLASS operations that ended up
            unnecessarily moving things around between registers
      
      Not included in this patch (will be in upcoming patch):
      
          - Vector element extraction for v4i32, v4f32, v2i64 and v2f64 with
            variable element number
          - Vector element insertion for variable/constant element number
      
      Testing is provided for all extractions. The extractions that are not
      implemented yet are just placeholders.
      
      llvm-svn: 249822
  10. Aug 31, 2015
      [PowerPC] Fixup SELECT_CC (and SETCC) patterns with i1 comparison operands · a2cdbce6
      Hal Finkel authored
      There were really two problems here. The first was that we had the truth tables
      for signed i1 comparisons backward. I imagine these are not very common, but if
      you have:
        setcc i1 x, y, LT
      this has the '0 1' and the '1 0' results flipped compared to:
        setcc i1 x, y, ULT
      because, in the signed case, '1 0' is really '-1 0', and the answer is not the
      same as in the unsigned case.
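
      The flipped rows can be checked with a toy model of i1 comparison
      semantics (an illustration, not the backend's truth tables):

```python
# Model i1 comparison semantics: the bit pattern 1 denotes -1 when
# signed, but 1 when unsigned.
def i1_lt(x, y, signed):
    val = lambda b: -b if signed else b
    return val(x) < val(y)

# Truth tables over all (x, y) bit pairs:
slt = [(x, y, i1_lt(x, y, True)) for x in (0, 1) for y in (0, 1)]
ult = [(x, y, i1_lt(x, y, False)) for x in (0, 1) for y in (0, 1)]

# Exactly the '0 1' and '1 0' rows flip between the two forms:
# signed:   (0,1) -> False  (0 < -1 is false),  (1,0) -> True  (-1 < 0)
# unsigned: (0,1) -> True,                      (1,0) -> False
```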
      
      The second problem was that we did not have patterns (at all) for the unsigned
      comparisons select_cc nodes for i1 comparison operands. This was the specific
      cause of PR24552. These had to be added (and a missing Altivec promotion added
      as well) to make sure these function for all types. I've added a bunch more
      test cases for these patterns, and there are a few FIXMEs in the test case
      regarding code-quality.
      
      Fixes PR24552.
      
      llvm-svn: 246400
  11. Aug 13, 2015
      Scalar to vector conversions using direct moves · 1c39ca65
      Nemanja Ivanovic authored
      This patch corresponds to review:
      http://reviews.llvm.org/D11471
      
      It improves the code generated for converting a scalar to a vector value. With
      direct moves from GPRs to VSRs, we no longer require expensive stack operations
      for this. Subsequent patches will handle the reverse case and more general
      operations between vectors and their scalar elements.
      
      llvm-svn: 244921
  20. Apr 27, 2015
      [PPC64LE] Remove unnecessary swaps from lane-insensitive vector computations · fe723b9a
      Bill Schmidt authored
      This patch adds a new SSA MI pass that runs on little-endian PPC64
      code with VSX enabled. Loads and stores of 4x32 and 2x64 vectors
      without alignment constraints are accomplished for little-endian using
      lxvd2x/xxswapd and xxswapd/stxvd2x. The existence of the additional
      xxswapd instructions hurts performance in comparison with big-endian
      code, but they are necessary in the general case to support correct
      semantics.
      
      However, the general case does not apply to most vector code. Many
      vector instructions are lane-insensitive; they do not "care" which
      lanes the parallel computations are performed within, provided that
      the resulting data is stored into the correct locations. Thus this
      pass looks for computations that perform only lane-insensitive
      operations, and removes the unnecessary swaps from loads and stores in
      such computations.
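
      The legality argument can be sketched with a toy model (illustrative
      Python, not the MI-level pass): xxswapd is a fixed permutation, and any
      purely elementwise operation commutes with it, so the swaps surrounding
      a lane-insensitive computation cancel out:

```python
# xxswapd swaps the two doublewords of a 4 x i32 vector
# (words 0,1 <-> words 2,3).
def xxswapd(v):
    return v[2:] + v[:2]

# Elementwise add: a lane-insensitive operation -- it does not "care"
# which lane each parallel computation happens in.
def vec_add(a, b):
    return [x + y for x, y in zip(a, b)]

a, b = [1, 2, 3, 4], [10, 20, 30, 40]

# swap-load / compute / swap-store is equivalent to dropping both swaps:
assert xxswapd(vec_add(xxswapd(a), xxswapd(b))) == vec_add(a, b)
```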
      
      Future improvements will allow computations using certain
      lane-sensitive operations to also be optimized in this manner, by
      modifying the lane-sensitive operations to account for the permuted
      order of the lanes. However, this patch only adds the infrastructure
      to permit this; no lane-sensitive operations are optimized at this
      time.
      
      This code is heavily exercised by the various vectorizing applications
      in the projects/test-suite tree. For the time being, I have only added
      one simple test case to demonstrate what the pass is doing. Although
      it is quite simple, it provides coverage for much of the code,
      including the special case handling of copies and subreg-to-reg
      operations feeding the swaps. I plan to add additional tests in the
      future as I fill in more of the "special handling" code.
      
      Two existing tests were affected, because they expected the swaps to
      be present, but they are now removed.
      
      llvm-svn: 235910
  22. Mar 12, 2015
      [PowerPC] Remove canFoldAsLoad from instruction definitions · 6a778fb7
      Hal Finkel authored
      The PowerPC backend had a number of loads that were marked as canFoldAsLoad
      (and I'm partially at fault here for copying around the relevant line of
      TableGen definitions without really looking at what it meant). This is not
      right; PPC (non-memory) instructions don't support direct memory operands, and
      so there is nothing a 'foldable' instruction could be folded into.
      
      Noticed by inspection, no test case.
      
      The one thing we might lose by doing this is ability to fold some loads into
      stackmap/patchpoint pseudo-instructions. However, this was untested, and would
      not obviously have worked for extending loads, and I'd rather re-add support
      for that once it can be tested.
      
      llvm-svn: 231982
  24. Feb 01, 2015
      [PowerPC] VSX stores don't also read · e3d2b20c
      Hal Finkel authored
      The VSX store instructions were also picking up an implicit "may read" from the
      default pattern, which was an intrinsic (and we don't currently have a way of
      specifying write-only intrinsics).
      
      This was causing MI verification to fail for VSX spill restores.
      
      llvm-svn: 227759
  25. Dec 09, 2014
      [PowerPC 2/4] Little-endian adjustments for VSX insert/extract operations · 10f6eb91
      Bill Schmidt authored
      For little endian, we need to make some straightforward adjustments in
      the code expansions for scalar_to_vector and vector_extract of v2f64.
      First, scalar_to_vector must place the scalar into vector element
      zero.  However, our implementation of SUBREG_TO_REG will place it into
      big-element vector element zero (high-order bits), and for little
      endian we need it in the low-order bits.  The LE implementation splats
      the high-order doubleword into the low-order doubleword.
      
      Second, the meaning of (vector_extract x, 0) and (vector_extract x, 1)
      must be reversed for similar reasons.
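
      The index reversal can be sketched as follows (a toy model with assumed
      names, not the actual code expansion):

```python
# A v2f64 register viewed as two BE-numbered lanes [lane0, lane1],
# where lane0 holds the high-order doubleword.
def vector_extract_be(reg, i):
    return reg[i]

def vector_extract_le(reg, i):
    # Little-endian element i lives in big-endian lane (2 - 1 - i),
    # so the two extract indices swap roles.
    return reg[1 - i]

reg = ["high_dw", "low_dw"]
assert vector_extract_le(reg, 0) == "low_dw"   # LE element 0 = low-order bits
assert vector_extract_be(reg, 0) == "high_dw"  # BE element 0 = high-order bits
```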
      
      A new test is added that tests code generation for insertelement and
      extractelement for both element 0 and element 1.  It is disabled in
      this patch but enabled in patch 4/4, for reasons stated in the test.
      
      llvm-svn: 223788
      [PowerPC 1/4] Little-endian adjustments for VSX loads/stores · fae5d715
      Bill Schmidt authored
      This patch addresses the inherent big-endian bias in the lxvd2x,
      lxvw4x, stxvd2x, and stxvw4x instructions.  These instructions load
      vector elements into registers left-to-right (with the first element
      loaded into the high-order bits of the register), regardless of the
      endian setting of the processor.  However, these are the only
      vector memory instructions that permit unaligned storage accesses, so
      we want to use them for little-endian.
      
      To make this work, a lxvd2x or lxvw4x is replaced with an lxvd2x
      followed by an xxswapd, which swaps the doublewords.  This works for
      lxvw4x as well as lxvd2x, because for lxvw4x on an LE system the
      vector elements are in LE order (right-to-left) within each
       doubleword.  (Thus after lxvd2x of a <4 x float> the elements will
       appear as 1, 0, 3, 2.  Following the swap, they will appear as 3, 2,
       1, 0, as desired.)   For stores, an stxvd2x or stxvw4x is replaced
      with an stxvd2x preceded by an xxswapd.
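
      The element reordering described above can be modeled concretely (an
      illustrative sketch following the description above, not the ISA text):

```python
# Model lxvd2x on a little-endian system loading word elements e0..e3.
def lxvd2x_le(mem):
    # The first doubleword in memory lands in the high-order (leftmost,
    # BE-numbered) half of the register; within each doubleword, the LE
    # byte order reverses the two words relative to BE lane numbering.
    dw0, dw1 = mem[0:2], mem[2:4]
    return dw0[::-1] + dw1[::-1]     # BE lane order: e1, e0, e3, e2

def xxswapd(reg):
    return reg[2:] + reg[:2]         # swap the two doublewords

mem = [0, 1, 2, 3]                   # element indices as stored in memory
assert lxvd2x_le(mem) == [1, 0, 3, 2]
assert xxswapd(lxvd2x_le(mem)) == [3, 2, 1, 0]
# After the swap, BE lane i holds LE element (3 - i): LE element 0 sits
# in the low-order lane, which is what LE semantics require.
```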
      
      Introduction of extra swap instructions provides correctness, but
      obviously is not ideal from a performance perspective.  Future patches
      will address this with optimizations to remove most of the introduced
      swaps, which have proven effective in other implementations.
      
      The introduction of the swaps is performed during lowering of LOAD,
      STORE, INTRINSIC_W_CHAIN, and INTRINSIC_VOID operations.  The latter
      are used to translate intrinsics that specify the VSX loads and stores
      directly into equivalent sequences for little endian.  Thus code that
      uses vec_vsx_ld and vec_vsx_st does not have to be modified to be
      ported from BE to LE.
      
      We introduce new PPCISD opcodes for LXVD2X, STXVD2X, and XXSWAPD for
      use during this lowering step.  In PPCInstrVSX.td, we add new SDType
      and SDNode definitions for these (PPClxvd2x, PPCstxvd2x, PPCxxswapd).
      These are recognized during instruction selection and mapped to the
      correct instructions.
      
      Several tests that were written to use -mcpu=pwr7 or pwr8 are modified
      to disable VSX on LE variants because code generation changes with
       this and subsequent patches in this set.  I chose to include all of
       these in the first patch rather than try to rigorously sort out which
       tests were broken by one or another of the patches.  Sorry about that.
      
      The new test vsx-ldst-builtin-le.ll, and the changes to vsx-ldst.ll,
      are disabled until LE support is enabled because of breakages that
      occur as noted in those tests.  They are re-enabled in patch 4/4.
      
      llvm-svn: 223783
  28. Nov 12, 2014
      [PowerPC] Add vec_vsx_ld and vec_vsx_st intrinsics · 72954784
      Bill Schmidt authored
      This patch enables the vec_vsx_ld and vec_vsx_st intrinsics for
      PowerPC, which provide programmer access to the lxvd2x, lxvw4x,
      stxvd2x, and stxvw4x instructions.
      
      New LLVM intrinsics are provided to represent these four instructions
      in IntrinsicsPowerPC.td.  These are patterned after the similar
      intrinsics for lvx and stvx (Altivec).  In PPCInstrVSX.td, these
      intrinsics are tied to the code gen patterns, with additional patterns
      to allow plain vanilla loads and stores to still generate these
      instructions.
      
      At -O1 and higher the intrinsics are immediately converted to loads
      and stores in InstCombineCalls.cpp.  This will open up more
      optimization opportunities while still allowing the correct
      instructions to be generated.  (Similar code exists for aligned
      Altivec loads and stores.)
      
      The new intrinsics are added to the code that checks for consecutive
      loads and stores in PPCISelLowering.cpp, as well as to
      PPCTargetLowering::getTgtMemIntrinsic().
      
      There's a new test to verify the correct instructions are generated.
      The loads and stores tend to be reordered, so the test just counts
      their number.  It runs at -O2, as it's not very effective to test this
      at -O0, when many unnecessary loads and stores are generated.
      
      I ended up having to modify vsx-fma-m.ll.  It turns out this test case
      is slightly unreliable, but I don't know a good way to prevent
      problems with it.  The xvmaddmdp instructions read and write the same
      register, which is one of the multiplicands.  Commutativity allows
      either to be chosen.  If the FMAs are reordered differently than
      expected by the test, the register assignment can be different as a
      result.  Hopefully this doesn't change often.
      
      There is a companion patch for Clang.
      
      llvm-svn: 221767
  29. Oct 31, 2014
      [PowerPC] Initial VSX intrinsic support, with min/max for vector double · 1ca69fa6
      Bill Schmidt authored
      Now that we have initial support for VSX, we can begin adding
      intrinsics for programmer access to VSX instructions.  This patch adds
      basic support for VSX intrinsics in general, and tests it by
      implementing intrinsics for minimum and maximum for the vector double
      data type.
      
      The LLVM portion of this is quite straightforward.  There is a
      companion patch for Clang.
      
      llvm-svn: 220988
  30. Oct 22, 2014
      [PATCH] Support select-cc for VSFRC when VSX is enabled · 9c54bbd7
      Bill Schmidt authored
      A previous patch enabled SELECT_VSRC and SELECT_CC_VSRC for VSX to
      handle <2 x double> cases.  This patch adds SELECT_VSFRC and
      SELECT_CC_VSFRC to allow use of all 64 vector-scalar registers for the
      f64 type when VSX is enabled.  The changes are analogous to those in
      the previous patch.  I've added a new variant to vsx.ll to test the
      code generation.
      
      (I also cleaned up a little formatting in PPCInstrVSX.td from the
      previous patch.)
      
      llvm-svn: 220395
      [PowerPC] Support select-cc for VSX · 61e65233
      Bill Schmidt authored
      The tests test/CodeGen/Generic/select-cc.ll and
      test/CodeGen/PowerPC/select-cc.ll both fail with VSX enabled.  The
      problem is that the lowering logic for the SELECT and SELECT_CC
      operations doesn't currently support the VSX registers.  This patch
      fixes that.
      
      In lib/Target/PowerPC/PPCInstrInfo.td, we have pseudos to handle this
      for other register classes.  Similar pseudos are added in
      PPCInstrVSX.td (they must be there, because the "vsrc" register class
      definition appears there) for the VSRC register class.  The
      SELECT_VSRC pseudo is then used in pattern matching for SELECT_CC.
      
      The rest of the patch just adds logic for SELECT_VSRC wherever similar
      logic appears for SELECT_VRRC.
      
      There are no new test cases because the existing tests above test
      this, along with a variant in test/CodeGen/PowerPC/vsx.ll.
      
      After discussion with Hal, a future patch will add similar _VSFRC
      variants to override f64 type handling (currently using F8RC).
      
      llvm-svn: 220385
  31. Oct 17, 2014
      [PowerPC] Enable use of lxvw4x/stxvw4x in VSX code generation · 2d1128ac
      Bill Schmidt authored
      Currently the VSX support enables use of lxvd2x and stxvd2x for 2x64
      types, but does not yet use lxvw4x and stxvw4x for 4x32 types.  This
      patch adds that support.
      
      As with lxvd2x/stxvd2x, this involves straightforward overriding of
      the patterns normally recognized for lvx/stvx, with preference given
      to the VSX patterns when VSX is enabled.
      
      In addition, the logic for permitting misaligned memory accesses is
       modified so that v4f32 and v4i32 are treated the same as v2f64 and
      v2i64 when VSX is enabled.  Finally, the DAG generation for unaligned
      loads is changed to just use a normal LOAD (which will become lxvw4x)
      on P8 and later hardware, where unaligned loads are preferred over
      lvsl/lvx/lvx/vperm.
      
      A number of tests now generate the VSX loads/stores instead of
      lvx/stvx, so this patch adds VSX variants to those tests.  I've also
      added <4 x float> tests to the vsx.ll test case, and created a
      vsx-p8.ll test case to be used for testing code generation for the
      P8Vector feature.  For now, that simply tests the unaligned load/store
      behavior.
      
      This has been tested along with a temporary patch to enable the VSX
      and P8Vector features, with no new regressions encountered with or
      without the temporary patch applied.
      
      llvm-svn: 220047
  32. Oct 09, 2014
      [PPC64] VSX indexed-form loads use wrong instruction format · cb34fd09
      Bill Schmidt authored
      The VSX instruction definitions for lxsdx, lxvd2x, lxvdsx, and lxvw4x
      incorrectly use the XForm_1 instruction format, rather than the
      XX1Form instruction format.  This is likely a pasto when creating
      these instructions, which were based on lvx and so forth.  This patch
      uses the correct format.
      
      The existing reformatting test (test/MC/PowerPC/vsx.s) missed this
      because the two formats differ only in that XX1Form has an extension
      to the target register field in bit 31.  The tests for these
      instructions used a target register of 7, so the default of 0 in bit
      31 for XForm_1 didn't expose a problem.  For register numbers 32-63
      this would be noticeable.  I've changed the test to use higher
      register numbers to verify my change is effective.
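
      Why only register numbers 32-63 expose the bug can be sketched with a
      simplified model of the target-register field (an illustration, not the
      actual encoder):

```python
# XX1Form splits a VSX register number 0-63 into a 5-bit T field plus
# the TX extension bit in bit 31; XForm_1 lacks the TX bit, so it can
# only address registers 0-31 (TX implicitly 0).
def xx1form_target(vsr):
    t, tx = vsr & 0x1F, (vsr >> 5) & 1
    return t, tx

# Register 7 (as used in the original test) encodes identically either
# way, hiding the bug; registers 32-63 differ only in bit 31:
assert xx1form_target(7) == (7, 0)
assert xx1form_target(39) == (7, 1)
```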
      
      llvm-svn: 219416
  35. Mar 30, 2014
      [PowerPC] Handle VSX v2i64 SIGN_EXTEND_INREG · 5c0d1454
      Hal Finkel authored
      sitofp from v2i32 to v2f64 ends up generating a SIGN_EXTEND_INREG v2i64 node
      (and similarly for v2i16 and v2i8). Even though there are no sign-extension (or
      algebraic shifts) for v2i64 types, we can handle v2i32 sign extensions by
       converting to and from v2i64. The small trick necessary here is to shift
       the i32 elements into the right lanes before the i32 -> f64 step. Because
       of the big-endian nature of the system, we need the i32 portion in the
       high word of the i64 elements.
      
      For v2i16 and v2i8 we can do the same, but we first use the default Altivec
      shift-based expansion from v2i16 or v2i8 to v2i32 (by casting to v4i32) and
      then apply the above procedure.
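
      SIGN_EXTEND_INREG itself denotes the usual shift-left/arithmetic-
      shift-right pair; a minimal Python model of that semantics (not the
      lowering code):

```python
# Model SIGN_EXTEND_INREG of an i32 payload inside an i64: shift the
# payload to the top of the 64-bit value, then arithmetic-shift back.
def sign_extend_inreg_i64_from_i32(x):
    x = (x << 32) & (2**64 - 1)            # shl 32 (mod 2^64)
    x = x - 2**64 if x >= 2**63 else x     # reinterpret as signed i64
    return x >> 32                         # arithmetic shift right 32

assert sign_extend_inreg_i64_from_i32(0xFFFFFFFF) == -1
assert sign_extend_inreg_i64_from_i32(5) == 5
```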
      
      llvm-svn: 205146
      5c0d1454
  36. Mar 29, 2014
      [PowerPC] VSX instruction latency corrections · e8fba987
      Hal Finkel authored
      The vector divide and sqrt instructions have high latencies, and the scalar
      comparisons are like all of the others. On the P7, permutations take an extra
      cycle over purely-simple vector ops.
      
      llvm-svn: 205096
      e8fba987
Loading