  1. Mar 23, 2017
  2. Mar 22, 2017
  3. Mar 21, 2017
    • Matt Arsenault's avatar
      AMDGPU: Rename SI_RETURN · 5b20fbb7
      Matt Arsenault authored
      This is used for a specific type of return to a shader part's
      epilog code. Rename it to avoid confusion with a true
      call's return.
      
      llvm-svn: 298452
      5b20fbb7
    • Matthias Braun's avatar
      SplitKit: Fix subreg copy related problems · 8445cbd1
      Matthias Braun authored
      Fix two problems related to r298025:
      - SplitKit would create duplicate VNIs in some cases, leading to crashes
        when hoisting copies.
      - VirtRegMap could fail expanding copies at the beginning of a basic
        block.
      
      This fixes http://llvm.org/PR32353
      
      llvm-svn: 298448
      8445cbd1
    • Matt Arsenault's avatar
      AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel · 3dbeefa9
      Matt Arsenault authored
      Currently the default C calling convention functions are treated
      the same as compute kernels. Make this explicit so the default
      calling convention can be changed to a non-kernel.
      
      Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
      on the relevant test directories (and undoing in one place that actually
      wanted a non-kernel).
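
      For readers who do not use perl, the same substitution can be
      sketched in Python. The helper name mark_kernels and the sample IR
      string are hypothetical and only illustrate the rewrite; they are
      not part of the commit.

```python
import re


def mark_kernels(ir_text: str) -> str:
    """Rewrite every 'define void' as 'define amdgpu_kernel void'."""
    return re.sub(r"define void", "define amdgpu_kernel void", ir_text)


# Hypothetical one-function test file, used only for illustration.
sample = "define void @foo(i32 addrspace(1)* %out) {\n  ret void\n}\n"
converted = mark_kernels(sample)
```

      Running the helper twice gives the same text, because the rewritten
      line no longer contains the literal 'define void'.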
      
      llvm-svn: 298444
      3dbeefa9
    • George Burgess IV's avatar
      Let llvm.objectsize be conservative with null pointers · 56c7e88c
      George Burgess IV authored
      This adds a parameter to @llvm.objectsize that makes it return
      conservative values if it's given null.
      
      This fixes PR23277.
      
      Differential Revision: https://reviews.llvm.org/D28494
      
      llvm-svn: 298430
      56c7e88c
    • Marek Olsak's avatar
      AMDGPU: Buffer descriptor changes for GFX9 · 5c7a61d2
      Marek Olsak authored
      Reviewers: arsenm
      
      Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, dstuttard, tpr
      
      Differential Revision: https://reviews.llvm.org/D31158
      
      llvm-svn: 298397
      5c7a61d2
    • Marek Olsak's avatar
      AMDGPU: Always use VGPR indexing on GFX9 · e22fdb9c
      Marek Olsak authored
      Reviewers: arsenm
      
      Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, dstuttard, tpr
      
      Differential Revision: https://reviews.llvm.org/D31157
      
      llvm-svn: 298396
      e22fdb9c
    • Matt Arsenault's avatar
      AMDGPU: Fix asserting on 0 dmask for image intrinsics · f8fb605a
      Matt Arsenault authored
      Fold these to undef during lowering so users get eliminated.
      
      llvm-svn: 298387
      f8fb605a
    • Matt Arsenault's avatar
      AMDGPU: Convert image intrinsic uses in tests · 964a8485
      Matt Arsenault authored
      llvm-svn: 298386
      964a8485
    • Matt Arsenault's avatar
      DAG: Fold bitcast/extract_vector_elt of undef to undef · dce313c3
      Matt Arsenault authored
      Fixes the store not being eliminated when the intrinsic is lowered to undef.
      
      llvm-svn: 298385
      dce313c3
    • Valery Pykhtin's avatar
      fd4c410f
    • Sam Kolton's avatar
      [AMDGPU] SDWA peephole optimization pass. · f60ad58d
      Sam Kolton authored
      Summary:
      First iteration of SDWA peephole.
      
      This pass tries to combine several instructions into one SDWA instruction. E.g. it converts:
      '''
          V_LSHRREV_B32_e32 %vreg0, 16, %vreg1
          V_ADD_I32_e32 %vreg2, %vreg0, %vreg3
          V_LSHLREV_B32_e32 %vreg4, 16, %vreg2
      '''
      Into:
      '''
         V_ADD_I32_sdwa %vreg4, %vreg1, %vreg3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
      '''
      
      Pass structure:
          1. Iterate over the machine instructions in a basic block and try to apply "SDWA patterns" to each of them. An SDWA pattern matches a machine instruction to either a source or a destination SDWA operand. E.g. ''' V_LSHRREV_B32_e32 %vreg0, 16, %vreg1''' is matched to the source SDWA operand '''%vreg1 src_sel:WORD_1'''.
          2. Iterate over the SDWA operands found and find instructions that could potentially be converted into SDWA. E.g. for a source SDWA operand, the potential instructions are all instructions in this basic block that use '''%vreg0'''.
          3. Iterate over all potential instructions and check if they can be converted into SDWA.
          4. Convert instructions to SDWA.
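
      The four steps above can be sketched as a toy model. This is not the
      pass itself: the Inst class, the SDWA_CAPABLE set and both helper
      functions are hypothetical, only the source-operand pattern for a
      shift right by 16 is handled, and the matched shift is left in place
      on the assumption that later dead-code cleanup would remove it.

```python
from dataclasses import dataclass, field


@dataclass
class Inst:
    opcode: str
    dst: str
    srcs: list                                 # register names or integer immediates
    sel: dict = field(default_factory=dict)    # source index -> selector, e.g. {0: "WORD_1"}


# Hypothetical set of opcodes that have an SDWA form in this toy model.
SDWA_CAPABLE = {"V_ADD_I32_e32"}


def find_src_operands(block):
    """Step 1: match 'V_LSHRREV_B32_e32 dst, 16, src' as a WORD_1 source operand."""
    found = []
    for inst in block:
        if inst.opcode == "V_LSHRREV_B32_e32" and inst.srcs[0] == 16:
            found.append((inst.dst, inst.srcs[1], "WORD_1"))
    return found


def convert_block(block):
    """Steps 2-4: find users of each shifted register, check them, convert them."""
    out = list(block)
    for shifted, original, sel in find_src_operands(block):
        for i, inst in enumerate(out):
            # Step 3: only opcodes with a known SDWA form are converted.
            if inst.opcode in SDWA_CAPABLE and shifted in inst.srcs:
                idx = inst.srcs.index(shifted)
                srcs = list(inst.srcs)
                srcs[idx] = original           # read the unshifted register directly
                out[i] = Inst(inst.opcode.replace("_e32", "_sdwa"),
                              inst.dst, srcs, {**inst.sel, idx: sel})
    return out


block = [
    Inst("V_LSHRREV_B32_e32", "%vreg0", [16, "%vreg1"]),
    Inst("V_ADD_I32_e32", "%vreg2", ["%vreg0", "%vreg3"]),
]
result = convert_block(block)
```

      In this model the add ends up reading %vreg1 with a WORD_1 selector
      on its first source, mirroring the src0_sel:WORD_1 part of the
      example at the top of the summary. The destination pattern (the
      trailing shift left) is deliberately not modelled.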
      
      This review contains a basic implementation of the SDWA peephole pass. The pass requires additional testing for both correctness and performance (no performance testing has been done).
      There are several ways this pass can be improved:
          1. Make this pass work on a whole function, not only a basic block. As far as I can see, this can be done right now without changes to the pass.
          2. Introduce more SDWA patterns
          3. Introduce mnemonics to limit when SDWA patterns should apply
      
      Reviewers: vpykhtin, alex-t, arsenm, rampitec
      
      Subscribers: wdng, nhaehnle, mgorny
      
      Differential Revision: https://reviews.llvm.org/D30038
      
      llvm-svn: 298365
      f60ad58d
  4. Mar 20, 2017
  5. Mar 19, 2017
    • Ahmed Bougacha's avatar
      [GlobalISel] Don't select trivially dead instructions. · 931904d7
      Ahmed Bougacha authored
      Folding instructions when selecting can cause them to become dead.
      Don't select these dead instructions (if they don't have other side
      effects, and don't define physical registers).
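
      The condition described above can be sketched as a small predicate.
      This is a toy model, not the GlobalISel code: the MInst class, the
      opcode names, and the convention that virtual registers start with
      "%vreg" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class MInst:
    opcode: str
    defs: tuple
    uses: tuple
    has_side_effects: bool = False


def is_physical(reg):
    # Assumed naming convention: anything not starting with "%vreg" is physical.
    return not reg.startswith("%vreg")


def count_uses(block):
    counts = {}
    for inst in block:
        for reg in inst.uses:
            counts[reg] = counts.get(reg, 0) + 1
    return counts


def is_trivially_dead(inst, use_counts):
    """Dead if it has no side effects, defines no physical register,
    and none of the registers it defines is used."""
    if inst.has_side_effects:
        return False
    for reg in inst.defs:
        if is_physical(reg) or use_counts.get(reg, 0) > 0:
            return False
    return True


def select_block(block):
    """Keep only the instructions that still need to be selected."""
    counts = count_uses(block)
    return [inst for inst in block if not is_trivially_dead(inst, counts)]


# The constant was folded into ADDri as an immediate, so %vreg0 has no users left.
block = [
    MInst("G_CONSTANT", ("%vreg0",), ()),
    MInst("ADDri", ("%vreg2",), ("%vreg1",)),
    MInst("COPY", ("w0",), ("%vreg2",)),
    MInst("STORE", (), ("%vreg1",), has_side_effects=True),
]
kept = select_block(block)
```

      The COPY survives only because it defines a physical register, which
      is also why adding COPYs is enough to keep existing tests meaningful.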
      
      Preserve existing tests by adding COPYs.
      
      In some tests, the G_CONSTANT vregs never get constrained to a class:
      the only use of the vreg was folded into another instruction, so the
      G_CONSTANT, now dead, never gets selected.
      
      llvm-svn: 298224
      931904d7
  6. Mar 18, 2017
  7. Mar 17, 2017
  8. Mar 16, 2017
  9. Mar 15, 2017
  10. Mar 14, 2017
    • Nirav Dave's avatar
      In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. · 54e22f33
      Nirav Dave authored
          Recommitting with compile time improvements
      
          Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
      
          * Simplify Consecutive Merge Store Candidate Search
      
          Now that address aliasing is much less conservative, push through
          simplified store merging search and chain alias analysis which only
          checks for parallel stores through the chain subgraph. This is cleaner
          as it separates non-interfering loads/stores from the
          store-merging logic.
      
          When merging stores, search up the chain through a single load, and
          find all possible stores by looking down through a load and a
          TokenFactor to all stores visited.
      
          This improves the quality of the output SelectionDAG and the output
          Codegen (save perhaps for some ARM cases where we correctly construct
          wider loads, but then promote them to float operations which appear
          to require more expensive constant generation).
      
          Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
      
          Additional Minor Changes:
      
            1. Finishes removing unused AliasLoad code
      
            2. Unifies the chain aggregation in the merged stores across code
               paths
      
            3. Re-add the Store node to the worklist after calling
               SimplifyDemandedBits.
      
            4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
               arbitrary, but seems sufficient to not cause regressions in
               tests.
      
            5. Remove Chain dependencies of Memory operations on CopyFromReg
               nodes as these are captured by data dependence

            6. Forward load-store values through tokenfactors containing
               {CopyToReg,CopyFromReg} Values.
      
            7. Peephole to convert buildvector of extract_vector_elt to
               extract_subvector if possible (see
               CodeGen/AArch64/store-merge.ll)
      
            8. Store merging for the ARM target is restricted to 32-bit as
               in some contexts invalid 64-bit operations are being
               generated. This can be removed once appropriate checks are
               added.
      
          This finishes the change Matt Arsenault started in r246307 and
          jyknight's original patch.
      
          Many tests required some changes as memory operations are now
          reorderable, improving load-store forwarding. One test in
          particular is worth noting:
      
            CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
            forwarding converts a load-store pair into a parallel store and
            a memory-realized bitcast of the same value. However, because we
            lose the sharing of the explicit and implicit store values we
            must create another local store. A similar transformation
            happens before SelectionDAG as well.
      
          Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
      
      llvm-svn: 297695
      54e22f33
  11. Mar 13, 2017
  12. Mar 11, 2017
    • Matt Arsenault's avatar
      AMDGPU: Remove packf16 intrinsic · dd905b0e
      Matt Arsenault authored
      llvm-svn: 297557
      dd905b0e
    • Matt Arsenault's avatar
      AMDGPU: Keep track of modifiers when converting v_mac to v_mad · 3cb9ff88
      Matt Arsenault authored
      Since v_max_f32_e64/v_max_f16_e64 can be folded if the target
      instruction supports the clamp bit, we also need to maintain
      modifiers when converting v_mac to v_mad.
      
      This fixes a rendering issue with Dirt Rally because a v_mac
      instruction with the clamp bit set was converted to a v_mad
      but that bit was lost during the conversion.
      
      Fixes: e184e01dd79 ("AMDGPU: Fold FP clamp as modifier bit")
      
      Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com>
      
      llvm-svn: 297556
      3cb9ff88
    • Stanislav Mekhanoshin's avatar
      [AMDGPU] Remove getBidirectionalReasonRank · 79da2a76
      Stanislav Mekhanoshin authored
      This method inverts the Reason field of a scheduling candidate.
      It compares RegCritical and RegExcess correctly, but
      everything else is broken. In fact it can prefer a weaker reason
      such as Weak over RegCritical because Weak > -RegCritical.
      
      The CandReason enum is properly sorted, so just remove artificial
      ranking.
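
      A small sketch of the failure mode described above. The enum below
      is an illustrative subset with made-up numeric values (smaller means
      stronger), and bidirectional_rank is a reconstruction from the
      description in this message, not the removed code. That the larger
      rank wins is inferred from "Weak > -RegCritical".

```python
from enum import IntEnum


class CandReason(IntEnum):
    # Illustrative subset; values are assumptions, smaller = stronger reason.
    NoCand = 0
    RegExcess = 1
    RegCritical = 2
    Weak = 3


def bidirectional_rank(reason):
    """Reconstructed behavior: negate only the register-pressure reasons."""
    if reason in (CandReason.RegExcess, CandReason.RegCritical):
        return -int(reason)
    return int(reason)


def pick_by_rank(a, b):
    """Prefer the candidate reason with the larger artificial rank."""
    return a if bidirectional_rank(a) > bidirectional_rank(b) else b


def pick_by_enum(a, b):
    """Prefer the stronger reason using the enum order directly."""
    return a if a < b else b


ok_case = pick_by_rank(CandReason.RegExcess, CandReason.RegCritical)
broken_case = pick_by_rank(CandReason.Weak, CandReason.RegCritical)
fixed_case = pick_by_enum(CandReason.Weak, CandReason.RegCritical)
```

      Under these assumptions the rank still orders RegExcess ahead of
      RegCritical, but it lets Weak beat RegCritical, while comparing the
      enum values directly picks RegCritical.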
      
      Differential Revision: https://reviews.llvm.org/D30557
      
      llvm-svn: 297536
      79da2a76
  13. Mar 09, 2017
  14. Mar 08, 2017
  15. Mar 07, 2017
  16. Mar 06, 2017
  17. Mar 03, 2017
    • Chandler Carruth's avatar
      [SDAG] Revert r296476 (and r296486, r296668, r296690). · ce52b807
      Chandler Carruth authored
      This patch causes compile times for some patterns to explode. I have
      a (large, unreduced) test case that slows down by more than 20x and
      several test cases slow down by 2x. I'm sending some of the test cases
      directly to Nirav and following up with more details in the review log,
      but this should unblock anyone else hitting this.
      
      llvm-svn: 296862
      ce52b807
  18. Mar 02, 2017
  19. Mar 01, 2017
    • Artur Pilipenko's avatar
      [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine · e1b2d314
      Artur Pilipenko authored
      Resubmit r295336 after the bug with non-zero offset patterns on BE targets is fixed (r296336).
      
      Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patterns.
      
      Reviewed By: filcab
      
      Differential Revision: https://reviews.llvm.org/D29591
      
      llvm-svn: 296651
      e1b2d314
    • Matt Arsenault's avatar
      AMDGPU: Re-do update for branch-relaxation test · 103af900
      Matt Arsenault authored
      Modify the test so that it is still testing something
      closer to what it was intended to originally.
      
      I think the original intent was to test the situation where
      there was a branch on execz and then an unconditional branch
      required relaxing. With the change in r296539,
      there was no longer an execz branch.
      
      Change the test so that there is now an execz branch inserted.
      There is no longer an unconditional branch after the execz branch,
      so this might need to be tricked in some other way to keep that
      there.
      
      llvm-svn: 296574
      103af900