  1. Mar 30, 2017
  2. Mar 27, 2017
    • Yaxun Liu's avatar
      [AMDGPU] Get address space mapping by target triple environment · 1a14bfa0
      Yaxun Liu authored
      Since we introduced the target triple environments amdgiz and amdgizcl, the
      address space values are no longer enums. We have to decide the values by
      target triple.
      
      The basic idea is to use struct AMDGPUAS to represent address space values.
      For address space values which do not depend on the target triple, use static
      const members, so that they don't occupy extra memory space and are equivalent
      to compile-time constants.
      
      Since the struct is lightweight and cheap, it can be created on the fly at
      the point of usage. Or it can be added as a member to a pass and created at
      the beginning of the run* function.
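
The idea described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the code from the patch: the names `AddrSpaceSketch` and `getAddrSpaceSketch` are hypothetical, and the numeric address space values are assumptions chosen for the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the AMDGPUAS struct described in the commit.
struct AddrSpaceSketch {
  // Triple-dependent values: ordinary members, filled in at construction.
  unsigned FlatAddress;
  unsigned PrivateAddress;
  // Triple-independent values: static constants with no per-object storage.
  static const unsigned GlobalAddress = 1;
  static const unsigned LocalAddress = 3;
};

// Build the mapping from the environment component of the target triple.
// The numeric values below are illustrative assumptions only.
inline AddrSpaceSketch getAddrSpaceSketch(const std::string &Environment) {
  const bool GenericIsZero =
      Environment == "amdgiz" || Environment == "amdgizcl";
  AddrSpaceSketch AS;
  AS.FlatAddress = GenericIsZero ? 0u : 4u;
  AS.PrivateAddress = GenericIsZero ? 5u : 0u;
  return AS;
}
```

Because only the triple-dependent values are stored per object, the struct stays small enough to construct at each point of use.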
      
      Differential Revision: https://reviews.llvm.org/D31284
      
      llvm-svn: 298846
      1a14bfa0
  3. Mar 24, 2017
    • Matt Arsenault's avatar
      AMDGPU: Unify divergent function exits. · b8f8dbc2
      Matt Arsenault authored
      StructurizeCFG can't handle functions with multiple
      returns, which create regions with multiple exits.
      Create a copy of UnifyFunctionExitNodes that unifies
      only divergent exit nodes and skips exit nodes
      with uniform branch sources.
      
      llvm-svn: 298729
      b8f8dbc2
  4. Mar 21, 2017
    • Sam Kolton's avatar
      [AMDGPU] SDWA peephole optimization pass. · f60ad58d
      Sam Kolton authored
      Summary:
      First iteration of SDWA peephole.
      
      This pass tries to combine several instructions into one SDWA instruction. E.g. it converts:
      '''
          V_LSHRREV_B32_e32 %vreg0, 16, %vreg1
          V_ADD_I32_e32 %vreg2, %vreg0, %vreg3
          V_LSHLREV_B32_e32 %vreg4, 16, %vreg2
      '''
      Into:
      '''
         V_ADD_I32_sdwa %vreg4, %vreg1, %vreg3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
      '''
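
To see why the two forms compute the same value, the example above can be modelled on plain 32-bit integers. This is a sketch under assumptions: `shiftAddShift` and `addSdwa` are hypothetical helper names, and the model assumes that `WORD_1` selects bits 31:16 without sign extension and that `UNUSED_PAD` zero-fills the unselected destination bits.

```cpp
#include <cassert>
#include <cstdint>

// The original three-instruction sequence, modelled on 32-bit values.
inline uint32_t shiftAddShift(uint32_t V1, uint32_t V3) {
  uint32_t V0 = V1 >> 16; // V_LSHRREV_B32_e32 %vreg0, 16, %vreg1
  uint32_t V2 = V0 + V3;  // V_ADD_I32_e32     %vreg2, %vreg0, %vreg3
  return V2 << 16;        // V_LSHLREV_B32_e32 %vreg4, 16, %vreg2
}

// Assumed model of the single SDWA instruction.
inline uint32_t addSdwa(uint32_t V1, uint32_t V3) {
  uint32_t Src0 = (V1 >> 16) & 0xFFFFu; // src0_sel:WORD_1
  uint32_t Src1 = V3;                   // src1_sel:DWORD
  uint32_t Sum = Src0 + Src1;
  // dst_sel:WORD_1 writes the low 16 result bits into bits 31:16;
  // dst_unused:UNUSED_PAD leaves the remaining bits zero.
  return (Sum & 0xFFFFu) << 16;
}
```

For instance, with `V1 = 0x12345678` and `V3 = 0x0000FFFF` both functions return `0x12330000`.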
      
      Pass structure:
          1. Iterate over the machine instructions in a basic block and try to apply "SDWA patterns" to each of them. SDWA patterns match a machine instruction to either a source or a destination SDWA operand. E.g. ''' V_LSHRREV_B32_e32 %vreg0, 16, %vreg1''' is matched to the source SDWA operand '''%vreg1 src_sel:WORD_1'''.
          2. Iterate over the found SDWA operands and find instructions that could potentially be converted into SDWA. E.g. for a source SDWA operand, the potential instructions are all instructions in this basic block that use '''%vreg0'''.
          3. Iterate over all potential instructions and check if they can be converted into SDWA.
          4. Convert instructions to SDWA.
      
      This review contains a basic implementation of the SDWA peephole pass. This pass requires additional testing for both correctness and performance (no performance testing done).
      There are several ways this pass can be improved:
          1. Make this pass work on the whole function, not only on a basic block. As far as I can see, this can be done right now without changes to the pass.
          2. Introduce more SDWA patterns
          3. Introduce mnemonics to limit when SDWA patterns should apply
      
      Reviewers: vpykhtin, alex-t, arsenm, rampitec
      
      Subscribers: wdng, nhaehnle, mgorny
      
      Differential Revision: https://reviews.llvm.org/D30038
      
      llvm-svn: 298365
      f60ad58d
  5. Mar 18, 2017
  6. Feb 18, 2017
  7. Feb 09, 2017
  8. Jan 27, 2017
  9. Jan 24, 2017
    • Stanislav Mekhanoshin's avatar
      [AMDGPU] Add VGPR copies post regalloc fix pass · 22a56f2f
      Stanislav Mekhanoshin authored
      Regalloc creates COPY instructions which do not formally use VALU.
      That results in v_mov instructions displaced after exec mask modification.
      One pass which does this is SIOptimizeExecMasking, but potentially it can be
      done by other passes too.
      
      This patch adds a pass immediately after regalloc to add implicit exec
      use operand to all VGPR copy instructions.
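
The fix can be illustrated with a toy model. This sketch does not use the LLVM MachineInstr API: `ToyInstr` and `addImplicitExecToVGPRCopies` are hypothetical names, and a real pass would inspect register classes rather than a boolean flag.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified machine instruction.
struct ToyInstr {
  std::string Opcode;
  bool DefinesVGPR;                      // true if the destination is a VGPR
  std::vector<std::string> ImplicitUses; // implicit register uses
};

// Add an implicit use of "exec" to every VGPR COPY that lacks one.
// Returns the number of instructions that were changed.
inline unsigned addImplicitExecToVGPRCopies(std::vector<ToyInstr> &Block) {
  unsigned Changed = 0;
  for (ToyInstr &MI : Block) {
    if (MI.Opcode != "COPY" || !MI.DefinesVGPR)
      continue;
    bool HasExec = false;
    for (const std::string &Use : MI.ImplicitUses)
      if (Use == "exec")
        HasExec = true;
    if (!HasExec) {
      MI.ImplicitUses.push_back("exec");
      ++Changed;
    }
  }
  return Changed;
}
```

Once the dependency on exec is recorded on the copy itself, later passes have what they need to avoid moving it across an exec mask modification.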
      
      Differential Revision: https://reviews.llvm.org/D28874
      
      llvm-svn: 292956
      22a56f2f
  10. Dec 08, 2016
    • Stanislav Mekhanoshin's avatar
      [AMDGPU] Add amdgpu-unify-metadata pass · 50ea93a2
      Stanislav Mekhanoshin authored
      Multiple metadata values for records such as opencl.ocl.version, llvm.ident
      and similar are created after linking several modules. For some of them, notably
      opencl.ocl.version, this creates a semantic problem because we cannot tell which
      version of OpenCL the composite module conforms to.
      
      Moreover, such repetitions of identical values often create a huge list of
      unneeded metadata, which grows bitcode size both in memory and on disk.
      It can go up to several MB when linked against our OpenCL library. Lastly, such
      long lists make dumped IR harder to read.
      
      The pass unifies metadata after linking.
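
One way to picture the unification is on plain containers rather than LLVM metadata nodes. This sketch rests on assumptions the commit message does not state: that version records collapse to the highest version, and that string records such as llvm.ident collapse to their unique values in first-seen order. `unifyVersions` and `unifyStrings` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using Version = std::pair<unsigned, unsigned>; // (major, minor)

// Assumed policy: a linked module keeps the highest version record.
inline Version unifyVersions(const std::vector<Version> &Records) {
  Version Best(0, 0);
  for (const Version &V : Records)
    Best = std::max(Best, V); // pairs compare lexicographically
  return Best;
}

// Assumed policy: keep each distinct string once, in first-seen order.
inline std::vector<std::string>
unifyStrings(const std::vector<std::string> &Records) {
  std::vector<std::string> Unique;
  for (const std::string &S : Records)
    if (std::find(Unique.begin(), Unique.end(), S) == Unique.end())
      Unique.push_back(S);
  return Unique;
}
```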
      
      Differential Revision: https://reviews.llvm.org/D25381
      
      llvm-svn: 289092
      50ea93a2
  11. Oct 10, 2016
  12. Oct 03, 2016
  13. Sep 29, 2016
    • Matt Arsenault's avatar
      AMDGPU: Partially fix control flow at -O0 · e6740754
      Matt Arsenault authored
      Fixes to allow spilling all registers at the end of the block
      to work with exec modifications. Don't emit s_and_saveexec_b64 for
      if lowering, and instead emit copies. Mark control flow mask
      instructions as terminators to get correct spill code placement
      with fast regalloc, and then have a separate optimization pass
      form the saveexec.
      
      This should work if SGPRs are spilled to VGPRs, but
      will likely fail in the case that an SGPR spills to memory
      and no workitem takes a divergent branch.
      
      llvm-svn: 282667
      e6740754
  14. Aug 22, 2016
    • Matt Arsenault's avatar
      AMDGPU: Split SILowerControlFlow into two pieces · 78fc9daf
      Matt Arsenault authored
      Do most of the lowering in a pre-RA pass. Keep the skip jump
      insertion late, plus a few other things that require more
      work to move out.
      
      One concern I have is now there may be COPY instructions
      which do not have the necessary implicit exec uses
      if they will be lowered to v_mov_b32.
      
      This has a positive effect on SGPR usage in shader-db.
      
      llvm-svn: 279464
      78fc9daf
  15. Aug 11, 2016
  16. Jul 20, 2016
  17. Jul 14, 2016
  18. Jun 24, 2016
  19. Jun 10, 2016
  20. May 31, 2016
  21. May 13, 2016
  22. May 10, 2016
  23. Apr 14, 2016
    • Nicolai Haehnle's avatar
      AMDGPU: Remove SIFixSGPRLiveRanges pass · 723b73b4
      Nicolai Haehnle authored
      Summary:
      This pass is unnecessary and overly conservative. It was motivated by
      situations like
      
        def %vreg0:SGPR_32
        ...
      if-block:
        ..
        def %vreg1:SGPR_32
        ...
      else-block:
        ...
        use %vreg0:SGPR_32
        ...
      
      and similar situations with uses after the non-uniform control flow, where
      we are not allowed to assign %vreg0 and %vreg1 to the same physical register,
      even though in the original, thread/workitem-based CFG, it looks like the
      live ranges of these registers do not overlap.
      
      However, by the time register allocation runs, we have moved to a wave-based
      CFG that accurately represents the fact that the wave may run through both
      the if- and the else-block. So the live ranges of %vreg0 and %vreg1 already
      overlap even without the SIFixSGPRLiveRanges pass.
      
      In addition to proving this change correct, I have tested it with Piglit
      and a small number of other tests.
      
      Reviewers: arsenm, tstellarAMD
      
      Subscribers: MatzeB, arsenm, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D19041
      
      llvm-svn: 266345
      723b73b4
  24. Apr 06, 2016
  25. Mar 21, 2016
    • Nicolai Haehnle's avatar
      AMDGPU: Add SIWholeQuadMode pass · 213e87f2
      Nicolai Haehnle authored
      Summary:
      Whole quad mode is already enabled for pixel shaders that compute
      derivatives, but it must be suspended for instructions that cause a
      shader to have side effects (i.e. stores and atomics).
      
      This pass addresses the issue by storing the real (initial) live mask
      in a register, masking EXEC before instructions that require exact
      execution and (re-)enabling WQM where required.
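
The mask handling described above can be modelled on a 64-bit lane mask. This is a simplified sketch for straight-line code only and is not the pass itself: `ExecState`, `enterShader`, `enterExact` and `enterWQM` are hypothetical names, and the model assumes a 64-lane wave made of 16 quads of four adjacent lanes.

```cpp
#include <cassert>
#include <cstdint>

// Whole quad mode mask: enable all four lanes of every quad that has
// at least one live lane.
inline uint64_t wholeQuadMask(uint64_t LiveMask) {
  uint64_t Result = 0;
  for (unsigned Quad = 0; Quad < 16; ++Quad) {
    const uint64_t QuadBits = 0xFull << (4 * Quad);
    if (LiveMask & QuadBits)
      Result |= QuadBits;
  }
  return Result;
}

// Hypothetical model of the two registers involved.
struct ExecState {
  uint64_t Exec;     // current execution mask
  uint64_t LiveMask; // saved real (initial) live mask
};

// At shader entry: remember the real live mask, then run in WQM.
inline ExecState enterShader(uint64_t InitialExec) {
  return ExecState{wholeQuadMask(InitialExec), InitialExec};
}

// Before an instruction that requires exact execution (store, atomic).
inline void enterExact(ExecState &S) { S.Exec &= S.LiveMask; }

// Before an instruction that needs whole quads again (derivatives).
inline void enterWQM(ExecState &S) { S.Exec = wholeQuadMask(S.LiveMask); }
```

With an initial mask of `0x21` (lanes 0 and 5 live), WQM enables lanes 0 to 7, and exact mode narrows execution back to lanes 0 and 5.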
      
      This pass is run before register coalescing so that we can use
      machine SSA for analysis.
      
      The changes in this patch expose a problem with the second machine
      scheduling pass: target independent instructions like COPY implicitly
      use EXEC when they operate on VGPRs, but this fact is not encoded in
      the MIR. This can lead to miscompilation because instructions are
      moved past changes to EXEC.
      
      This patch fixes the problem by adding use-implicit operands to
      target independent instructions. Some general codegen passes are
      relaxed to work with such implicit use operands.
      
      Reviewers: arsenm, tstellarAMD, mareko
      
      Subscribers: MatzeB, arsenm, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D18162
      
      llvm-svn: 263982
      213e87f2
  26. Mar 11, 2016
  27. Mar 03, 2016
    • Tom Stellard's avatar
      AMDGPU: Insert two S_NOP instructions for every high level source statement. · cc7067a6
      Tom Stellard authored
      Patch by: Konstantin Zhuravlyov
      
      Summary: Tools such as debuggers need to pause execution based on user input (e.g. a breakpoint). To do this, two S_NOP instructions are inserted for each high-level source statement: one before the first ISA instruction of the statement and one after its last ISA instruction. The debugger may then replace S_NOP instructions with S_TRAP instructions based on user input.
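
The insertion rule can be sketched on a flat instruction list. This is an illustration only: `StmtInstr` and `insertDebuggerNops` are hypothetical names, and the sketch assumes that all instructions belonging to one source statement are contiguous, which real code with scheduling need not guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical instruction tagged with the source statement it came from.
struct StmtInstr {
  std::string Opcode;
  unsigned Statement;
};

// Insert one S_NOP before the first and one after the last instruction
// of every source statement.
inline std::vector<StmtInstr>
insertDebuggerNops(const std::vector<StmtInstr> &Code) {
  std::vector<StmtInstr> Out;
  for (std::size_t I = 0; I < Code.size(); ++I) {
    const bool First = I == 0 || Code[I - 1].Statement != Code[I].Statement;
    const bool Last =
        I + 1 == Code.size() || Code[I + 1].Statement != Code[I].Statement;
    if (First)
      Out.push_back({"S_NOP", Code[I].Statement});
    Out.push_back(Code[I]);
    if (Last)
      Out.push_back({"S_NOP", Code[I].Statement});
  }
  return Out;
}
```

A statement that lowers to a single instruction ends up bracketed by two S_NOPs, so a debugger in this model has a patchable slot on either side of it.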
      
      Reviewers: tstellarAMD, arsenm
      
      Subscribers: echristo, dblaikie, arsenm, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D17454
      
      llvm-svn: 262579
      cc7067a6
  28. Feb 13, 2016
  29. Feb 12, 2016
  30. Feb 05, 2016
  31. Jan 30, 2016
    • Matt Arsenault's avatar
      AMDGPU: Fix emitting invalid workitem intrinsics for HSA · e0132464
      Matt Arsenault authored
      The AMDGPUPromoteAlloca pass was emitting the read.local.size
      calls, which with HSA were incorrectly selected to read from
      the offset Mesa uses off of the kernarg pointer.
      
      Error on intrinsics which aren't supported by HSA, and start
      emitting the correct IR to read the workgroup size
      out of the dispatch pointer.
      
      Also initialize the pass so it can be tested with opt, and
      start moving towards not depending on the subtarget as an
      argument.
      
      Start emitting errors for the intrinsics not handled with HSA.
      
      llvm-svn: 259297
      e0132464
  32. Jan 20, 2016
  33. Jan 13, 2016
  34. Dec 15, 2015
    • Tom Stellard's avatar
      AMDGPU/SI: Select constant loads with non-uniform addresses to MUBUF instructions · a6f24c65
      Tom Stellard authored
      Summary:
      We were previously selecting all constant loads to SMRD instructions and legalizing
      the SMRDs with non-uniform addresses during the SIFixSGPRCopies pass.
      
      This new solution is simpler and also generates much better code, because
      the instruction selector is able to take advantage of all the MUBUF addressing
      modes that our legalization pass wasn't able to.
      
      We also no longer need to generate v_add_* instructions when we
      have a uniform pointer and a non-uniform offset, as this is now folded into the
      MUBUF instruction during instruction selection.
      
      Reviewers: arsenm
      
      Subscribers: arsenm, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D15425
      
      llvm-svn: 255672
      a6f24c65
  35. Dec 10, 2015
  36. Nov 30, 2015
    • Matt Arsenault's avatar
      AMDGPU: Remove SIPrepareScratchRegs · 0e3d3893
      Matt Arsenault authored
      It does not work because of emergency stack slots.
      This pass was supposed to eliminate dummy registers for the
      spill instructions, but the register scavenger can introduce
      more during PrologEpilogInserter, so some would end up
      left behind if they were needed.
      
      The potential for spilling the scratch resource descriptor
      and offset register makes doing something like this
      overly complicated. Reserve registers to use for the resource
      descriptor and use them directly in eliminateFrameIndex.
      
      Also removes creating another scratch resource descriptor
      when directly selecting scratch MUBUF instructions.
      
      The choice of which registers are reserved is temporary.
      For now it attempts to pick the next available registers
      after the user and system SGPRs.
      
      llvm-svn: 254329
      0e3d3893
  37. Nov 06, 2015
    • Matt Arsenault's avatar
      AMDGPU: Add pass to detect used kernel features · 3931948b
      Matt Arsenault authored
      Use kernel attributes to mark kernels that use certain features
      which require user SGPRs to support. We need to know
      before instruction selection begins because it impacts
      the kernel calling convention lowering.
      
      For now this only detects the workitem intrinsics.
      
      llvm-svn: 252323
      3931948b
  38. Nov 03, 2015
  39. Aug 08, 2015