  1. Mar 30, 2017
  2. Mar 28, 2017
  3. Mar 27, 2017
    • [AMDGPU] Get address space mapping by target triple environment · 1a14bfa0
      Yaxun Liu authored
      As we introduced the target triple environments amdgiz and amdgizcl, the address
      space values are no longer enums. We have to decide the values by target triple.
      
      The basic idea is to use struct AMDGPUAS to represent address space values.
      For address space values which do not depend on the target triple, use static
      const members, so that they don't occupy extra memory space and are equivalent
      to compile-time constants.
      
      Since the struct is lightweight and cheap, it can be created on the fly at
      the point of use. Alternatively, it can be added as a member of a pass and
      created at the beginning of the run* function.
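
A minimal sketch of the idea described above, not the LLVM definition. The names `AddrSpaces` and `makeAddrSpaces` are hypothetical, and the numeric values are illustrative assumptions. Triple-dependent values are ordinary members filled in at construction; triple-independent values are static constants that add no per-object storage.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the struct described in the commit message.
struct AddrSpaces {
  // Values that depend on the target triple environment.
  unsigned Private;
  unsigned Flat;

  // Values that are the same for every triple: static constants,
  // so they take no space in each object.
  static constexpr unsigned Global = 1;
  static constexpr unsigned Local = 3;
};

// Build the mapping from the environment component of a target triple.
// The numbers chosen here are assumptions for illustration only.
inline AddrSpaces makeAddrSpaces(const std::string& Env) {
  const bool IsGiz = (Env == "amdgiz" || Env == "amdgizcl");
  return IsGiz ? AddrSpaces{5, 0} : AddrSpaces{0, 4};
}
```

Because the struct holds only two integers, a pass could call `makeAddrSpaces` wherever it needs the values, or cache the result in a member at the start of its run function.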
      
      Differential Revision: https://reviews.llvm.org/D31284
      
      llvm-svn: 298846
  4. Mar 25, 2017
  5. Mar 24, 2017
  6. Mar 21, 2017
    • Valery Pykhtin's avatar
      fd4c410f
    • [AMDGPU] SDWA peephole optimization pass. · f60ad58d
      Sam Kolton authored
      Summary:
      First iteration of SDWA peephole.
      
      This pass tries to combine several instructions into one SDWA instruction. E.g. it converts:
      '''
          V_LSHRREV_B32_e32 %vreg0, 16, %vreg1
          V_ADD_I32_e32 %vreg2, %vreg0, %vreg3
          V_LSHLREV_B32_e32 %vreg4, 16, %vreg2
      '''
      Into:
      '''
         V_ADD_I32_sdwa %vreg4, %vreg1, %vreg3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
      '''
      
      Pass structure:
          1. Iterate over the machine instructions in a basic block and try to apply "SDWA patterns" to each of them. An SDWA pattern matches a machine instruction to either a source or a destination SDWA operand. E.g. ''' V_LSHRREV_B32_e32 %vreg0, 16, %vreg1''' is matched to the source SDWA operand '''%vreg1 src_sel:WORD_1'''.
          2. Iterate over the SDWA operands found and find instructions that could potentially be converted into SDWA. E.g. for a source SDWA operand, the potential instructions are all instructions in this basic block that use '''%vreg0'''.
          3. Iterate over all potential instructions and check whether they can be converted into SDWA.
          4. Convert the instructions to SDWA.
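
The four steps above can be sketched on a toy instruction model. This is an illustration only, not the pass itself: `Inst`, `SrcOperand`, `matchOperands` and `convertUsers` are hypothetical names, only one source pattern (shift right by 16) is modelled, and legality checks such as redefinition of the source register are omitted.

```cpp
#include <cassert>
#include <vector>

enum class Op { LShrRev, Add, AddSdwa };
enum class Sel { Dword, Word1 };

// Toy instruction: Dst = Op(Src0, Src1). For LShrRev the shift amount
// is Imm, the shifted register is Src1, and Src0 is unused (-1).
struct Inst {
  Op Opcode;
  int Dst;
  int Src0;
  int Src1;
  int Imm;
  Sel Src0Sel;
  Sel Src1Sel;
};

// A matched source operand: uses of Replaced can read Source with Select.
struct SrcOperand {
  int Replaced;
  int Source;
  Sel Select;
};

// Step 1: match "Dst = Src1 >> 16" to the operand "Src1 src_sel:WORD_1".
std::vector<SrcOperand> matchOperands(const std::vector<Inst>& BB) {
  std::vector<SrcOperand> Found;
  for (const Inst& I : BB)
    if (I.Opcode == Op::LShrRev && I.Imm == 16)
      Found.push_back({I.Dst, I.Src1, Sel::Word1});
  return Found;
}

// Steps 2-4: find users of each matched register in the block and
// convert the ones this toy knows how to handle (Add only).
int convertUsers(std::vector<Inst>& BB, const std::vector<SrcOperand>& Ops) {
  int Converted = 0;
  for (const SrcOperand& O : Ops) {
    for (Inst& I : BB) {
      if (I.Opcode != Op::Add && I.Opcode != Op::AddSdwa)
        continue;
      if (I.Src0 == O.Replaced) {
        I.Src0 = O.Source;
        I.Src0Sel = O.Select;
      } else if (I.Src1 == O.Replaced) {
        I.Src1 = O.Source;
        I.Src1Sel = O.Select;
      } else {
        continue;
      }
      I.Opcode = Op::AddSdwa;
      ++Converted;
    }
  }
  return Converted;
}
```

Applied to a block holding the shift and the add from the example, the add would end up reading the shift's input register with a WORD_1 select on src0 and a DWORD select on src1.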
      
      This review contains a basic implementation of the SDWA peephole pass. This pass requires additional testing for both correctness and performance (no performance testing has been done).
      There are several ways this pass can be improved:
          1. Make this pass work on a whole function, not only a basic block. As far as I can see, this can be done right now without changes to the pass.
          2. Introduce more SDWA patterns
          3. Introduce mnemonics to limit when SDWA patterns should apply
      
      Reviewers: vpykhtin, alex-t, arsenm, rampitec
      
      Subscribers: wdng, nhaehnle, mgorny
      
      Differential Revision: https://reviews.llvm.org/D30038
      
      llvm-svn: 298365
  7. Mar 20, 2017
  8. Mar 18, 2017
  9. Mar 17, 2017
    • Only unswitch loops with uniform conditions · ee2dd785
      Stanislav Mekhanoshin authored
      Loop unswitching can be extremely harmful for a SIMT target. If the
      hoisted condition is not uniform, a SIMT machine will execute both
      clones of a loop sequentially. Therefore LoopUnswitch checks that the
      condition is non-divergent.
      
      Since DivergenceAnalysis adds an expensive PostDominatorTree analysis
      not needed for non-SIMT targets, a new option is added to avoid unneeded
      analysis initialization. The method getAnalysisUsage is called when
      TargetTransformInfo is not yet available, so we cannot use it here.
      For that reason a new field DivergentTarget is added to PassManagerBuilder
      to control the behavior, and a target sets this field.
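
A toy cost model, assuming a wave whose lanes each evaluate the hoisted condition, may help show why a non-uniform condition hurts. The names `summarize`, `isUniform` and `unswitchedCost` are hypothetical and are not part of LoopUnswitch or DivergenceAnalysis.

```cpp
#include <cassert>
#include <vector>

// Which outcomes of the hoisted condition occur among the lanes of a wave.
struct WaveCond {
  bool AnyTrue;
  bool AnyFalse;
};

WaveCond summarize(const std::vector<bool>& LaneCond) {
  WaveCond W = {false, false};
  for (bool C : LaneCond) {
    if (C)
      W.AnyTrue = true;
    else
      W.AnyFalse = true;
  }
  return W;
}

// The condition is uniform when all lanes agree on it.
bool isUniform(const std::vector<bool>& LaneCond) {
  const WaveCond W = summarize(LaneCond);
  return !(W.AnyTrue && W.AnyFalse);
}

// Cost of the unswitched loop for one wave: every clone selected by at
// least one lane is executed in full, one after the other.
unsigned unswitchedCost(const std::vector<bool>& LaneCond,
                        unsigned TrueCloneCost, unsigned FalseCloneCost) {
  const WaveCond W = summarize(LaneCond);
  return (W.AnyTrue ? TrueCloneCost : 0u) + (W.AnyFalse ? FalseCloneCost : 0u);
}
```

In this model a uniform wave pays for one clone only, while a divergent wave pays for both, which is the case the check is meant to avoid.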
      
      Differential Revision: https://reviews.llvm.org/D30796
      
      llvm-svn: 298104
  10. Mar 16, 2017
  11. Feb 18, 2017
  12. Feb 15, 2017
    • [AMDGPU] Revert failed scheduling · 582a5237
      Stanislav Mekhanoshin authored
      This patch reverts a region's scheduling to the original untouched state
      if the new schedule has decreased occupancy.
      
      In addition it switches to using the TargetRegisterInfo occupancy callback
      for pressure limits instead of gradually increasing limits which were
      just passed by. We are going to stay with the best schedule so we do
      not need to tolerate worsened scheduling anymore.
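
The revert logic can be sketched as follows, assuming a region is just an ordered list of instruction ids and that the scheduler and the occupancy estimate are supplied as callbacks. `scheduleOrRevert` and its parameters are hypothetical names, not the scheduler's API.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

using Region = std::vector<int>;

// Schedule the region, then keep the result only if occupancy did not
// drop. Returns true when the new schedule was kept.
bool scheduleOrRevert(Region& R,
                      const std::function<void(Region&)>& Schedule,
                      const std::function<unsigned(const Region&)>& Occupancy) {
  const Region Original = R;           // remember the untouched order
  const unsigned Before = Occupancy(R);
  Schedule(R);
  if (Occupancy(R) >= Before)
    return true;
  R = Original;                        // occupancy decreased: revert
  return false;
}
```

Keeping a copy of the original order is the simplest way to restore it; a real implementation may prefer to record only the moves it made.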
      
      Differential Revision: https://reviews.llvm.org/D29971
      
      llvm-svn: 295206
  13. Feb 09, 2017
  14. Feb 08, 2017
  15. Jan 30, 2017
  16. Jan 27, 2017
  17. Jan 26, 2017
  18. Jan 25, 2017
    • AMDGPU: Implement early ifcvt target hooks. · 9f5e0ef0
      Matt Arsenault authored
      Leave early ifcvt disabled for now since there are some
      shader-db regressions.
      
      This causes some immediate improvements, but could be better.
      The cost checking that the pass does is based on critical path
      length for out-of-order CPUs, which is not what we want, so it
      skips many cases we would like to convert.
      
      llvm-svn: 293016
  19. Jan 24, 2017
    • [AMDGPU] Add VGPR copies post regalloc fix pass · 22a56f2f
      Stanislav Mekhanoshin authored
      Regalloc creates COPY instructions which do not formally use VALU.
      That results in v_mov instructions being displaced after exec mask modification.
      One pass which does this is SIOptimizeExecMasking, but potentially it can be
      done by other passes too.
      
      This patch adds a pass immediately after regalloc to add implicit exec
      use operand to all VGPR copy instructions.
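
A minimal sketch of such a fix-up on a toy instruction list. `MachineInst` here is a hypothetical struct, not LLVM's `MachineInstr`, and implicit uses are modelled as plain register names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy machine instruction with just the fields this sketch needs.
struct MachineInst {
  bool IsCopy;
  bool DefinesVGPR;
  std::vector<std::string> ImplicitUses;
};

// Add an implicit use of "exec" to every copy that writes a VGPR, so
// later passes see the dependency on the exec mask.
// Returns the number of instructions changed.
unsigned addImplicitExec(std::vector<MachineInst>& Insts) {
  unsigned Changed = 0;
  for (MachineInst& MI : Insts) {
    if (!MI.IsCopy || !MI.DefinesVGPR)
      continue;
    std::vector<std::string>& Uses = MI.ImplicitUses;
    if (std::find(Uses.begin(), Uses.end(), "exec") != Uses.end())
      continue;  // already marked, keep the pass idempotent
    Uses.push_back("exec");
    ++Changed;
  }
  return Changed;
}
```

Once the use is explicit, a pass that moves exec mask modifications has a visible reason not to reorder them across the copy.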
      
      Differential Revision: https://reviews.llvm.org/D28874
      
      llvm-svn: 292956
  20. Dec 12, 2016
  21. Dec 08, 2016
    • [AMDGPU] Add amdgpu-unify-metadata pass · 50ea93a2
      Stanislav Mekhanoshin authored
      Multiple metadata values for records such as opencl.ocl.version, llvm.ident
      and similar are created after linking several modules. For some of them, notably
      opencl.ocl.version, this creates a semantic problem because we cannot tell which
      version of OpenCL the composite module conforms to.
      
      Moreover, such repetitions of identical values often create a huge list of
      unneeded metadata, which grows bitcode size both in memory and stored on disk.
      It can go up to several MB when linked against our OpenCL library. Lastly, such
      long lists obscure reading of dumped IR.
      
      The pass unifies metadata after linking.
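
One way to picture the unification, as a sketch over plain containers rather than LLVM metadata nodes. The helper names are hypothetical, and choosing the highest version as the single survivor is an assumption made for this example; the commit message does not state the rule.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using Version = std::pair<unsigned, unsigned>;  // (major, minor)

// Collapse many version records into one. Picking the maximum is an
// assumption of this sketch. Precondition: Versions is not empty.
Version unifyVersion(const std::vector<Version>& Versions) {
  return *std::max_element(Versions.begin(), Versions.end());
}

// Drop repeated identification strings, keeping first occurrences in order.
std::vector<std::string> unifyIdents(const std::vector<std::string>& Idents) {
  std::vector<std::string> Unique;
  for (const std::string& Id : Idents)
    if (std::find(Unique.begin(), Unique.end(), Id) == Unique.end())
      Unique.push_back(Id);
  return Unique;
}
```

After this step each record appears once, so the composite module names a single version and the repeated entries no longer inflate the bitcode.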
      
      Differential Revision: https://reviews.llvm.org/D25381
      
      llvm-svn: 289092
    • [AMDGPU] Scalarization of global uniform loads. · 18009560
      Alexander Timofeev authored
      Summary:
      LC can currently select a scalar load for a uniform memory access
      based only on the read-only memory address space. This restriction
      originated from the fact that in HW prior to VI the vector and scalar caches
      are not coherent. With MemoryDependenceAnalysis we can check that the
      memory location corresponding to the memory operand of the LOAD is not
      clobbered along any path from the function entry.
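
The "not clobbered along any path" condition can be illustrated with a backward walk over a toy CFG. This sketch assumes each block carries a precomputed flag saying whether it may write the loaded location; it does not use MemoryDependenceAnalysis, it ignores writes inside the load's own block before the load, and `Block` and `mayBeClobbered` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy CFG node: predecessor indices plus a flag telling whether the
// block may write to the memory location being loaded.
struct Block {
  std::vector<std::size_t> Preds;
  bool Clobbers;
};

// Walk backwards from the block holding the load through all of its
// transitive predecessors. If any of them clobbers the location, some
// path from the entry to the load may have modified it.
bool mayBeClobbered(const std::vector<Block>& CFG, std::size_t LoadBlock) {
  std::vector<bool> Seen(CFG.size(), false);
  std::vector<std::size_t> Work(CFG[LoadBlock].Preds.begin(),
                                CFG[LoadBlock].Preds.end());
  while (!Work.empty()) {
    const std::size_t B = Work.back();
    Work.pop_back();
    if (Seen[B])
      continue;
    Seen[B] = true;
    if (CFG[B].Clobbers)
      return true;
    Work.insert(Work.end(), CFG[B].Preds.begin(), CFG[B].Preds.end());
  }
  return false;
}
```

Only when the walk finds no clobbering block would a uniform global load be a candidate for a scalar load in this model.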
      
      Reviewers: rampitec, tstellarAMD, arsenm
      
      Subscribers: wdng, arsenm, nhaehnle
      
      Differential Revision: https://reviews.llvm.org/D26917
      
      llvm-svn: 289076
  22. Dec 06, 2016
    • AMDGPU: Don't require structured CFG · ad55ee58
      Matt Arsenault authored
      The structured CFG is just an aid to inserting exec
      mask modification instructions. Once that is done
      we don't really need it anymore. We also
      do not analyze blocks with terminators that
      modify exec, so this should only impact
      true branches.
      
      llvm-svn: 288744
  23. Nov 28, 2016
    • MachineScheduler: Export function to construct "default" scheduler. · 115efcd3
      Matthias Braun authored
      This makes the createGenericSchedLive() function that constructs the
      default scheduler available in the public API. This should help when
      you want to get a scheduler and the default list of DAG mutations.
      
      This also shrinks the list of default DAG mutations:
      {Load|Store}ClusterDAGMutation and MacroFusionDAGMutation are no longer
      added by default. Targets can easily add them if they need them. It also
      makes it easier for targets to add alternative/custom macrofusion or
      clustering mutations while staying with the default
      createGenericSchedLive(). It also saves the callback back and forth in
      TargetInstrInfo::enableClusterLoads()/enableClusterStores().
      
      Differential Revision: https://reviews.llvm.org/D26986
      
      llvm-svn: 288057
  24. Nov 17, 2016
  25. Nov 16, 2016
  26. Nov 15, 2016
  27. Oct 10, 2016
  28. Oct 06, 2016
  29. Oct 03, 2016
  30. Sep 30, 2016
  31. Sep 29, 2016
    • AMDGPU: Partially fix control flow at -O0 · e6740754
      Matt Arsenault authored
      Fixes to allow spilling all registers at the end of the block
      to work with exec modifications. Don't emit s_and_saveexec_b64 for
      if lowering, and instead emit copies. Mark control flow mask
      instructions as terminators to get correct spill code placement
      with fast regalloc, and then have a separate optimization pass
      form the saveexec.
      
      This should work if SGPRs are spilled to VGPRs, but
      will likely fail in the case that an SGPR spills to memory
      and no workitem takes a divergent branch.
      
      llvm-svn: 282667
  32. Sep 10, 2016