  1. Feb 15, 2017
    • [AMDGPU] Revert failed scheduling · 582a5237
      Stanislav Mekhanoshin authored
      This patch reverts a region's schedule to the original untouched state
      in case we have decreased occupancy.
      
      In addition, it switches to the TargetRegisterInfo occupancy callback
      for pressure limits instead of gradually increasing limits that had
      just been exceeded. Since we are going to stay with the best schedule,
      we no longer need to tolerate worsened scheduling.
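
      A minimal sketch of that revert policy, assuming hypothetical names
      (Region, getOccupancy, and runScheduler are illustrative, not the real
      GCN scheduler interface):

        // Keep a region's new schedule only if it does not cost occupancy.
        void scheduleRegion(Region &R) {
          std::vector<MachineInstr *> Original = R.getInstructionOrder();
          unsigned OccBefore = getOccupancy(R);

          runScheduler(R);

          // Occupancy dropped: restore the original untouched order.
          if (getOccupancy(R) < OccBefore)
            R.setInstructionOrder(Original);
        }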
      
      Differential Revision: https://reviews.llvm.org/D29971
      
      llvm-svn: 295206
  2. Feb 09, 2017
  3. Feb 08, 2017
  4. Jan 30, 2017
  5. Jan 27, 2017
  6. Jan 26, 2017
  7. Jan 25, 2017
    • AMDGPU: Implement early ifcvt target hooks. · 9f5e0ef0
      Matt Arsenault authored
      Leave early ifcvt disabled for now since there are some
      shader-db regressions.
      
      This causes some immediate improvements, but could be better. The cost
      checking that the pass does is based on critical-path length for
      out-of-order CPUs, which is not what we want, so it skips many cases
      we do want.
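
      For context, the early if-conversion pass queries targets through
      TargetInstrInfo hooks. A hedged sketch of the shape such an override
      takes (the body is illustrative, not the actual SIInstrInfo
      implementation; the latencies are placeholders):

        bool SIInstrInfo::canInsertSelect(const MachineBasicBlock &MBB,
                                          ArrayRef<MachineOperand> Cond,
                                          unsigned TrueReg, unsigned FalseReg,
                                          int &CondCycles, int &TrueCycles,
                                          int &FalseCycles) const {
          const MachineRegisterInfo &MRI = MBB.getParent()->getRegInfo();
          // Rough latencies the pass compares against the branch cost.
          CondCycles = TrueCycles = FalseCycles = 1;
          // Pretend we only handle VGPR selects (lowered to V_CNDMASK).
          return RI.isVGPR(MRI, TrueReg) && RI.isVGPR(MRI, FalseReg);
        }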
      
      llvm-svn: 293016
  8. Jan 24, 2017
    • [AMDGPU] Add VGPR copies post regalloc fix pass · 22a56f2f
      Stanislav Mekhanoshin authored
      Regalloc creates COPY instructions which do not formally use the VALU.
      That can result in v_mov instructions being moved past exec mask
      modifications. One pass that does this is SIOptimizeExecMasking, but
      potentially it can be done by other passes too.
      
      This patch adds a pass immediately after regalloc to add an implicit
      exec use operand to all VGPR copy instructions.
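
      A simplified sketch of what such a fix-up pass can look like (the loop
      below is illustrative, not the exact implementation):

        bool runOnMachineFunction(MachineFunction &MF) {
          const SIRegisterInfo *TRI = static_cast<const SIRegisterInfo *>(
              MF.getSubtarget().getRegisterInfo());
          MachineRegisterInfo &MRI = MF.getRegInfo();
          bool Changed = false;
          for (MachineBasicBlock &MBB : MF)
            for (MachineInstr &MI : MBB)
              if (MI.isCopy() && TRI->isVGPR(MRI, MI.getOperand(0).getReg())) {
                // Tie the copy to the current exec mask via an implicit use.
                MI.addOperand(MachineOperand::CreateReg(
                    AMDGPU::EXEC, /*isDef=*/false, /*isImp=*/true));
                Changed = true;
              }
          return Changed;
        }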
      
      Differential Revision: https://reviews.llvm.org/D28874
      
      llvm-svn: 292956
  9. Dec 12, 2016
  10. Dec 08, 2016
    • [AMDGPU] Add amdgpu-unify-metadata pass · 50ea93a2
      Stanislav Mekhanoshin authored
      Multiple metadata values for records such as opencl.ocl.version,
      llvm.ident and similar are created after linking several modules. For
      some of them, notably opencl.ocl.version, this creates a semantic
      problem because we cannot tell which version of OpenCL the composite
      module conforms to.
      
      Moreover, such repetitions of identical values often create a huge list
      of unneeded metadata, which grows the bitcode size both in memory and
      on disk. It can reach several MB when linked against our OpenCL
      library. Lastly, such long lists make dumped IR harder to read.
      
      The pass unifies metadata after linking.
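
      A hedged sketch of the unification step for a single named metadata
      record (keeping the first operand is a simplification; the real pass's
      policy for conflicting values, e.g. OpenCL versions, may differ):

        static void unifyNamedMD(Module &M, StringRef Name) {
          NamedMDNode *NMD = M.getNamedMetadata(Name);
          if (!NMD || NMD->getNumOperands() < 2)
            return;
          MDNode *Kept = NMD->getOperand(0); // keep one representative
          NMD->clearOperands();
          NMD->addOperand(Kept);
        }

      called, for example, as unifyNamedMD(M, "opencl.ocl.version") after
      linking.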
      
      Differential Revision: https://reviews.llvm.org/D25381
      
      llvm-svn: 289092
    • [AMDGPU] Scalarization of global uniform loads. · 18009560
      Alexander Timofeev authored
      Summary:
      LC can currently select a scalar load for a uniform memory access based
      only on the read-only memory address space. This restriction originated
      from the fact that in hardware prior to VI the vector and scalar caches
      are not coherent. With MemoryDependenceAnalysis we can check that the
      memory location corresponding to the memory operand of the LOAD is not
      clobbered along all paths from the function entry.
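
      A conceptual sketch of the combined check (simplified: the real
      analysis considers all paths from the function entry, while this only
      consults the local dependency; canUseScalarLoad is a hypothetical
      helper):

        bool canUseScalarLoad(LoadInst *LI, DivergenceAnalysis &DA,
                              MemoryDependenceResults &MDA) {
          // The address must be uniform across the wavefront.
          if (DA.isDivergent(LI->getPointerOperand()))
            return false;
          // And the loaded location must not be locally clobbered.
          MemDepResult Dep = MDA.getDependency(LI);
          return Dep.isNonLocal() || Dep.isNonFuncLocal();
        }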
      
      Reviewers: rampitec, tstellarAMD, arsenm
      
      Subscribers: wdng, arsenm, nhaehnle
      
      Differential Revision: https://reviews.llvm.org/D26917
      
      llvm-svn: 289076
  11. Dec 06, 2016
    • AMDGPU: Don't require structured CFG · ad55ee58
      Matt Arsenault authored
      The structured CFG is just an aid to inserting exec mask modification
      instructions; once that is done we don't really need it anymore. We
      also do not analyze blocks with terminators that modify exec, so this
      should only impact true branches.
      
      llvm-svn: 288744
  12. Nov 28, 2016
    • MachineScheduler: Export function to construct "default" scheduler. · 115efcd3
      Matthias Braun authored
      This makes the createGenericSchedLive() function that constructs the
      default scheduler available in the public API. This should help when
      you want to get a scheduler and the default list of DAG mutations.
      
      This also shrinks the list of default DAG mutations:
      {Load|Store}ClusterDAGMutation and MacroFusionDAGMutation are no longer
      added by default. Targets can easily add them if they need them. This
      also makes it easier for targets to add alternative/custom macrofusion
      or clustering mutations while staying with the default
      createGenericSchedLive(), and it removes the callback back-and-forth
      through TargetInstrInfo::enableClusterLoads()/enableClusterStores().
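
      A sketch of the intended usage: a target's pass config rebuilds the
      default scheduler and opts back into the now-optional clustering
      mutations (MyPassConfig is a placeholder TargetPassConfig subclass):

        ScheduleDAGInstrs *
        MyPassConfig::createMachineScheduler(MachineSchedContext *C) const {
          ScheduleDAGMILive *DAG = createGenericSchedLive(C);
          // Re-add the mutations that are no longer default.
          DAG->addMutation(createLoadClusterDAGMutation(DAG->TII, DAG->TRI));
          DAG->addMutation(createStoreClusterDAGMutation(DAG->TII, DAG->TRI));
          return DAG;
        }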
      
      Differential Revision: https://reviews.llvm.org/D26986
      
      llvm-svn: 288057
  13. Nov 17, 2016
  14. Nov 16, 2016
  15. Nov 15, 2016
  16. Oct 10, 2016
  17. Oct 06, 2016
  18. Oct 03, 2016
  19. Sep 30, 2016
  20. Sep 29, 2016
    • AMDGPU: Partially fix control flow at -O0 · e6740754
      Matt Arsenault authored
      Fixes to allow spilling all registers at the end of the block
      work with exec modifications. Don't emit s_and_saveexec_b64 for
      if lowering, and instead emit copies. Mark control flow mask
      instructions as terminators to get correct spill code placement
      with fast regalloc, and then have a separate optimization pass
      form the saveexec.
      
      This should work if SGPRs are spilled to VGPRs, but
      will likely fail in the case that an SGPR spills to memory
      and no workitem takes a divergent branch.
      
      llvm-svn: 282667
  21. Sep 10, 2016
  22. Aug 29, 2016
    • AMDGPU/SI: Implement a custom MachineSchedStrategy · 0d23ebe8
      Tom Stellard authored
      Summary:
      GCNSchedStrategy reuses most of GenericScheduler; it just uses a
      different method to compute the excess and critical register pressure
      limits.
      
      It's not enabled by default; to enable it, pass -misched=gcn to llc.
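
      For reference, this is the usual shape of wiring a strategy to
      -misched=<name>; the helper below is illustrative rather than the
      exact registration code:

        static ScheduleDAGInstrs *createGCNSched(MachineSchedContext *C) {
          return new ScheduleDAGMILive(
              C, llvm::make_unique<GCNMaxOccupancySchedStrategy>(C));
        }
        static MachineSchedRegistry
            GCNSchedRegistry("gcn", "Run the GCN scheduling strategy",
                             createGCNSched);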
      
      Shader DB stats:
      
      32464 shaders in 17874 tests
      Totals:
      SGPRS: 1542846 -> 1643125 (6.50 %)
      VGPRS: 1005595 -> 904653 (-10.04 %)
      Spilled SGPRs: 29929 -> 27745 (-7.30 %)
      Spilled VGPRs: 334 -> 352 (5.39 %)
      Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread
      Code Size: 36688188 -> 37034900 (0.95 %) bytes
      LDS: 1913 -> 1913 (0.00 %) blocks
      Max Waves: 254101 -> 265125 (4.34 %)
      Wait states: 0 -> 0 (0.00 %)
      
      Totals from affected shaders:
      SGPRS: 1338220 -> 1438499 (7.49 %)
      VGPRS: 886221 -> 785279 (-11.39 %)
      Spilled SGPRs: 29869 -> 27685 (-7.31 %)
      Spilled VGPRs: 334 -> 352 (5.39 %)
      Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread
      Code Size: 34315716 -> 34662428 (1.01 %) bytes
      LDS: 1551 -> 1551 (0.00 %) blocks
      Max Waves: 188127 -> 199151 (5.86 %)
      Wait states: 0 -> 0 (0.00 %)
      
      Reviewers: arsenm, mareko, nhaehnle, MatzeB, atrick
      
      Subscribers: arsenm, kzhuravl, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D23688
      
      llvm-svn: 279995
    • AMDGPU/SI: Improve SILoadStoreOptimizer and run it before the scheduler · c2ff0eb6
      Tom Stellard authored
      Summary:
      The SILoadStoreOptimizer can now look ahead more than one instruction
      when looking for instructions to merge, which greatly improves the
      number of loads/stores that we are able to merge.
      
      Moving the pass before scheduling avoids increasing register pressure after
      the scheduler, so that the scheduler's register pressure estimates will be
      more accurate.  It also gives more consistent results, since it is no longer
      affected by minor scheduling changes.
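
      An illustrative look-ahead loop (not the real optimizer; the window
      size and canMergePair are hypothetical): scan a bounded window past a
      load for a partner, stopping at anything a load cannot safely be moved
      across.

        MachineBasicBlock::iterator
        findMergeCandidate(MachineBasicBlock::iterator I,
                           MachineBasicBlock::iterator E) {
          const unsigned LookAheadLimit = 10; // hypothetical window
          unsigned Count = 0;
          for (auto MBBI = std::next(I); MBBI != E && Count < LookAheadLimit;
               ++MBBI, ++Count) {
            if (MBBI->mayStore() || MBBI->hasUnmodeledSideEffects())
              break; // unsafe to move a load across this instruction
            if (canMergePair(*I, *MBBI)) // hypothetical offset/format check
              return MBBI;
          }
          return E;
        }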
      
      Reviewers: arsenm
      
      Subscribers: arsenm, kzhuravl, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D23814
      
      llvm-svn: 279991
  23. Aug 22, 2016
    • AMDGPU: Split SILowerControlFlow into two pieces · 78fc9daf
      Matt Arsenault authored
      Do most of the lowering in a pre-RA pass. Keep the skip jump
      insertion late, plus a few other things that require more
      work to move out.
      
      One concern I have is that there may now be COPY instructions which do
      not have the necessary implicit exec uses if they are later lowered to
      v_mov_b32.
      
      This has a positive effect on SGPR usage in shader-db.
      
      llvm-svn: 279464
  24. Aug 17, 2016
    • [PM] Port the always inliner to the new pass manager in a much more minimal and boring form than the old pass manager's version. · 67fc52f0
      Chandler Carruth authored
      
      This pass does the very minimal amount of work necessary to inline
      functions declared as always-inline. It doesn't support a wide array of
      things that the legacy pass manager did support, but is also ... about
      20 lines of code. So it has that going for it. Notably, things this
      doesn't support:
      
      - Array alloca merging
        - To support the above, bottom-up inlining with careful history
          tracking and call graph updates
      - DCE of the functions that become dead after this inlining.
      - Inlining through call instructions with the always_inline attribute.
        Instead, it focuses on inlining functions with that attribute.
      
      The first I've omitted because I'm hoping to just turn it off for the
      primary pass manager. If that doesn't pan out, I can add it here but it
      will be reasonably expensive to do so.
      
      The second should really be handled by running global-dce after the
      inliner. I don't want to re-implement the non-trivial logic necessary to
      do comdat-correct DCE of functions. This means the -O0 pipeline will
      have to be at least 'always-inline,global-dce', but that seems
      reasonable to me. If others are seriously worried about this I'd like to
      hear about it and understand why. Again, this is all solvable by
      factoring that logic into a utility and calling it here, but I'd like to
      wait to do that until there is a clear reason why the existing
      pass-based factoring won't work.
      
      The final point is a serious one. I can fairly easily add support for
      this, but it seems both costly and a confusing construct for the use
      case of the always inliner running at -O0. This attribute can of course
      still impact the normal inliner easily (although I find that
      a questionable re-use of the same attribute). I've started a discussion
      to sort out what semantics we want here and based on that can figure
      out if it makes sense to have this complexity at -O0 or not.
      
      One other advantage of this design is that it should be quite a bit
      faster due to checking for whether the function is a viable candidate
      for inlining exactly once per function instead of doing it for each call
      site.
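
      A hedged sketch of that once-per-function shape, using the
      LLVM-4.0-era CallSite API (the real pass handles more edge cases):

        PreservedAnalyses AlwaysInlinerPass::run(Module &M,
                                                 ModuleAnalysisManager &) {
          InlineFunctionInfo IFI;
          bool Changed = false;
          for (Function &F : M) {
            // Viability is checked exactly once per function.
            if (F.isDeclaration() ||
                !F.hasFnAttribute(Attribute::AlwaysInline))
              continue;
            // Collect call sites first; inlining mutates the use list.
            SmallVector<CallSite, 16> Calls;
            for (User *U : F.users())
              if (auto CS = CallSite(U))
                if (CS.getCalledFunction() == &F)
                  Calls.push_back(CS);
            for (CallSite CS : Calls)
              Changed |= InlineFunction(CS, IFI);
          }
          return Changed ? PreservedAnalyses::none()
                         : PreservedAnalyses::all();
        }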
      
      Anyways, hopefully a reasonable starting point for this pass.
      
      Differential Revision: https://reviews.llvm.org/D23299
      
      llvm-svn: 278896
    • Konstantin Zhuravlyov · e0b87181
  25. Aug 11, 2016
  26. Jul 27, 2016
  27. Jul 22, 2016
  28. Jul 20, 2016
  29. Jul 14, 2016
  30. Jul 13, 2016
  31. Jul 01, 2016
  32. Jun 28, 2016