  1. Apr 18, 2017
  2. Apr 12, 2017
  3. Apr 10, 2017
    • AMDGPU: Fix crash when disassembling VOP3 mac · 678e111e
      Matt Arsenault authored
      The unused dummy src2_modifiers is missing, so it crashes
      when trying to print it.
      
      I tried to fully remove src2_modifiers, but there are some
      irritations in the places where it is converted to mad since
      it starts to require modifying use lists while iterating over
      them.
      
      llvm-svn: 299861
      678e111e
  4. Feb 21, 2017
    • AMDGPU: Don't use stack space for SGPR->VGPR spills · e0bf7d02
      Matt Arsenault authored
      Before frame offsets are calculated, try to eliminate the
      frame indexes used by SGPR spills. Then we can delete them
      after.
      
      I think for now we can be sure that no other instruction
      will be re-using the same frame indexes. It should be easy
      to notice if this assumption ever breaks since everything
      asserts if it tries to use a dead frame index later.
      
      The unused emergency stack slot seems to still be left behind,
      so an additional 4 bytes is still wasted.
      
      llvm-svn: 295753
      e0bf7d02
  5. Jan 25, 2017
  6. Jan 21, 2017
  7. Dec 20, 2016
  8. Sep 06, 2016
    • [AMDGPU] Wave and register controls · 1d65026c
      Konstantin Zhuravlyov authored
      - Implemented amdgpu-flat-work-group-size attribute
      - Implemented amdgpu-num-active-waves-per-eu attribute
      - Implemented amdgpu-num-sgpr attribute
      - Implemented amdgpu-num-vgpr attribute
      - Dynamic LDS constraints are in a separate patch
      
      Patch by Tom Stellard and Konstantin Zhuravlyov
      
      Differential Revision: https://reviews.llvm.org/D21562
      
      llvm-svn: 280747
      1d65026c
  9. Aug 29, 2016
  10. Aug 11, 2016
  11. Jul 26, 2016
  12. Jul 22, 2016
  13. Jul 13, 2016
    • AMDGPU/SI: Emit the number of SGPR and VGPR spills · 0532c190
      Marek Olsak authored
      Summary:
      v2: don't count SGPRs spilled to scratch twice
      
      I think this is sufficient. It doesn't count private memory usage, which
      happens often and uses scratch but isn't technically a spill. The private
      memory usage can be computed by:
        [scratch_per_thread - vgpr_spills - a random multiple of SGPR spills].
      
      The fact that SGPR spills add very high numbers to the scratch size makes that
      computation a guessing game, but I don't have a solution to that.
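
The estimate described above can be sketched as follows. This is a hypothetical helper, not LLVM code: the name `estimatePrivateBytes`, the cost of 4 bytes per VGPR spill (one 32-bit value per thread), and the caller-supplied `assumedBytesPerSgprSpill` are all assumptions. The commit itself says the SGPR multiple is not known precisely, so the result is only a rough guess.

```cpp
#include <cstdint>

// Hypothetical helper (not part of LLVM): rough estimate of per-thread
// private memory, following the formula in the commit message:
//   scratch_per_thread - vgpr_spills - (some multiple of SGPR spills)
// Assumption: each VGPR spill costs 4 bytes of scratch per thread.
// Assumption: the SGPR multiple is unknown, so the caller must supply it.
constexpr std::int64_t estimatePrivateBytes(std::uint32_t scratchBytesPerThread,
                                            std::uint32_t vgprSpills,
                                            std::uint32_t sgprSpills,
                                            std::uint32_t assumedBytesPerSgprSpill) {
  return static_cast<std::int64_t>(scratchBytesPerThread) -
         static_cast<std::int64_t>(vgprSpills) * 4 -
         static_cast<std::int64_t>(sgprSpills) * assumedBytesPerSgprSpill;
}
```

A negative result would indicate that the assumed SGPR multiple is too large for the reported scratch size.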
      
      Reviewers: tstellarAMD
      
      Subscribers: arsenm, kzhuravl
      
      Differential Revision: http://reviews.llvm.org/D22197
      
      llvm-svn: 275288
      0532c190
  14. Jun 27, 2016
  15. Jun 25, 2016
    • [AMDGPU] Emit debugger prologue and emit the rest of the debugger fields in the kernel code header · f2f3d147
      Konstantin Zhuravlyov authored
      The debugger prologue is emitted if -mattr=+amdgpu-debugger-emit-prologue is specified.
      
      The debugger prologue writes work group IDs and work item IDs to scratch memory at a fixed location in the following format:
        - offset 0: work group ID x
        - offset 4: work group ID y
        - offset 8: work group ID z
        - offset 16: work item ID x
        - offset 20: work item ID y
        - offset 24: work item ID z
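
The layout listed above can be pictured as a plain struct. This is only an illustrative sketch: the struct and field names are hypothetical, 32-bit IDs are assumed from the 4-byte spacing, and the unnamed gap at offset 12 is modeled as padding because the commit message does not say what, if anything, lives there.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical view of the scratch area written by the debugger prologue.
// Assumption: every ID is a 32-bit value (inferred from the 4-byte spacing).
struct DebuggerPrologueIDs {
  std::uint32_t WorkGroupIDX; // offset 0
  std::uint32_t WorkGroupIDY; // offset 4
  std::uint32_t WorkGroupIDZ; // offset 8
  std::uint32_t Unused;       // offset 12: gap in the documented layout
  std::uint32_t WorkItemIDX;  // offset 16
  std::uint32_t WorkItemIDY;  // offset 20
  std::uint32_t WorkItemIDZ;  // offset 24
};

// Compile-time checks that the sketch matches the documented offsets.
static_assert(offsetof(DebuggerPrologueIDs, WorkGroupIDX) == 0, "wg x");
static_assert(offsetof(DebuggerPrologueIDs, WorkGroupIDZ) == 8, "wg z");
static_assert(offsetof(DebuggerPrologueIDs, WorkItemIDX) == 16, "wi x");
static_assert(offsetof(DebuggerPrologueIDs, WorkItemIDZ) == 24, "wi z");
```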
      
      Set
        - amd_kernel_code_t::debug_wavefront_private_segment_offset_sgpr to scratch wave offset reg
        - amd_kernel_code_t::debug_private_segment_buffer_sgpr to scratch rsrc reg
        - amd_kernel_code_t::is_debug_supported to true if all debugger features are enabled
      
      Differential Revision: http://reviews.llvm.org/D20335
      
      llvm-svn: 273769
      f2f3d147
  16. May 24, 2016
  17. Apr 26, 2016
  18. Apr 25, 2016
  19. Apr 14, 2016
    • AMDGPU: allow specifying a workgroup size that needs to fit in a compute unit · 79a1fd71
      Tom Stellard authored
      Summary:
      For GL_ARB_compute_shader we need to support workgroup sizes of at least 1024. However, if we want to allow large workgroup sizes, we may need to use fewer registers, as we have to run more waves per SIMD.
      
      This patch adds an attribute to specify the maximum work group size the compiled program needs to support. It defaults to 256, as that has no wave restrictions.
      
      Reducing the number of registers available is done similarly to how the registers were reserved for chips with the sgpr init bug.
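
The trade-off between workgroup size and register budget can be illustrated with a small calculation. This is a sketch under stated assumptions, not the computation the patch performs: it assumes 64-wide wavefronts, 4 SIMDs per compute unit, and 256 VGPRs available per lane on each SIMD, and it ignores allocation granularity and SGPR limits. All names are hypothetical.

```cpp
// Assumed hardware parameters (not taken from the commit message).
constexpr unsigned WavefrontSize = 64;     // threads per wave
constexpr unsigned SimdsPerCU = 4;         // SIMDs in one compute unit
constexpr unsigned VgprsPerSimdLane = 256; // VGPR budget shared by waves on a SIMD

// Waves needed to hold one workgroup, rounded up.
constexpr unsigned wavesPerWorkGroup(unsigned workGroupSize) {
  return (workGroupSize + WavefrontSize - 1) / WavefrontSize;
}

// Waves that must be resident on each SIMD so the whole workgroup fits
// in a single compute unit (at least one, to avoid dividing by zero).
constexpr unsigned wavesPerSimd(unsigned workGroupSize) {
  return wavesPerWorkGroup(workGroupSize) == 0
             ? 1
             : (wavesPerWorkGroup(workGroupSize) + SimdsPerCU - 1) / SimdsPerCU;
}

// Upper bound on VGPRs per wave for the given maximum workgroup size.
constexpr unsigned maxVgprsPerWave(unsigned workGroupSize) {
  return VgprsPerSimdLane / wavesPerSimd(workGroupSize);
}
```

Under these assumptions, a maximum workgroup size of 256 needs one wave per SIMD and leaves the full VGPR budget available, which is consistent with the default carrying no wave restrictions, while 1024 needs four waves per SIMD and cuts the per-wave budget to a quarter.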
      
      Reviewers: mareko, arsenm, tstellarAMD, nhaehnle
      
      Subscribers: FireBurn, kerberizer, llvm-commits, arsenm
      
      Differential Revision: http://reviews.llvm.org/D18340
      
      Patch By: Bas Nieuwenhuizen
      
      llvm-svn: 266337
      79a1fd71
    • AMDGPU/SI: Use the correct scratch wave offset register for shaders. · f110f8f9
      Tom Stellard authored
      
      
      Summary:
      The code previously always used s1, as it was using the user + system SGPR
      information for compute kernels. This is incorrect for Mesa shaders, though.
      
      The register should be the next SGPR after all user and system SGPRs.
      We rely on the fact that Mesa adds arguments for all input and system SGPRs
      and take the next available SGPR for the scratch wave offset register.
      
      Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
      
      Reviewers: mareko, arsenm, nhaehnle, tstellarAMD
      
      Subscribers: qcolombet, arsenm, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D18941
      
      Patch By: Bas Nieuwenhuizen
      
      llvm-svn: 266336
      f110f8f9
  20. Mar 11, 2016
  21. Mar 04, 2016
  22. Feb 12, 2016
    • AMDGPU: Set flat_scratch from flat_scratch_init reg · 296b8491
      Matt Arsenault authored
      This was hardcoded to the static private size, which would
      miss the offset and the additional size once we have
      dynamic sizing.
      
      Also stops always initializing flat_scratch even when unused.
      
      In the future we should stop emitting this unless flat instructions
      are used to access private memory. For example this will initialize
      it almost always on VI because flat is used for global access.
      
      llvm-svn: 260658
      296b8491
  23. Jan 13, 2016
    • AMDGPU/SI: Add s_waitcnt at the end of non-void functions · 8e9cc63b
      Marek Olsak authored
      Summary:
      v2: Make ReturnsVoid private, so that I can add another 8 lines of code and
          look more productive.
      
      Reviewers: tstellarAMD, arsenm
      
      Subscribers: arsenm
      
      Differential Revision: http://reviews.llvm.org/D16034
      
      llvm-svn: 257622
      8e9cc63b
    • AMDGPU/SI: Add new target attribute InitialPSInputAddr · fccabaf5
      Marek Olsak authored
      Summary:
      This allows Mesa to pass initial SPI_PS_INPUT_ADDR to LLVM.
      The register assigns VGPR locations to PS inputs, while the ENA register
      determines whether or not they are loaded.
      
      Mesa needs to set some inputs as not-movable, so that a pixel shader prolog
      binary appended at the beginning can assume where some inputs are.
      
      v2: Make PSInputAddr private, because there are never enough silly getters
          and setters for people to read.
      
      Reviewers: tstellarAMD, arsenm
      
      Subscribers: arsenm
      
      Differential Revision: http://reviews.llvm.org/D16030
      
      llvm-svn: 257591
      fccabaf5
  24. Nov 30, 2015
    • AMDGPU: Rework how private buffer passed for HSA · 26f8f3db
      Matt Arsenault authored
      If we know we have stack objects, we reserve the registers
      in which the private buffer resource and wave offset are passed
      and use them directly.
      
      If not, reserve the last 5 SGPRs just in case we need to spill.
      After register allocation, try to pick the next available registers
      instead of the last SGPRs, and then insert copies from the inputs
      to the reserved registers in the prologue.
      
      This also only selectively enables all of the input registers
      which are really required instead of always enabling them.
      
      llvm-svn: 254331
      26f8f3db
  25. Nov 25, 2015
  26. Nov 05, 2015
  27. Jun 13, 2015
  28. Jan 20, 2015
  29. Jan 14, 2015
  30. Sep 24, 2014
    • R600/SI: Implement VGPR register spilling for compute at -O0 v3 · 96468903
      Tom Stellard authored
      VGPRs are spilled to LDS.  This still needs more testing, but
      we need to at least enable it at -O0, because the fast register
      allocator spills all registers that are live at the end of blocks
      and without this some future commits will break the
      flat-address-space.ll test.
      
      v2: Only calculate thread id once
      
      v3: Move insertion of spill instructions to
          SIRegisterInfo::eliminateFrameIndex()
      llvm-svn: 218348
      96468903
  31. Aug 21, 2014
    • R600/SI: Remove unused SGPR spilling code · 8e52375b
      Tom Stellard authored
      llvm-svn: 216218
      8e52375b
    • R600/SI: Use eliminateFrameIndex() to expand SGPR spill pseudos · c5cf2f04
      Tom Stellard authored
      This will simplify the SGPR spilling and also allow us to use
      MachineFrameInfo for calculating offsets, which should be more
      reliable than our custom code.
      
      This fixes a crash in some cases where a register would be spilled
      in a branch such that the VGPR defined for spilling did not dominate
      all the uses when restoring.
      
      This fixes a crash in an ocl conformance test.  The test requires
      register spilling and is too big to include.
      
      llvm-svn: 216217
      c5cf2f04
  32. Aug 13, 2014
    • Canonicalize header guards into a common format. · a7c40ef0
      Benjamin Kramer authored
      Add header guards to files that were missing guards. Remove #endif comments
      as they don't seem common in LLVM (we can easily add them back if we decide
      they're useful)
      
      Changes made by clang-tidy with minor tweaks.
      
      llvm-svn: 215558
      a7c40ef0
  33. Jul 21, 2014
  34. May 02, 2014