  1. Jul 01, 2016
  2. Jun 28, 2016
  3. Jun 27, 2016
  4. Jun 24, 2016
  5. Jun 22, 2016
  6. Jun 15, 2016
  7. Jun 10, 2016
  8. Jun 02, 2016
    • AMDGPU: Fix crashes on unknown processor name · 8e00194b
      Matt Arsenault authored
      If the processor name failed to parse for amdgcn,
      the resulting output would have R600 ISA in it.
      
      If the processor name was missing or invalid for R600,
      the wavefront size would not be set and there would be
      crashes from missing itinerary data.
      
      Fixes crashes in a future commit caused by dividing by the unset/0
      wavefront size.
      
      llvm-svn: 271561
      8e00194b
    • AMDGPU: SIDebuggerInsertNops preserves CFG · d3e4c646
      Matt Arsenault authored
      This saves an additional run of the DominatorTree and
      MachineLoopInfo analyses.
      
      llvm-svn: 271444
      d3e4c646
  9. May 31, 2016
  10. May 19, 2016
    • Delete Reloc::Default. · 8c34dd82
      Rafael Espindola authored
      Having an enum member named Default is quite confusing: Is it distinct
      from the others?
      
      This patch removes that member and instead uses Optional<Reloc> in
      places where we have a user input that still hasn't been mapped to the
      default value, which now clearly has to be one of the remaining 3
      options.
      
      llvm-svn: 269988
      8c34dd82
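The idea in the Reloc::Default commit above can be illustrated with a minimal standalone sketch. It uses `std::optional` in place of LLVM's `Optional`, and the enum, its member names, and the helper `effectiveRelocModel` are hypothetical stand-ins, not the code from the patch. An empty optional means "the user did not choose", and it is resolved to one of the three concrete options at a single point.

```cpp
#include <optional>

// Hypothetical stand-in for the relocation model enum after the patch:
// three concrete options and no "Default" member.
enum class RelocModel { Static, PIC_, DynamicNoPIC };

// Hypothetical helper: resolve "no user choice" to a concrete model.
// The fallback is passed in because the real default is target-specific.
RelocModel effectiveRelocModel(std::optional<RelocModel> Requested,
                               RelocModel TargetDefault) {
  // An empty optional means the user input has not been mapped yet.
  return Requested.value_or(TargetDefault);
}
```

With this shape, code that receives a `RelocModel` never has to ask whether it might still be "default"; only code that handles raw user input sees the optional.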
  11. May 18, 2016
  12. May 10, 2016
  13. May 05, 2016
  14. Apr 30, 2016
  15. Apr 29, 2016
  16. Apr 22, 2016
  17. Apr 18, 2016
  18. Apr 14, 2016
    • AMDGPU: Run SIFoldOperands after PeepholeOptimizer · 3d1c1deb
      Matt Arsenault authored
      PeepholeOptimizer cleans up redundant copies, which makes
      the operand folding more effective.
      
      shader-db stats:
      
      Totals:
      SGPRS: 34200 -> 34336 (0.40 %)
      VGPRS: 22118 -> 21655 (-2.09 %)
      Code Size: 632144 -> 633460 (0.21 %) bytes
      LDS: 11 -> 11 (0.00 %) blocks
      Scratch: 10240 -> 11264 (10.00 %) bytes per wave
      Max Waves: 8822 -> 8918 (1.09 %)
      Wait states: 0 -> 0 (0.00 %)
      
      Totals from affected shaders:
      SGPRS: 7704 -> 7840 (1.77 %)
      VGPRS: 5169 -> 4706 (-8.96 %)
      Code Size: 234444 -> 235760 (0.56 %) bytes
      LDS: 2 -> 2 (0.00 %) blocks
      Scratch: 0 -> 1024 (0.00 %) bytes per wave
      Max Waves: 1188 -> 1284 (8.08 %)
      Wait states: 0 -> 0 (0.00 %)
      
      Increases:
      SGPRS: 35 (0.01 %)
      VGPRS: 1 (0.00 %)
      Code Size: 59 (0.02 %)
      LDS: 0 (0.00 %)
      Scratch: 1 (0.00 %)
      Max Waves: 48 (0.02 %)
      Wait states: 0 (0.00 %)
      
      Decreases:
      SGPRS: 26 (0.01 %)
      VGPRS: 54 (0.02 %)
      Code Size: 68 (0.03 %)
      LDS: 0 (0.00 %)
      Scratch: 0 (0.00 %)
      Max Waves: 4 (0.00 %)
      Wait states: 0 (0.00 %)
      
      llvm-svn: 266378
      3d1c1deb
    • AMDGPU: Add skeleton GlobalIsel implementation · 000c5af3
      Tom Stellard authored
      Summary:
      This adds the necessary target code to be able to run the ir translator.
      Lowering function arguments and returns is a nop and there is no support
      for RegBankSelect.
      
      Reviewers: arsenm, qcolombet
      
      Subscribers: arsenm, joker.eph, vkalintiris, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D19077
      
      llvm-svn: 266356
      000c5af3
    • AMDGPU: Remove SIFixSGPRLiveRanges pass · 723b73b4
      Nicolai Haehnle authored
      Summary:
      This pass is unnecessary and overly conservative. It was motivated by
      situations like
      
        def %vreg0:SGPR_32
        ...
      if-block:
        ..
        def %vreg1:SGPR_32
        ...
      else-block:
        ...
        use %vreg0:SGPR_32
        ...
      
      and similar situations with uses after the non-uniform control flow, where
      we are not allowed to assign %vreg0 and %vreg1 to the same physical register,
      even though in the original, thread/workitem-based CFG, it looks like the
      live ranges of these registers do not overlap.
      
      However, by the time register allocation runs, we have moved to a wave-based
      CFG that accurately represents the fact that the wave may run through both
      the if- and the else-block. So the live ranges of %vreg0 and %vreg1 already
      overlap even without the SIFixSGPRLiveRanges pass.
      
      In addition to proving this change correct, I have tested it with Piglit
      and a small number of other tests.
      
      Reviewers: arsenm, tstellarAMD
      
      Subscribers: MatzeB, arsenm, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D19041
      
      llvm-svn: 266345
      723b73b4
  19. Mar 21, 2016
    • AMDGPU: Add SIWholeQuadMode pass · 213e87f2
      Nicolai Haehnle authored
      Summary:
      Whole quad mode is already enabled for pixel shaders that compute
      derivatives, but it must be suspended for instructions that cause a
      shader to have side effects (i.e. stores and atomics).
      
      This pass addresses the issue by storing the real (initial) live mask
      in a register, masking EXEC before instructions that require exact
      execution and (re-)enabling WQM where required.
      
      This pass is run before register coalescing so that we can use
      machine SSA for analysis.
      
      The changes in this patch expose a problem with the second machine
      scheduling pass: target independent instructions like COPY implicitly
      use EXEC when they operate on VGPRs, but this fact is not encoded in
      the MIR. This can lead to miscompilation because instructions are
      moved past changes to EXEC.
      
      This patch fixes the problem by adding use-implicit operands to
      target independent instructions. Some general codegen passes are
      relaxed to work with such implicit use operands.
      
      Reviewers: arsenm, tstellarAMD, mareko
      
      Subscribers: MatzeB, arsenm, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D18162
      
      llvm-svn: 263982
      213e87f2
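The mask handling described in the SIWholeQuadMode commit above can be sketched with plain 64-bit lane masks. This is only an illustration, not the pass itself. The function names are hypothetical, and the sketch assumes that a quad is a group of 4 adjacent lanes and that whole quad mode enables every lane of any quad containing at least one live lane.

```cpp
#include <cstdint>

// Assumption: one bit per lane, 64 lanes per wave, 4 adjacent lanes per quad.
// Hypothetical helper: enable every lane of each quad that has a live lane.
std::uint64_t wholeQuadMask(std::uint64_t Live) {
  std::uint64_t Out = 0;
  for (unsigned Quad = 0; Quad < 16; ++Quad) {
    const std::uint64_t QuadBits = std::uint64_t{0xF} << (4 * Quad);
    if (Live & QuadBits)
      Out |= QuadBits;
  }
  return Out;
}

// Hypothetical helper: the mask to use for instructions that need exact
// execution (stores, atomics). Only lanes in the saved live mask stay on.
std::uint64_t exactMask(std::uint64_t Exec, std::uint64_t SavedLive) {
  return Exec & SavedLive;
}
```

For example, a live mask of `0x1` (one live lane) expands to `0xF` in whole quad mode, and masking that back with the saved live mask gives `0x1` again before a store.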
  20. Mar 11, 2016
  21. Mar 03, 2016
    • AMDGPU: Insert two S_NOP instructions for every high level source statement. · cc7067a6
      Tom Stellard authored
      Patch by: Konstantin Zhuravlyov
      
      Summary:
      Tools such as a debugger need to pause execution based on user input
      (i.e. a breakpoint). To allow this, two S_NOP instructions are inserted
      for each high level source statement: one before the first ISA
      instruction of the statement, and one after its last ISA instruction.
      The debugger may then replace the S_NOP instructions with S_TRAP
      instructions based on user input.
      
      Reviewers: tstellarAMD, arsenm
      
      Subscribers: echristo, dblaikie, arsenm, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D17454
      
      llvm-svn: 262579
      cc7067a6
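The insertion rule from the S_NOP commit above can be sketched on a flat instruction list. This is a simplified model, not the actual pass. The `Inst` struct and `insertDebuggerNops` are hypothetical, and the sketch assumes that all instructions of one source statement are contiguous and share a line number.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified instruction: an opcode name plus the source line
// of the high level statement it belongs to.
struct Inst {
  std::string Opcode;
  unsigned Line;
};

// Insert one S_NOP before the first and one after the last instruction of
// each statement. Assumes instructions of a statement are contiguous.
std::vector<Inst> insertDebuggerNops(const std::vector<Inst> &In) {
  std::vector<Inst> Out;
  for (std::size_t I = 0; I < In.size(); ++I) {
    const bool IsFirst = I == 0 || In[I - 1].Line != In[I].Line;
    const bool IsLast = I + 1 == In.size() || In[I + 1].Line != In[I].Line;
    if (IsFirst)
      Out.push_back({"S_NOP", In[I].Line});
    Out.push_back(In[I]);
    if (IsLast)
      Out.push_back({"S_NOP", In[I].Line});
  }
  return Out;
}
```

A statement that lowers to a single instruction still gets two S_NOPs, one on each side, which matches the "two per statement" wording of the summary.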
  22. Feb 13, 2016
  23. Feb 12, 2016
  24. Feb 05, 2016
  25. Feb 02, 2016
  26. Jan 30, 2016
    • AMDGPU: Fix emitting invalid workitem intrinsics for HSA · e0132464
      Matt Arsenault authored
      The AMDGPUPromoteAlloca pass was emitting read.local.size
      calls, which with HSA were incorrectly selected to read from
      the offset Mesa uses off of the kernarg pointer.
      
      Error on intrinsics which aren't supported by HSA, and start
      emitting the correct IR to read the workgroup size
      out of the dispatch pointer.
      
      Also initialize the pass so it can be tested with opt, and
      start moving towards not depending on the subtarget as an
      argument.
      
      Start emitting errors for the intrinsics not handled with HSA.
      
      llvm-svn: 259297
      e0132464
  27. Jan 27, 2016
    • AMDGPU: Fix default device handling · b22828f2
      Matt Arsenault authored
      When no device name is specified, default to kaveri
      for HSA since SI is not supported and it would fail.
      
      Default to "tahiti" instead of "SI" since these are
      effectively the same, and tahiti is an actual device.
      
      Move default device handling to the TargetMachine
      rather than the AMDGPUSubtarget. The module ISA version
      is computed from the device name provided with the target
      machine, so the attributes printed by the AsmPrinter were
      inconsistent with those computed in the subtarget.
      
      Also remove DevName field from subtarget since it's redundant
      with getCPU() in the superclass.
      
      llvm-svn: 258901
      b22828f2
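The defaulting rule from the commit above can be written as one small function. This is a sketch of the rule as the message states it, not the code in the TargetMachine. `defaultedGPUName` and its `IsHSA` parameter are hypothetical, and the sketch covers only the amdgcn case that the message describes.

```cpp
#include <string>

// Hypothetical helper for amdgcn targets: keep an explicit device name,
// otherwise fall back to "kaveri" for HSA and "tahiti" for everything else.
std::string defaultedGPUName(const std::string &GPU, bool IsHSA) {
  if (!GPU.empty())
    return GPU;
  return IsHSA ? "kaveri" : "tahiti";
}
```

Resolving the name once, before the subtarget is created, lets everything that derives data from the device name see the same value, which is the inconsistency the commit message describes fixing.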
  28. Jan 21, 2016