Skip to content
Commit 8b2ca766 authored by Matt Arsenault's avatar Matt Arsenault
Browse files

AMDGPU: Reserve v32 if we may need to copy between AGPRs on gfx908

We need to guarantee cheap copies between AGPRs, and unfortunately
gfx908 cannot directly do this. Theoretically we could set the
scavenger up with an emergency spill slot, but it also feels
unreasonable to pay that cost for what was assumed to be a simple and
cheap copy. Pick a register that doesn't conflict with any ABI
registers.

This does not address the same issue when copying from SGPR to AGPR
for gfx90a (this coincidentally fixes it for gfx908), but that's less
interesting since the register allocator shouldn't be proactively
introducing such copies.

One edge case I'm worried about is respecting the VGPR budget implied
by amdgpu-waves-per-eu. If the theoretical upper bound of a function
is 32 VGPRs, this will force the actual count to be 33.

This is also broken if inline assembly uses/defs something in v32. The
coalescer will eliminate the intermediate vreg between the def and
use, and the introduced copy will clobber the user value.

(cherry picked from commit 3335784ac2d587ff4eac04586e189532ae8b2607)
parent a7f60bfd
Loading
Loading
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment