[AMDGPU] Reimplement the GFX11 early release VGPRs optimization
Implement this optimization in SIInsertWaitcnts, where we already have
information about whether there might be outstanding VMEM store
instructions. This has the following advantages:
- Correctly handles atomics-with-return.
- Correctly handles call instructions.
- Should be faster because it does not require running a separate pass.

Differential Revision: https://reviews.llvm.org/D153279
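For illustration only, the decision this change relies on can be sketched as a toy model: walk a block, track whether a VMEM store might still be outstanding, and release VGPRs early at program end only if so. This is not the code in SIInsertWaitcnts. The `Op` enum and `shouldReleaseVGPRsEarly` are hypothetical names, and the model deliberately ignores atomics, calls, and control flow.

```cpp
#include <vector>

// Hypothetical, simplified instruction kinds; not LLVM's MachineInstr.
enum class Op {
  VMemStore,  // a VMEM store is issued
  WaitStores, // a wait that guarantees all earlier stores have completed
  Other,      // anything that does not affect outstanding stores
  EndProgram  // end of the program
};

// Returns true if, on reaching EndProgram, a VMEM store might still be
// outstanding, which is the case where releasing VGPRs early could help.
// Assumption: a single straight-line block with no calls or branches.
bool shouldReleaseVGPRsEarly(const std::vector<Op> &Block) {
  bool MayHaveOutstandingStore = false;
  for (Op O : Block) {
    switch (O) {
    case Op::VMemStore:
      MayHaveOutstandingStore = true;
      break;
    case Op::WaitStores:
      MayHaveOutstandingStore = false;
      break;
    case Op::EndProgram:
      return MayHaveOutstandingStore;
    case Op::Other:
      break;
    }
  }
  return false; // the block never reaches the end of the program
}
```

Because the wait-count pass already tracks this state while it walks each block, the check needs no extra traversal in this model, which is the intuition behind the "no separate pass" point above.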