AMDGPU/SIInsertWait: Skip dummy tied source
For D16 memory load instructions, the hardware usually writes only half of the 32-bit register, but the MachineIR instruction defines its destination as the full 32-bit register. Because the destination is a 32-bit register, LLVM assumes the instruction always overwrites the whole register, so without the extra tied source it would treat a previous write to the other half of the register as dead. Adding the tied source makes LLVM think the instruction reads the register, which keeps that previous write alive.

This dummy tied source introduces an unnecessary read-after-write dependency. This change skips tied sources that can safely be ignored, thus avoiding an unnecessary s_waitcnt.

Reviewed by: foad

Differential Revision: https://reviews.llvm.org/D140537
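
The idea can be illustrated with a small standalone model. This is only a sketch under simplifying assumptions, not the code from the patch: `Operand`, `Instr`, `isSkippableTiedSource`, and `needsWait` are hypothetical names, registers are plain integers, and only read-after-write hazards against outstanding loads are tracked.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical, simplified operand model (not LLVM's MachineOperand).
struct Operand {
  unsigned Reg; // register number
  bool IsDef;   // true if the instruction writes this register
  bool IsTied;  // true for a use operand tied to a def
};

// Hypothetical, simplified instruction model (not LLVM's MachineInstr).
struct Instr {
  bool IsD16Load; // load that writes only half of its 32-bit destination
  std::vector<Operand> Ops;
};

// Assumption of this sketch: on a D16 load, a tied use exists only to keep
// the earlier write to the other register half alive; it is never read.
bool isSkippableTiedSource(const Instr &I, const Operand &Op) {
  return I.IsD16Load && !Op.IsDef && Op.IsTied;
}

// Returns true if I reads a register that still has an outstanding load,
// i.e. a wait would have to be inserted before I in this toy model.
bool needsWait(const Instr &I, const std::set<unsigned> &PendingLoadRegs) {
  for (const Operand &Op : I.Ops) {
    if (Op.IsDef)
      continue; // this sketch models read-after-write hazards only
    if (isSkippableTiedSource(I, Op))
      continue; // dummy read: no real dependency on the pending load
    if (PendingLoadRegs.count(Op.Reg))
      return true;
  }
  return false;
}

// Usage example: register 1 has a load in flight.
void example() {
  const std::set<unsigned> Pending = {1};

  // D16 load into the other half of register 1, address in register 2.
  // Operands: def r1, use r2 (address), tied use r1 (dummy source).
  const Instr D16Hi{true, {{1, true, false}, {2, false, false}, {1, false, true}}};
  assert(!needsWait(D16Hi, Pending));

  // An ordinary instruction that really reads register 1 must still wait.
  const Instr Alu{false, {{3, true, false}, {1, false, false}}};
  assert(needsWait(Alu, Pending));
}
```

In this model, the second D16 load no longer stalls on the first one, while genuine readers of the register are unaffected.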