[LLVM][AMDGPU] Specialize 32-bit atomic fadd instruction for generic address space
The 32-bit floating-point atomic add instructions on AMDGPUs does not support a "flat" or "generic" address space. So, if the address space cannot be determined statically, the AMDGPU backend will fall back to a CAS loop (which does support "flat" addressing). Instead, this patch emits runtime address-space checks to allow native FP atomic add instructions for global and LDS memory (and non-atomic FP add instructions for private/scratch memory). In order to do that, this patch introduces a new interface function `emitExpandAtomicRMW`. It is expected to be called when a common atomic expand doesn't work for a specific target, such as the case we discussed here. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D129690
Loading
Please sign in to comment