[OpenMP] Ensure memory fences are created with barriers for AMDGPUs
It turns out that the `__builtin_amdgcn_s_barrier()` alone does not emit a fence. We somehow got away with this and assumed it would work as it (hopefully) is correct on the NVIDIA path where we just emit a `__syncthreads`. After talking to @arsenm we now (mostly) align with the OpenCL barrier implementation [1] and emit explicit fences for AMDGPUs. It seems this was the underlying cause for #59759, but I am not 100% certain. There is a chance this simply hides the problem. Fixes: https://github.com/llvm/llvm-project/issues/59759 [1] https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/07b347366eb2c6ebc3414af323c623cbbbafc854/opencl/src/workgroup/wgbarrier.cl#L21 Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D145290
Loading
Please sign in to comment