Make ptx barrier work irrespective of the cl_mem_fence_flags
The barrier always generates a "bar.sync 0" instruction, which not only causes the threads to wait but also acts as a memory fence, as required by OpenCL. The fence does not differentiate between local and global memory. Unfortunately, there is no similar instruction that omits the memory fence. Hence, we cannot optimize the case where neither CLK_LOCAL_MEM_FENCE nor CLK_GLOBAL_MEM_FENCE is passed.

llvm-svn: 315228
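
The behavior described in this message can be modeled on the host with a minimal sketch. This is not the actual implementation: the typedef, the flag values, the counter, and the names emit_bar_sync_0 and barrier_model are hypothetical stand-ins chosen for illustration. The only point is that the synchronization is issued unconditionally, whatever flags the caller passes.

```c
/* Host-side model only; none of these definitions come from the real sources. */
typedef unsigned int cl_mem_fence_flags;

/* Illustrative flag values (assumed for this sketch). */
#define CLK_LOCAL_MEM_FENCE  0x1u
#define CLK_GLOBAL_MEM_FENCE 0x2u

/* Counts how many times the modeled "bar.sync 0" was issued. */
int bar_sync_count = 0;

/* Hypothetical stand-in for emitting "bar.sync 0", which both
 * synchronizes the threads and fences local and global memory. */
void emit_bar_sync_0(void) {
    ++bar_sync_count;
}

/* Model of the barrier: the flags are deliberately ignored, because
 * the single available instruction always includes the fence. */
void barrier_model(cl_mem_fence_flags flags) {
    (void)flags;
    emit_bar_sync_0();
}
```

In this model, calling barrier_model(0) issues the same instruction as calling it with either or both fence flags, which is why the no-flag case cannot be made cheaper.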