[CUDA] Implemented __nvvm_atom_*_gen_* builtins.
Integer variants are implemented as atomicrmw or cmpxchg instructions. Atomic add for floating point (__nvvm_atom_add_gen_f()) is implemented as a call to an overloaded @llvm.nvvm.atomic.load.add.f32.* LVVM intrinsic. Differential Revision: http://reviews.llvm.org/D10666 llvm-svn: 240669
Loading
Please sign in to comment