[CUDA][FIX] Make shfl[_sync] for unsigned long long non-recursive
A copy-paste error made the unsigned long long overloads of the shfl and shfl_sync intrinsics call themselves recursively, which is undefined behavior.

Reported and diagnosed by @trws.

Differential Revision: https://reviews.llvm.org/D129536
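
For illustration only, here is a minimal host-side sketch of the kind of copy-paste error the title describes. It is not the actual patch. The name `shfl_sketch` and its simplified signature are hypothetical, and the sketch assumes the unsigned overload was meant to forward to the `long long` overload through a cast.

```cpp
// Hypothetical host-side stand-in for the signed 64-bit shuffle. A real
// shuffle exchanges values between lanes of a warp; this one returns its
// input so the forwarding pattern can be checked on the host.
inline long long shfl_sketch(long long val, int /*offset*/) { return val; }

// Recursive pattern (shown as a comment, not compiled): the cast leaves the
// argument unsigned, so overload resolution selects this same function again
// and the call never returns.
//
//   inline unsigned long long shfl_sketch(unsigned long long val, int offset) {
//     return static_cast<unsigned long long>(
//         shfl_sketch(static_cast<unsigned long long>(val), offset));
//   }

// Non-recursive pattern: cast the argument to the signed type so the call
// resolves to the long long overload, then cast the result back.
inline unsigned long long shfl_sketch(unsigned long long val, int offset) {
  return static_cast<unsigned long long>(
      shfl_sketch(static_cast<long long>(val), offset));
}
```

The only difference between the two patterns is the type named in the inner cast, which is why an error like this is easy to introduce when copying a neighboring overload.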