[CUDA] Provide integer SIMD functions for CUDA-9.2
CUDA-9.2 made all integer SIMD functions into compiler builtins, so clang no longer has access to the implementation of these functions in either headers of libdevice and has to provide its own implementation. This is mostly a 1:1 mapping to a corresponding PTX instructions with an exception of vhadd2/vhadd4 that don't have an equivalent instruction and had to be implemented with a bit hack. Performance of this implementation will be suboptimal for SM_50 and newer GPUs where PTXAS generates noticeably worse code for the SIMD instructions compared to the code it generates for the inline assembly generated by nvcc (or used to come with CUDA headers). Differential Revision: https://reviews.llvm.org/D49274 llvm-svn: 337587
Loading
Please sign in to comment