Skip to content
Commit fa07bb64 authored by Artem Belevich's avatar Artem Belevich
Browse files

[CUDA] Provide integer SIMD functions for CUDA-9.2

CUDA-9.2 made all integer SIMD functions into compiler builtins,
so clang no longer has access to the implementation of these
functions in either headers of libdevice and has to provide
its own implementation.

This is mostly a 1:1 mapping to a corresponding PTX instructions
with an exception of vhadd2/vhadd4 that don't have an equivalent
instruction and had to be implemented with a bit hack.

Performance of this implementation will be suboptimal for SM_50
and newer GPUs where PTXAS generates noticeably worse code for
the SIMD instructions compared to the code it generates
for the inline assembly generated by nvcc (or used to come
with CUDA headers).

Differential Revision: https://reviews.llvm.org/D49274

llvm-svn: 337587
parent 5e729dcc
Loading
Loading
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please to comment