[LLVM][NVPTX] Add cp.async.bulk.commit/wait intrinsics (#78698)
This patch adds NVVM intrinsics and NVPTX codegen for the bulk variants of the async-copy commit/wait instructions. It also adds lit tests that verify the generated PTX.

PTX Doc link: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-bulk-commit-group

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>