[mlir][nvgpu] Implement `nvgpu.device_async_copy` by NVVMToLLVM Pass
`nvgpu.device_async_copy` is lowered to the `cp.async` PTX instruction. However, the NVPTX backend does not support all of its modes, in particular the one that requires zero padding. The current MLIR implementation therefore generates inline assembly for it. This change simplifies PTX generation for `nvgpu.device_async_copy` and implements it via the `NVVMToLLVM` pass.

Depends on D154060

Reviewed By: nicolasvasilache, manishucsd

Differential Revision: https://reviews.llvm.org/D154345
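
For readers unfamiliar with the zero-padding mode mentioned above, the following host-side model may help. It is only an illustrative sketch, not the lowering in this change: `cp_async_model` is a hypothetical helper name, and it assumes the PTX form of `cp.async` that takes a copy size plus an optional source size, copies the source-size bytes, and zero-fills the rest of the destination.

```python
from typing import Optional


def cp_async_model(src: bytes, cp_size: int, src_size: Optional[int] = None) -> bytes:
    """Hypothetical host-side model of one cp.async copy.

    This is not an MLIR, NVVM, or CUDA API. It only mimics the assumed
    semantics: copy `src_size` bytes and zero-fill up to `cp_size`.
    """
    if cp_size not in (4, 8, 16):
        raise ValueError("cp_size must be 4, 8, or 16 bytes")
    if src_size is None:
        # No source size given: plain copy of cp_size bytes, no padding.
        src_size = cp_size
    if not 0 <= src_size <= cp_size:
        raise ValueError("src_size must be between 0 and cp_size")
    if len(src) < src_size:
        raise ValueError("source buffer is shorter than src_size")
    # Copy the valid bytes, then pad the remainder of the destination with zeros.
    return bytes(src[:src_size]) + bytes(cp_size - src_size)
```

In this model, `cp_async_model(data, 16, 4)` returns the first 4 bytes of `data` followed by 12 zero bytes, which is the case the commit message calls zero padding.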