Commit 2c573967 authored Jul 11, 2023 by Guray Ozen

[mlir][nvgpu] Implement `nvgpu.device_async_copy` by NVVMToLLVM Pass

`nvgpu.device_async_copy` is lowered into the `cp.async` PTX instruction. However, the NVPTX backend does not support all of its modes, especially when zero padding is needed. Therefore, the current MLIR implementation generates inline assembly for that case.

This work simplifies PTX generation for `nvgpu.device_async_copy` and implements it through the `NVVMToLLVM` pass.

Depends on D154060

Reviewed By: nicolasvasilache, manishucsd

Differential Revision: https://reviews.llvm.org/D154345

parent dd080c75
