Skip to content
Commit 2c573967 authored by Guray Ozen's avatar Guray Ozen
Browse files

[mlir][nvgpu] Implement `nvgpu.device_async_copy` by NVVMToLLVM Pass

`nvgpu.device_async_copy` is lowered into `cp.async` PTX instruction. However, NVPTX backend does not support its all mode especially when zero padding is needed. Therefore, current MLIR implementation genereates inline assembly for that.

This work simplifies PTX generation for `nvgpu.device_async_copy`, and implements it by `NVVMToLLVM` Pass.

Depends on D154060

Reviewed By: nicolasvasilache, manishucsd

Differential Revision: https://reviews.llvm.org/D154345
parent dd080c75
Loading
Loading
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please to comment