[mlir][nvgpu] Add nvgpu.tma.async.load and nvgpu.tma.descriptor
This change adds the `nvgpu.tma.async.load` op, which requests a TMA load asynchronously using an mbarrier object. It also adds the `nvgpu.tma.descriptor` type. A value of this type is expected to be created by the `cuTensorMapEncodeTiled` CUDA driver API.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D155453