[mlir][NVGPU][transform] Add `create_async_groups` transform op
This transform looks for suitable vector transfers from global memory to shared memory and converts them to async device copies.

Differential Revision: https://reviews.llvm.org/D155569
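The rewrite described above can be pictured with a toy Python model. This is not MLIR code and not the op's actual implementation. `TransferRead`, `TransferWrite`, `AsyncCopy`, `CreateGroup`, the `"global"`/`"shared"` labels and `create_async_groups` are hypothetical stand-ins. The model makes several simplifying assumptions. A read from global memory whose only use is a write into shared memory becomes one async copy. Consecutive copies are gathered into one group. The real op's suitability checks are not modeled.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical memory-space labels standing in for real address spaces.
GLOBAL = "global"
SHARED = "shared"


@dataclass
class TransferRead:
    result: str  # name of the loaded vector value
    src: str     # name of the source buffer
    space: str   # memory space of the source buffer


@dataclass
class TransferWrite:
    value: str   # name of the vector value being stored
    dst: str     # name of the destination buffer
    space: str   # memory space of the destination buffer


@dataclass
class AsyncCopy:
    src: str
    dst: str


@dataclass
class CreateGroup:
    copies: List[AsyncCopy]


Op = Union[TransferRead, TransferWrite, AsyncCopy, CreateGroup]


def create_async_groups(ops: List[Op]) -> List[Op]:
    """Toy model: fold global->shared read/write pairs into grouped async copies.

    Assumes each read result has at only one use (the write that consumes it),
    so the read can be dropped once that write is converted.
    """
    # Reads from global memory, keyed by the value they produce.
    global_reads = {
        op.result: op
        for op in ops
        if isinstance(op, TransferRead) and op.space == GLOBAL
    }
    # Values that flow from a global read directly into a shared-memory write.
    convertible = {
        op.value
        for op in ops
        if isinstance(op, TransferWrite)
        and op.space == SHARED
        and op.value in global_reads
    }

    out: List[Op] = []
    pending: List[AsyncCopy] = []
    for op in ops:
        if isinstance(op, TransferRead) and op.result in convertible:
            # The read is folded into its async copy; skipping it does not
            # break up the current run of copies.
            continue
        if isinstance(op, TransferWrite) and op.value in convertible:
            read = global_reads[op.value]
            pending.append(AsyncCopy(src=read.src, dst=op.dst))
            continue
        # Any other op ends the current run of copies, so close the group.
        if pending:
            out.append(CreateGroup(pending))
            pending = []
        out.append(op)
    if pending:
        out.append(CreateGroup(pending))
    return out
```

In this sketch, two global-to-shared transfers that follow each other end up in a single group. A write whose value does not come from a global read is left alone.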