[MLIR][GPU] Add NvGpu mma.sync path to the VectorToGPU pass
This changes adds the option to lower to NvGpu dialect ops during the VectorToGPU convsersion pass. Because this transformation reuses existing VectorToGPU logic, a seperate VectorToNvGpu conversion pass is not created. The option `use-nvgpu` is added to the VectorToGPU pass. When this is true, the pass will attempt to convert slices rooted at `vector.contract` operations into `nvgpu.mma.sync` ops, and `vector.transfer_read` ops are converted to either `nvgpu.ldmatrix` or one or more `vector.load` operations. The specific data loaded will depend on the thread id within a subgroup (warp). These index calculations depend on data type and shape of the MMA op according to the downstream PTX specification. The code for supporting these details is separated into `NvGpuSupport.cpp|h`. Differential Revision: https://reviews.llvm.org/D122940
Loading
Please sign in to comment