[mlir][NVGPU] Handle native mma.sync and ldmatrix(x4) sizes
This patch handles native `mma.sync` sizes and enables issuing `ldmatrix` on largest possible tiles for matrixB. It requires handling `vector.extract_strided_slice` from vector to ngpu lowering. Differential Revision: https://reviews.llvm.org/D135749
Loading
Please sign in to comment