Commit 114ba722 authored Oct 12, 2022 by Manish Gupta

[mlir][NVGPU] Handle native mma.sync and ldmatrix(x4) sizes

This patch handles native `mma.sync` sizes and enables issuing `ldmatrix` on
the largest possible tiles for matrix B. This requires handling
`vector.extract_strided_slice` in the vector-to-nvgpu lowering.

Differential Revision: https://reviews.llvm.org/D135749

parent 97196a2d
