[MLIR][Linalg] introduce batch-reduce GEMM
The batch-reduce GEMM kernel multiplies a sequence of input tensor blocks (which together form a batch) and reduces the partial products into a single output tensor block.

See https://ieeexplore.ieee.org/document/9139809 for more details.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D134163
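For illustration only, the computation described above can be sketched in plain Python as C += sum_i A_i * B_i. This is not the code added by this patch: the function name `batch_reduce_gemm`, the nested-list representation of the blocks, and the choice to accumulate into an existing output block are all assumptions made for the sketch.

```python
def batch_reduce_gemm(a_blocks, b_blocks, c):
    """Accumulate sum_i (A_i @ B_i) into c, in place, and return c.

    Hypothetical reference sketch:
      a_blocks: list of M x K matrices (nested lists)
      b_blocks: list of K x N matrices (nested lists)
      c:        M x N output block, assumed to be pre-initialized
    """
    for a, b in zip(a_blocks, b_blocks):      # iterate over the batch
        for m in range(len(c)):               # output rows
            for n in range(len(c[0])):        # output columns
                for k in range(len(b)):       # contraction dimension
                    # Every batch element reduces into the same output block.
                    c[m][n] += a[m][k] * b[k][n]
    return c


# Two 2x2 blocks per operand, reduced into a single 2x2 output block.
a_blocks = [[[1, 2], [3, 4]], [[1, 0], [0, 1]]]
b_blocks = [[[5, 6], [7, 8]], [[1, 1], [1, 1]]]
result = batch_reduce_gemm(a_blocks, b_blocks, [[0, 0], [0, 0]])
```

Unlike a batched GEMM, which produces one output block per batch element, the batch dimension here is itself a reduction dimension, so the output has no batch axis.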