Skip to content
Unverified Commit d20fbc90 authored by Guray Ozen's avatar Guray Ozen Committed by GitHub
Browse files

[MLIR][NVGPU] Introduce `nvgpu.wargroup.mma.store` Op for Hopper GPUs (#65441)

This PR introduces a new Op called `warpgroup.mma.store` to the NVGPU
dialect of MLIR. The purpose of this operation is to facilitate storing
fragmanted result(s) `nvgpu.warpgroup.accumulator` produced by
`warpgroup.mma` to the given memref.

An example of fragmentated matrix is given here :

https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#wgmma-64n16-d

The `warpgroup.mma.store` does followings:
1) Takes one or more `nvgpu.warpgroup.accumulator` type (fragmented
results matrix)
2) Calculates indexes per thread in warp-group and stores the data into
give memref.

Here's an example usage:
```
// A warpgroup performs GEMM, results in fragmented matrix
%result1, %result2 = nvgpu.warpgroup.mma ...

// Stores the fragmented result to memref
nvgpu.warpgroup.mma.store [%result1, %result2], %matrixD : 
    !nvgpu.warpgroup.accumulator< fragmented = vector<64x128xf32>>,
    !nvgpu.warpgroup.accumulator< fragmented = vector<64x128xf32>> 
    to memref<128x128xf32,3>
```
parent c7d6d622
Loading
Loading
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please to comment