[mlir][nvvm] Add `cp.async.bulk.tensor.shared.cluster.global.multicast` (#72429)
This PR introduces the `cp.async.bulk.tensor.shared.cluster.global.multicast` Op in the NVVM dialect. It uses TMA to load data from global memory into the shared memory of multiple CTAs in the cluster. It resolves #72368.