[mlir] Implement DestinationStyleOpInterface for scf::ForallOp (#66981)
`scf::ForallOp` has `shared_outs` tensor operands which are used to insert partial results into in the parallel terminator. The `scf::ForallOp` returns one tensor for each `shared_out` which then contains the combined result from all threads. Since the parallel terminator cannot change the shape of the `shared_out`, ForallOp is a `DestinationStyleOp` and this patch implements the interface by declaring the `outputs` operands as `inits` in the language of the DPS interface. For this change to work, we need to add an exception to the Pattern that folds `tensor.cast` Ops into DPS Ops because `scf::Forall` needs special handling of its `BlockArgument` Type during this folding.
Loading
Please sign in to comment