[mlir:Async] Implement recursive async work splitting for scf.parallel...
[mlir:Async] Implement recursive async work splitting for scf.parallel operation (async-parallel-for pass) Depends On D104780 Recursive work splitting instead of sequential async tasks submission gives ~20%-30% speedup in microbenchmarks. Algorithm outline: 1. Collapse scf.parallel dimensions into a single dimension 2. Compute the block size for the parallel operations from the 1d problem size 3. Launch parallel tasks 4. Each parallel task reconstructs its own bounds in the original multi-dimensional iteration space 5. Each parallel task computes the original parallel operation body using scf.for loop nest Reviewed By: herhut Differential Revision: https://reviews.llvm.org/D104850
Loading
Please sign in to comment