[mlir] Add map_nested_foreach_thread_to_gpu_blocks op to transform dialect
This revision adds a new op, `map_nested_foreach_thread_to_gpu_blocks`, to the transform dialect. If the `generate_gpu_launch` argument is given, the op first generates a `gpu_launch`; otherwise, `target` must be a `gpu_launch`. The op searches for top-level `scf.foreach_thread` ops inside the `gpu_launch` and distributes them with the `gpu.block_id` attribute. The loop mapping is explicit and given by the `map_nested_foreach_thread_to_gpu_blocks` op. The mapping is one-to-one, so the loops disappear.

The revision also adds the GPU dialect as a dependent dialect, since the new op can create a `gpu::LaunchOp` for a given `scf::ForeachThreadOp`.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D134190
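
The one-to-one mapping described above can be pictured with a small conceptual model. This is only a sketch in plain Python, not the MLIR API: the function names `run_as_loop` and `run_as_blocks` are hypothetical, and it assumes the loop dimensions map to block dimensions in identity order, whereas the real op takes the mapping explicitly.

```python
from itertools import product


def run_as_loop(num_threads, body):
    """Reference semantics: visit every point of the iteration space."""
    for idx in product(*(range(n) for n in num_threads)):
        body(idx)


def run_as_blocks(num_threads, body):
    """Model of the mapped form: grid size equals the loop bounds.

    Each simulated block runs the body exactly once, with its block id
    standing in for the loop induction variables, so no loop is left
    inside a block. Assumes at most three loop dimensions.
    """
    rank = len(num_threads)
    # Pad the grid to three dimensions (x, y, z), as in a GPU launch.
    grid = tuple(num_threads) + (1,) * (3 - rank)
    for block_id in product(*(range(n) for n in grid)):
        body(block_id[:rank])
    return grid


if __name__ == "__main__":
    visited = []
    grid = run_as_blocks((2, 3), visited.append)
    print(grid, visited)
```

Both functions visit the same set of index tuples; the second one just replaces loop iteration with one body invocation per block.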