[mlir][sparse] scalarize reductions in for-loops during sparse codegen
Reductions in innermost loops become harder for the backend to disambiguate after bufferization into memrefs, resulting in less efficient load-update-store cycles. By scalarizing innermost reductions, the backend is more likely to assign a register to perform the reduction (also prepares vectorization). Even though we could scalarize reductions for more outer loops and while-loops as well, currently scalarization is only done for chains of innermost for-loops, where it matters most, to avoid complicating codegen unnecessary (viz. adding lots of yield instructions). This CL also refactors condition simplification into the merger class, where it belongs, so that conditions are simplified only once per loop nest and not repeatedly as was currently done. This CL also fixes a few minor bugs, some layout issues, and comments. Reviewed By: penpornk Differential Revision: https://reviews.llvm.org/D93143
Loading
Please sign in to comment