Commit 4d977174 authored Aug 29, 2023 by Slava Zakharin

[flang] Improved performance of runtime Matmul/MatmulTranspose.

This patch mostly affects performance of the code produced by
HLIFR lowering. If MATMUL argument is an array slice, then
HLFIR lowering passes the slice to the runtime, whereas
FIR lowering would create a contiguous temporary for the slice.
Performance might be better than the generic implementation
for cases where the leading dimension is contiguous.
This patch improves CPU2000/178.galgel making HLFIR version
faster than FIR version (due to avoiding the temporary copies
for MATMUL arguments).

Reviewed By: klausler

Differential Revision: https://reviews.llvm.org/D159134

parent 8f48392b

Show whitespace changes

Inline Side-by-side

Please to comment