[flang] Use Assign() runtime for copy-in/copy-out.
The loops generated under the IsContiguous check for copy-in/copy-out cause the LLVM backend to spend too much time optimizing them. At the same time, these copy loops offer no optimization opportunities with the surrounding code, because they execute only under a runtime IsContiguous check. The copy code can therefore be optimized on its own, and that can be done in the runtime library.

I thought I could implement and use new APIs for packing/unpacking non-contiguous data (interfaces added in D136378), but then I found that Assign() already does what is needed. If the performance of these copies becomes an issue, we can optimize the code in Assign() rather than create new APIs.

Thus, this change uses Assign() for copy-in/copy-out of boxed objects, and only when the objects turn out to be non-contiguous at execution time. Copies for non-boxed objects (e.g. for passing as a VALUE dummy argument) are still done inline, because they can potentially be optimized together with surrounding loops.

I added an internal -inline-copyinout-for-boxes option that reverts to the old behavior, to make it easier to triage performance regressions if any appear after this change.

Without the change, CPU2017/521.wrf compiles in 2179 seconds, and that is with module_dm.f90 compiled at -O0 (without -O0, this single module alone takes 5775 seconds to compile). With the change, the total compilation time of the benchmark drops to 722 seconds.

Differential Revision: https://reviews.llvm.org/D140446
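For illustration, the runtime-checked copy-in/copy-out pattern described above can be sketched in C++ as follows. This is a minimal sketch under simplifying assumptions: `Desc1D`, `AssignSketch`, and `CallWithCopyInOut` are hypothetical names, and the 1-D, double-only descriptor is not flang's real descriptor type. The actual Assign() runtime entry point works on full Fortran descriptors (any rank and type). The sketch only shows the control flow: call directly when the data is contiguous, otherwise copy into a contiguous temporary, call, and copy back.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified 1-D array descriptor (NOT flang's real Descriptor).
// The stride is measured in elements, not bytes.
struct Desc1D {
  double *base;
  std::ptrdiff_t extent;
  std::ptrdiff_t stride;
};

// Runtime contiguity check: elements are adjacent, or there is at most one.
bool IsContiguous(const Desc1D &d) {
  return d.extent <= 1 || d.stride == 1;
}

// Hypothetical stand-in for a runtime copy routine such as Assign():
// element-wise copy between two descriptors of equal extent, honoring strides.
void AssignSketch(const Desc1D &to, const Desc1D &from) {
  assert(to.extent == from.extent);
  for (std::ptrdiff_t i = 0; i < to.extent; ++i)
    to.base[i * to.stride] = from.base[i * from.stride];
}

// A callee that requires contiguous data (like an explicit-shape dummy).
using Callee = void (*)(double *data, std::ptrdiff_t n);

void CallWithCopyInOut(const Desc1D &actual, Callee callee) {
  if (IsContiguous(actual)) {
    callee(actual.base, actual.extent);  // fast path: pass the data directly
    return;
  }
  // Slow path, taken only at run time for non-contiguous data.
  std::vector<double> temp(static_cast<std::size_t>(actual.extent));
  Desc1D tempDesc{temp.data(), actual.extent, 1};
  AssignSketch(tempDesc, actual);        // copy-in
  callee(temp.data(), actual.extent);
  AssignSketch(actual, tempDesc);        // copy-out
}
```

With this shape, the compiler emits a single runtime call on the non-contiguous path instead of inlined copy loops, which is the trade-off the change relies on: those loops were costly for the backend to optimize and gained little from being inlined.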