[LTO] Ensure LICM hoists expensive fdiv instructions introduced by InstCombine
In the LTO pipeline we run InstCombine after LICM, which differs from the order used without LTO. This undoes the work LICM does to reduce the cost of the loop when it hoists the fdiv out and replaces it with an fmul: when InstCombine runs after LICM it puts the fdiv straight back into the loop which, on AArch64 at least, is expensive.

You can observe this problem in the SPEC2017 benchmark parest if you build with "-Ofast -flto" and the loop vectoriser uses an unroll factor of 1, which often happens when tail-folding is enabled. It is also a problem for scalar loops, or indeed any loop where there is only one use of the preheader fdiv result in the loop. See InstCombinerImpl::visitFMul for the code that sinks the fdiv.

I've attempted to fix this by adding another LICM pass for Full LTO after InstCombine. The alternative is to stop InstCombine from sinking the fdiv into loops. See D87479 for a previous discussion on this issue.

Differential Revision: https://reviews.llvm.org/D143631
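
For illustration only, here is a source-level sketch of the two loop shapes involved. The function names scale_div and scale_recip are hypothetical and do not come from parest or from this patch; the real transformation happens on LLVM IR, and the reciprocal rewrite is only permitted when fast-math flags allow it (as they do under -Ofast). The first function is roughly the shape left behind when the fdiv is sunk back into the loop, the second is the shape LICM aims for.

```cpp
#include <cstddef>
#include <vector>

// Shape after the fdiv has been sunk into the loop:
// one floating-point divide per iteration.
void scale_div(std::vector<double> &a, double d) {
  for (std::size_t i = 0; i < a.size(); ++i)
    a[i] = a[i] / d;
}

// Shape LICM aims for under fast-math: the loop-invariant reciprocal
// is computed once (conceptually in the preheader) and the loop body
// contains only a multiply.
void scale_recip(std::vector<double> &a, double d) {
  const double r = 1.0 / d; // single divide, outside the loop
  for (std::size_t i = 0; i < a.size(); ++i)
    a[i] = a[i] * r;
}
```

The two forms are not bit-identical for arbitrary divisors, which is why the rewrite needs fast-math; they agree exactly when the reciprocal is representable, for example when d is a power of two.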