[InstCombine] Take 2: Perform trivial PHI CSE
The original take was 6102310d, which taught InstSimplify to do that, which seemed better at time, since we got EarlyCSE support for free. However, it was proven that we can not do that there, the simplified-to PHI would not be reachable from the original PHI, and that is not something InstSimplify is allowed to do, as noted in the commit ed90f15e that reverted it : > It appears to cause compilation non-determinism and caused stage3 mismatches. However InstCombine already does many different optimizations, so it should be a safe place to do it here. Note that we still can't just compare incoming values ranges, because there is no guarantee that these PHI's we'd simplify to were already re-visited and sorted. However coming up with a test is problematic. Effects on vanilla llvm test-suite + RawSpeed: ``` | statistic name | baseline | proposed | Δ | % | |%| | |----------------------------------------------------|-----------|-----------|-------:|---------:|---------:| | instcombine.NumPHICSEs | 0 | 22228 | 22228 | 0.00% | 0.00% | | asm-printer.EmittedInsts | 7942329 | 7942456 | 127 | 0.00% | 0.00% | | assembler.ObjectBytes | 254295632 | 254313792 | 18160 | 0.01% | 0.01% | | early-cse.NumCSE | 2183283 | 2183272 | -11 | 0.00% | 0.00% | | early-cse.NumSimplify | 550105 | 541842 | -8263 | -1.50% | 1.50% | | instcombine.NumAggregateReconstructionsSimplified | 73 | 4506 | 4433 | 6072.60% | 6072.60% | | instcombine.NumCombined | 3640311 | 3666911 | 26600 | 0.73% | 0.73% | | instcombine.NumDeadInst | 1778204 | 1783318 | 5114 | 0.29% | 0.29% | | instcount.NumCallInst | 1758395 | 1758804 | 409 | 0.02% | 0.02% | | instcount.NumInvokeInst | 59478 | 59502 | 24 | 0.04% | 0.04% | | instcount.NumPHIInst | 330557 | 330549 | -8 | 0.00% | 0.00% | | instcount.TotalBlocks | 1077138 | 1077221 | 83 | 0.01% | 0.01% | | instcount.TotalFuncs | 101442 | 101441 | -1 | 0.00% | 0.00% | | instcount.TotalInsts | 8831946 | 8832611 | 665 | 0.01% | 0.01% | | simplifycfg.NumInvokes | 4300 | 4410 | 110 | 2.56% | 2.56% | | simplifycfg.NumSimpl | 1019813 | 999740 | -20073 | -1.97% | 1.97% | ``` So it fires ~22k times, which is less than ~24k the take 1 did. It allows foldAggregateConstructionIntoAggregateReuse() to actually work after PHI-of-extractvalue folds did their thing. Previously SimplifyCFG would have done this PHI CSE, of all places. Additionally, allows some more `invoke`->`call` folds to happen (+110, +2.56%). All in all, expectedly, this catches less things overall, but all the motivational cases are still caught, so all good.
Loading
Please sign in to comment