[AMDGPU] Improve PHI-breaking heuristics in CGP

D147786 made the transform more conservative by adding heuristics, which was a good idea. However, the transform became too conservative in some cases.

This came as a surprise in some rocRAND benchmarks, because D143731 had greatly helped a few of them. For instance, a few xorwow-uniform tests saw a +30% performance boost after that patch, which was lost when D147786 landed.

This patch attempts to reach a middle ground that makes the pass a bit more permissive. It continues in the same spirit as D147786 but makes the following changes:

- PHI users of a PHI node are now checked recursively. When loops are encountered, the PHIs are considered non-breakable. (Considering them breakable had a very negative effect in one app I tested.)
- `shufflevector` is now considered interesting, provided that it satisfies a few trivial checks.

Reviewed By: arsenm, #amdgpu, jmmartinez

Differential Revision: https://reviews.llvm.org/D150266