Commit 52a2d07b authored May 10, 2023 by pvanhout

[AMDGPU] Improve PHI-breaking heuristics in CGP

D147786 made the transform more conservative by adding heuristics,
which was a good idea. However, the transform got a bit
too conservative at times.

This caused a surprise in some rocRAND benchmarks because D143731 greatly helped a few of them.
For instance, a few xorwow-uniform tests saw a +30% boost in performance after that pass, which was lost when D147786 landed.

This patch is an attempt at reaching a middleground that makes
the pass a bit more permissive. It continues in the same spirit as
D147786 but does the following changes:
- PHI users of a PHI node are now recursively checked. When loops are encountered, we consider the PHIs non-breakable. (Considering them breakable had very negative effect in one app I tested)
-  `shufflevector` is now considered interesting, given that it satisfies a few trivial checks.

Reviewed By: arsenm, #amdgpu, jmmartinez

Differential Revision: https://reviews.llvm.org/D150266

parent bcbd9b02

Expand all Show whitespace changes

Inline Side-by-side

Please to comment