AMDGPU: Break read2/write2 search range on a memory fence
This fixes performance regressions introduced by 86c944d7. The old search collected every potentially mergeable instruction in the entire block. In the regressing case, the same address is written in multiple places in the block, on opposite sides of a fence. When sorted by offset, the two unmergeable accesses to the identical address end up next to each other, and the merge gives up. Break the search range when we encounter an instruction we won't be able to merge across, so that the identical addresses fall into different merge attempts. This may also improve compile time by reducing the merge list size.
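The idea of breaking the search range can be illustrated with a minimal, standalone sketch. It is not the code from this change: `Inst`, its fields, and `collectMergeLists` are hypothetical names, and the model assumes an instruction is either a merge candidate or a barrier, never both.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a machine instruction.
struct Inst {
  bool IsCandidate; // a DS read/write that could take part in a merge
  bool IsBarrier;   // a fence, or anything else we cannot merge across
  int Offset;       // address offset used to pair candidates
};

// Walk the block once and start a new merge list at every barrier, so
// candidates on opposite sides of a barrier never share a list.
std::vector<std::vector<Inst>>
collectMergeLists(const std::vector<Inst> &Block) {
  std::vector<std::vector<Inst>> Lists;
  std::vector<Inst> Current;

  // A list with fewer than two candidates has nothing to merge.
  auto Flush = [&]() {
    if (Current.size() > 1)
      Lists.push_back(std::move(Current));
    Current.clear(); // leave Current empty and reusable after the move
  };

  for (const Inst &I : Block) {
    assert(!(I.IsBarrier && I.IsCandidate) &&
           "sketch assumes barriers are not merge candidates");
    if (I.IsBarrier) {
      Flush(); // break the search range here
      continue;
    }
    if (I.IsCandidate)
      Current.push_back(I);
  }
  Flush();
  return Lists;
}
```

With two writes to offset 0 separated by a fence, each write lands in its own list together with its neighbors, so sorting a list by offset no longer places the two identical addresses side by side.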