Commit 9d08f276 authored by Jay Foad

[AMDGPU] Use reductions instead of scans in the atomic optimizer

If the result of an atomic operation is not used then it can be more
efficient to build a reduction across all lanes instead of a scan. Do
this for GFX10, where the permlanex16 instruction makes it viable. For
wave64 this saves a couple of dpp operations. For wave32 it saves one
readlane (readlanes are generally bad for performance) and one dpp
operation.

Differential Revision: https://reviews.llvm.org/D98953
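
To illustrate the difference between a scan and a reduction described in the message above, here is a minimal host-side sketch in C++. It is not the code from this commit and does not model DPP, readlane, or permlanex16. The names `Wave`, `inclusiveScan`, and `reduceAdd` are hypothetical, and the loop step counts do not correspond to hardware instruction counts. It only shows that a scan yields a per-lane prefix, while a reduction yields just the total.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model of one wave: element i is the value in lane i.
using Wave = std::vector<int>;

// Inclusive scan (Hillis-Steele style): afterwards lane i holds the sum of
// lanes 0..i. The per-lane prefix matters only if each lane needs its own
// share of the combined result.
Wave inclusiveScan(Wave lanes) {
  const std::size_t n = lanes.size();
  for (std::size_t offset = 1; offset < n; offset *= 2) {
    Wave next = lanes;
    for (std::size_t i = offset; i < n; ++i)
      next[i] = lanes[i] + lanes[i - offset];
    lanes = next;
  }
  return lanes;
}

// Butterfly reduction: afterwards every lane holds the sum of all lanes.
// No per-lane prefix is produced, only the total.
Wave reduceAdd(Wave lanes) {
  const std::size_t n = lanes.size();
  assert(n != 0 && (n & (n - 1)) == 0 && "lane count must be a power of two");
  for (std::size_t offset = 1; offset < n; offset *= 2) {
    Wave next(n);
    for (std::size_t i = 0; i < n; ++i)
      next[i] = lanes[i] + lanes[i ^ offset];  // combine with partner lane
    lanes = next;
  }
  return lanes;
}
```

For a wave holding {1, 2, 3, 4}, `inclusiveScan` gives {1, 3, 6, 10} and `reduceAdd` gives {10, 10, 10, 10}. When the atomic's result is unused, only the total (10) is needed, so the per-lane prefixes from the scan are wasted work.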
parent 69d01e0e