Switch from shfl.bfly to shfl.down.
Both work for the current use case, but the latter allows implementing prefix sums and is a little easier to understand for partial warps. PiperOrigin-RevId: 285145287
Loading
Please sign in to comment
Both work for the current use case, but the latter allows implementing prefix sums and is a little easier to understand for partial warps. PiperOrigin-RevId: 285145287