[libc][gpu] Thread divergence fix on volta
The inbox/outbox loads are performed by the current warp, not a single thread.

The outbox load indicates whether a port has been successfully opened. If some lanes in the warp think it has and others think the port open failed, because the warp happened to be diverged when the load occurred, all the subsequent control flow will be incorrect.

The inbox load indicates whether the machine on the other side of the RPC channel has progressed. If lanes in the warp have different ideas about that, some will try to progress their state transition while others won't. As far as the RPC layer is concerned this is a performance problem and not a correctness one - none of the lanes can start the transition early, only miss it and start late - but in practice the calls layered on top of RPC do not have the interface required to detect this event and retry the load on the stalled lanes, so the calls layered on top will be broken.

None of this is broken on amdgpu, but it's likely that the readfirstlane will have beneficial performance properties there. Possibly significant enough that it's worth landing this ahead of fixing gpu::broadcast_value on volta.

Essentially volta wasn't adequately considered when writing this part of the protocol. It's a bug present in the initial prototype and propagated thus far, because none of the test cases push volta into a warp diverged state in the middle of the RPC sequence. We should have some test cases for volta where port_open and equivalent are called from diverged warps.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D159276
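
For illustration only, the divergence problem and the readfirstlane-style fix can be modelled on the host. The sketch below is not the libc RPC code: the 4-lane `Warp` type and the helpers `per_lane_load`, `broadcast_first_lane` and `is_uniform` are hypothetical names invented for this example. It assumes a flag that flips from 0 to 1 while a diverged warp is partway through loading it, and shows that taking the lowest active lane's value makes every active lane agree.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical host-side model of a 4-lane warp. Not the libc RPC code.
constexpr int kLanes = 4;
using Warp = std::array<uint32_t, kLanes>;

// Each lane loads the flag on its own. Lanes below `first_lane_seeing_update`
// load before the other side flips the flag (they see 0), the rest load
// after (they see 1). This models a warp that is diverged during the load.
Warp per_lane_load(int first_lane_seeing_update) {
  Warp seen{};
  for (int lane = 0; lane < kLanes; ++lane)
    seen[lane] = lane >= first_lane_seeing_update ? 1u : 0u;
  return seen;
}

// Index of the lowest lane set in `active_mask`.
int first_active_lane(uint32_t active_mask) {
  assert((active_mask & ((1u << kLanes) - 1)) != 0 && "need an active lane");
  int lane = 0;
  while (!(active_mask & (1u << lane)))
    ++lane;
  return lane;
}

// Model of a readfirstlane-style broadcast: every active lane adopts the
// value held by the lowest active lane. Inactive lanes are left unchanged.
Warp broadcast_first_lane(const Warp &seen, uint32_t active_mask) {
  const uint32_t value = seen[first_active_lane(active_mask)];
  Warp out = seen;
  for (int lane = 0; lane < kLanes; ++lane)
    if (active_mask & (1u << lane))
      out[lane] = value;
  return out;
}

// True when all active lanes hold the same value.
bool is_uniform(const Warp &w, uint32_t active_mask) {
  const uint32_t value = w[first_active_lane(active_mask)];
  for (int lane = 0; lane < kLanes; ++lane)
    if ((active_mask & (1u << lane)) && w[lane] != value)
      return false;
  return true;
}
```

With `per_lane_load(2)` the lanes disagree; after `broadcast_first_lane(..., 0xF)` they all hold lane 0's value, so in this model the whole warp misses the update together and retries together rather than splitting.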