[NVPTX] improve lowering for common byte-extraction operations. (#66945)
Some of our critical code paths depend on efficient byte extraction from data loaded as integers. By default, LLVM tries to extract the bytes by storing the value to the stack and loading it back, which is very inefficient on GPUs.
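
For illustration only, here is a minimal C++ sketch of the kind of source-level pattern this concerns: pulling a single byte out of a value that was loaded as a 32-bit integer. The helper names `extract_byte` and `extract_byte_via_memory` are hypothetical and do not come from this change; the second one only mimics, at source level, the store-then-load shape described above and assumes a little-endian target.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: return byte `index` (0 = least significant) of a
// 32-bit word using a shift and a mask, so the value can stay in a register.
constexpr std::uint8_t extract_byte(std::uint32_t word, unsigned index) {
    return static_cast<std::uint8_t>((word >> (8u * index)) & 0xFFu);
}

// Hypothetical helper: same result on a little-endian target, but expressed
// as a round trip through a byte array in memory (store, then load).
std::uint8_t extract_byte_via_memory(std::uint32_t word, unsigned index) {
    std::uint8_t bytes[sizeof(word)];
    std::memcpy(bytes, &word, sizeof(word));  // store the integer as bytes
    return bytes[index];                      // load one byte back
}
```

Both helpers compute the same value on a little-endian target; the difference that matters on a GPU is whether the generated code stays in registers or goes through stack memory.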