mlir/lib/Dialect/GPU/Transforms: improve context management in SerializeToCubin (#65779)
This commit adjusts the CUDA context management in the SerializeToCubin pass. Instead of creating a new CUDA context on each invocation of SerializeToCubin, the pass now uses the primary context of device 0. This substantially reduces compile time, especially when an application (such as a JIT compiler) calls SerializeToCubin repeatedly.

Differential Revision: https://reviews.llvm.org/D159487
Co-authored-by: Rohan Yadav <rohany@cs.stanford.edu>
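To illustrate why reusing the primary context helps, here is a toy model that only counts context creations. `ToyDriver` and the `serialize*`/`run*` functions are hypothetical names, not part of MLIR or the CUDA driver API. The comments map each operation to the real driver calls it stands in for: `cuCtxCreate`/`cuCtxDestroy` for the old per-call context, and `cuDevicePrimaryCtxRetain`/`cuDevicePrimaryCtxRelease` for the reference-counted primary context. The pairing of retain and release around each call is an assumption of this sketch; the commit message does not describe the pass's exact call sequence.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the CUDA driver: no GPU work is done, it only
// counts how many contexts get created (the expensive step).
class ToyDriver {
 public:
  // Models cuCtxCreate: every call builds a brand-new context.
  void createFreshContext() { ++contextsCreated_; }
  // Models cuCtxDestroy: the fresh context is torn down afterwards.
  void destroyFreshContext() {}

  // Models cuDevicePrimaryCtxRetain: a context is created only when nobody
  // holds a reference yet; otherwise the existing one is shared.
  void retainPrimary() {
    if (primaryRefs_ == 0) ++contextsCreated_;
    ++primaryRefs_;
  }
  // Models cuDevicePrimaryCtxRelease: once the last reference is released,
  // the primary context is reset and the next retain must recreate it.
  void releasePrimary() {
    assert(primaryRefs_ > 0 && "release without a matching retain");
    --primaryRefs_;
  }

  std::size_t contextsCreated() const { return contextsCreated_; }

 private:
  std::size_t contextsCreated_ = 0;
  std::size_t primaryRefs_ = 0;
};

// Old behaviour: each serialization creates and destroys its own context.
void serializeWithFreshContext(ToyDriver& driver) {
  driver.createFreshContext();
  // ... JIT-compile PTX to a cubin here ...
  driver.destroyFreshContext();
}

// New behaviour (as modelled here): borrow the device's primary context.
void serializeWithPrimaryContext(ToyDriver& driver) {
  driver.retainPrimary();
  // ... JIT-compile PTX to a cubin here ...
  driver.releasePrimary();
}

// Runs `calls` serializations the old way; returns contexts created.
std::size_t runFresh(std::size_t calls) {
  ToyDriver driver;
  for (std::size_t i = 0; i < calls; ++i) serializeWithFreshContext(driver);
  return driver.contextsCreated();
}

// Runs `calls` serializations the new way while the host application holds
// its own reference to the primary context; returns contexts created.
std::size_t runPrimary(std::size_t calls) {
  ToyDriver driver;
  driver.retainPrimary();  // the host application's own reference
  for (std::size_t i = 0; i < calls; ++i) serializeWithPrimaryContext(driver);
  driver.releasePrimary();
  return driver.contextsCreated();
}
```

With 100 calls, `runFresh(100)` reports 100 context creations while `runPrimary(100)` reports 1. In this model the saving depends on something keeping the primary context alive between calls. Here that is the host application's reference, mirroring the driver's rule that the primary context is reset when its last reference is released.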