Commit 86888e42 authored Apr 18, 2023 by Aart Bik

[mlir][sparse][gpu] generate proper memcpy in/out host and device

Host registration is a convenient way to get CUDA kernels
running, but it may be slow and does not work for all buffers
(global constants, for example). This revision instead generates
proper alloc/copy/dealloc chains for buffers, issued as
asynchronous chains to increase overlap. The host registration
mechanism is kept under a flag for the output, just for
experimentation purposes while this project ramps up.
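
For illustration only, the two approaches can be sketched in the MLIR gpu
dialect roughly as follows. This is a hand-written sketch, not output of
this revision: the buffer %h, its size %n, the element type, and the
elided kernel launch are all assumptions made for the example.

```mlir
// Assumed inputs: host buffer %h : memref<?xf64> with dynamic size %n : index.

// Host registration: pin the host buffer so the device can access it.
%cast = memref.cast %h : memref<?xf64> to memref<*xf64>
gpu.host_register %cast : memref<*xf64>

// Explicit asynchronous chain: alloc, copy in, (kernel), copy out, dealloc.
// Each op takes the previous token as a dependency and yields a new one.
%t0 = gpu.wait async
%d, %t1 = gpu.alloc async [%t0] (%n) : memref<?xf64>
%t2 = gpu.memcpy async [%t1] %d, %h : memref<?xf64>, memref<?xf64>
// Kernel launch elided; assume it depends on %t2 and yields token %t3.
%t4 = gpu.memcpy async [%t3] %h, %d : memref<?xf64>, memref<?xf64>
%t5 = gpu.dealloc async [%t4] %d : memref<?xf64>
gpu.wait [%t5]
```

Chains for different buffers that start from separate tokens have no
dependency on each other, which is what allows their copies to overlap.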

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D148682

parent 851a1213
