Commit 86888e42 authored by Aart Bik

[mlir][sparse][gpu] generate proper memcpy in/out host and device

Host registration is a convenient way to get CUDA kernels
running, but it may be slow and does not work for all buffers
(e.g., global constants). This revision instead uses proper
alloc/copy/dealloc chains for buffers, built as asynchronous
chains to increase overlap. The host registration mechanism
is still available for the output under a flag, purely for
experimentation while this project ramps up.
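As a rough illustration of the difference, the CUDA C++ sketch below performs by hand what an alloc/copy/dealloc chain does: it allocates a device buffer, copies the host data in, launches a kernel, copies the result back out and frees the buffer, enqueuing every step on one stream so the host blocks only once at the end. It is an analogy, not the code this revision emits (the sparse compiler presumably expresses the chain with MLIR GPU dialect ops carrying async tokens). `scale_kernel` and `scale_on_device` are hypothetical names, and `cudaMallocAsync`/`cudaFreeAsync` assume CUDA 11.2 or newer. The host-registration alternative would instead pin the existing host buffer with `cudaHostRegister` and let the kernel access it directly.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical toy kernel standing in for a generated sparse kernel.
__global__ void scale_kernel(float* data, int n, float factor) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= factor;
}

// Hypothetical helper: runs alloc -> copy-in -> launch -> copy-out ->
// dealloc on a single stream. Every call is enqueued without an explicit
// wait; stream order supplies the dependencies between the steps.
cudaError_t scale_on_device(float* host, int n, float factor) {
  if (n <= 0) return cudaSuccess;  // nothing to do; avoids a 0-block launch

  cudaStream_t stream;
  cudaError_t err = cudaStreamCreate(&stream);
  if (err != cudaSuccess) return err;

  size_t bytes = static_cast<size_t>(n) * sizeof(float);
  float* dev = nullptr;

  // Stream-ordered device allocation (CUDA 11.2+).
  err = cudaMallocAsync(reinterpret_cast<void**>(&dev), bytes, stream);
  // Copy the input from host to device.
  if (err == cudaSuccess)
    err = cudaMemcpyAsync(dev, host, bytes, cudaMemcpyHostToDevice, stream);
  // Launch the kernel on the same stream, after the copy-in.
  if (err == cudaSuccess) {
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale_kernel<<<blocks, threads, 0, stream>>>(dev, n, factor);
    err = cudaGetLastError();
  }
  // Copy the result back from device to host.
  if (err == cudaSuccess)
    err = cudaMemcpyAsync(host, dev, bytes, cudaMemcpyDeviceToHost, stream);
  // Stream-ordered free, issued after all uses of the buffer.
  if (dev != nullptr) {
    cudaError_t free_err = cudaFreeAsync(dev, stream);
    if (err == cudaSuccess) err = free_err;
  }

  // Single synchronization point for the whole chain.
  cudaError_t sync_err = cudaStreamSynchronize(stream);
  if (err == cudaSuccess) err = sync_err;
  cudaStreamDestroy(stream);
  return err;
}
```

Because the stream orders the steps, none of them needs its own synchronization, and the host is free to do other work until the final `cudaStreamSynchronize`.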

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D148682
parent 851a1213