Sharing GPU Memory
Vineyard supports sharing both CPU memory and GPU memory between different processes and different compute engines. Sharing GPU memory is achieved via the CUDA IPC mechanism, and vineyard provides a flexible unified memory interface on top of it.
CUDA IPC and Unified Memory
The CUDA IPC memory handle allows GPU memory to be shared between different processes via IPC. In vineyard, the GPU memory is allocated by the vineyardd instance when CreateGPUBuffer() is called; an IPC handle is then transferred to the client process, and the GPU memory can be accessed by the client process after calling cudaIpcOpenMemHandle(). For readers, the GPU memory can be accessed like a normal CPU shared memory object with GetGPUBuffers().
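For reference, the underlying primitives look roughly as follows. This is a minimal sketch of the raw CUDA IPC flow that vineyard builds on, with hypothetical function names and error handling elided; in practice, vineyardd plays the producer role and ships the handle to clients over its IPC socket:

#include <cuda_runtime.h>

// Producer process: allocate device memory and export an IPC handle
// that can cross process boundaries.
cudaIpcMemHandle_t export_buffer(void** d_ptr, size_t size) {
  cudaIpcMemHandle_t handle;
  cudaMalloc(d_ptr, size);
  cudaIpcGetMemHandle(&handle, *d_ptr);
  return handle;  // send to the consumer, e.g., over a UNIX domain socket
}

// Consumer process: map the exported GPU memory into this process's
// address space; the returned pointer is usable in local kernel launches.
void* import_buffer(cudaIpcMemHandle_t handle) {
  void* d_ptr = nullptr;
  cudaIpcOpenMemHandle(&d_ptr, handle, cudaIpcMemLazyEnablePeerAccess);
  return d_ptr;
}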
Like CUDA unified memory, vineyard provides a unified memory interface that can be adapted to different kinds of implementations (GPU, PPU, etc.) as the abstraction for sharing GPU memory between different processes, as well as for sharing memory between the host and the device.
The unified memory abstraction automatically synchronizes memory between the host and devices by leveraging the RAII mechanism of C++.
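The pattern can be illustrated with a minimal RAII sketch. This is illustrative only, using a hypothetical HostMirror class rather than vineyard's actual CUDABufferMirror: the constructor allocates pinned host memory (optionally copying the device content in), and the destructor flushes the host copy back to the device:

#include <cuda_runtime.h>

// Sketch of an RAII host mirror of a device buffer; error handling elided.
class HostMirror {
 public:
  HostMirror(void* device, size_t size, bool copy_in)
      : device_(device), size_(size) {
    cudaHostAlloc(&host_, size_, cudaHostAllocDefault);  // pinned host memory
    if (copy_in) {
      cudaMemcpy(host_, device_, size_, cudaMemcpyDeviceToHost);
    }
  }
  // Leaving the scope synchronizes the device copy automatically.
  ~HostMirror() {
    cudaMemcpy(device_, host_, size_, cudaMemcpyHostToDevice);
    cudaFreeHost(host_);
  }
  void* data() { return host_; }

 private:
  void* device_ = nullptr;
  size_t size_ = 0;
  void* host_ = nullptr;
};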
Example
Note
GPU shared memory support is still under development and the APIs may change in the future.
Creating a GPU buffer:
ObjectID object_id;
Payload object;
std::shared_ptr<MutableBuffer> buffer = nullptr;
RETURN_ON_ERROR(client.CreateGPUBuffer(data_size(), object_id, object, buffer));
CHECK(!buffer->is_cpu());
CHECK(buffer->is_mutable());
The result buffer's data pointer buffer->mutable_data() is a GPU memory pointer, which can be directly passed to GPU kernels, e.g., printKernel<<<1, 1>>>(buffer->data());
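printKernel is not part of the vineyard API; a minimal, hypothetical definition compatible with the launch above could be:

#include <cstdio>

// Prints the shared buffer's content directly from GPU memory,
// treating it as a NUL-terminated string.
__global__ void printKernel(const void* data) {
  printf("GPU data is: %s\n", static_cast<const char*>(data));
}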
Composing the buffer content from host code like Unified Memory:
{
  CUDABufferMirror mirror(*buffer, false);
  memcpy(mirror.mutable_data(), "hello world", 12);
}
Here the mirror's data() and mutable_data() are host memory pointers allocated using the cudaHostAlloc() API. When the CUDABufferMirror is destructed, the host memory is copied back to the GPU memory automatically. The second argument of CUDABufferMirror indicates whether the initial content of the GPU buffer needs to be copied to the host memory; it defaults to false.
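Once the mirror goes out of scope, the composed bytes are visible on the device. For example, assuming the printKernel sketched above:

// After the mirror's scope ends, the host content has been copied back,
// so a kernel launch observes "hello world".
printKernel<<<1, 1>>>(buffer->data());
cudaDeviceSynchronize();  // flush device-side printf output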
Accessing the GPU buffer from another process:
ObjectID object_id = ...;
std::shared_ptr<Buffer> buffer = nullptr;
RETURN_ON_ERROR(client.GetGPUBuffer(object_id, true, buffer));
CHECK(!buffer->is_cpu());
CHECK(!buffer->is_mutable());
The result buffer's data pointer buffer->data() is a GPU memory pointer, which can be directly passed to GPU kernels, e.g., printKernel<<<1, 1>>>(buffer->data());
Accessing the shared GPU buffer from CPU:
{
  CUDABufferMirror mirror(*buffer, true);
  printf("CPU data from GPU is: %s\n",
         reinterpret_cast<const char*>(mirror.data()));
}
When using CUDABufferMirror to access the GPU buffer from the CPU, the mirror's data() is a host memory pointer allocated using the cudaHostAlloc() API. For an immutable Buffer, the second argument of CUDABufferMirror must be true, and the GPU memory is copied to the host memory when the mirror is constructed.
Freeing the shared GPU buffer:
ObjectID object_id = ...;
RETURN_ON_ERROR(client.DelData(object_id));
For a complete example of GPU memory sharing, please refer to gpumalloc_test.cu.