GPGPU and multithreading
Combining multithreaded code with GPGPU can be much easier than trying to manage a parallel application running on an MPI cluster. This is mostly due to the following workflow, a sketch of which follows the list:
- Prepare data: Ready the data we want to process, such as a large set of images or a single large image, and transfer it to the GPU's memory.
- Prepare kernel: Load the OpenCL kernel source file and compile it into an OpenCL kernel.
- Execute kernel: Send the kernel to the GPU and instruct it to start processing the data.
- Read data: Once we know that the processing has finished, or that a specific intermediate state has been reached, read the buffer we passed along as an argument to the OpenCL kernel in order to obtain our result(s).
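A minimal host-side sketch of these four steps using the OpenCL C API might look as follows. The kernel source, the invert_pixels kernel name, and the image size are placeholders chosen purely for illustration; a real application would load the kernel source from a file and check the return value of every call.

```cpp
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>
#include <vector>
#include <cstdio>

int main() {
    // Hypothetical kernel: inverts each byte of an 8-bit grayscale image.
    const char* source =
        "__kernel void invert_pixels(__global uchar* img) {"
        "    size_t i = get_global_id(0);"
        "    img[i] = (uchar)(255 - img[i]);"
        "}";

    // Obtain a platform, a GPU device, a context, and a command queue.
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, nullptr);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, nullptr);
    cl_context context = clCreateContext(nullptr, 1, &device, nullptr, nullptr, nullptr);
    cl_command_queue queue = clCreateCommandQueue(context, device, 0, nullptr);

    // 1. Prepare data: copy the image into a buffer in the GPU's memory.
    std::vector<unsigned char> image(1024 * 1024, 128);
    cl_mem buffer = clCreateBuffer(context, CL_MEM_READ_WRITE, image.size(), nullptr, nullptr);
    clEnqueueWriteBuffer(queue, buffer, CL_TRUE, 0, image.size(), image.data(), 0, nullptr, nullptr);

    // 2. Prepare kernel: compile the source and create the kernel object.
    cl_program program = clCreateProgramWithSource(context, 1, &source, nullptr, nullptr);
    clBuildProgram(program, 1, &device, nullptr, nullptr, nullptr);
    cl_kernel kernel = clCreateKernel(program, "invert_pixels", nullptr);
    clSetKernelArg(kernel, 0, sizeof(buffer), &buffer);

    // 3. Execute kernel: enqueue it with one work item per pixel.
    size_t global_size = image.size();
    clEnqueueNDRangeKernel(queue, kernel, 1, nullptr, &global_size, nullptr, 0, nullptr, nullptr);

    // 4. Read data: block until processing is done and fetch the result.
    clEnqueueReadBuffer(queue, buffer, CL_TRUE, 0, image.size(), image.data(), 0, nullptr, nullptr);
    std::printf("First pixel after processing: %u\n", image[0]);

    // Release the OpenCL resources.
    clReleaseMemObject(buffer);
    clReleaseKernel(kernel);
    clReleaseProgram(program);
    clReleaseCommandQueue(queue);
    clReleaseContext(context);
    return 0;
}
```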
As this is an asynchronous process, one can treat it as a fire-and-forget operation, with merely a single thread dedicated to monitoring the progress of the active kernels.
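For instance, one dedicated thread can wait on the event associated with an enqueued kernel and read the results back once it completes, while the rest of the application carries on with other work. The sketch below assumes a queue, kernel, and buffer created by a setup similar to the previous example; the function and variable names are illustrative.

```cpp
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>
#include <thread>
#include <vector>

// Enqueue the kernel in a fire-and-forget fashion: a single monitoring thread
// waits for the kernel's completion event and then reads back the results,
// leaving the calling thread free to continue with other work.
// (queue, kernel, and buffer are assumed to come from a setup as shown earlier.)
std::thread launchAndMonitor(cl_command_queue queue, cl_kernel kernel,
                             cl_mem buffer, std::vector<unsigned char>& result) {
    size_t global_size = result.size();
    cl_event done = nullptr;
    clEnqueueNDRangeKernel(queue, kernel, 1, nullptr, &global_size, nullptr,
                           0, nullptr, &done);
    clFlush(queue);  // ensure the commands are submitted to the device

    return std::thread([queue, buffer, done, &result]() {
        // Block only this monitoring thread until the kernel has finished.
        clWaitForEvents(1, &done);
        clReleaseEvent(done);
        // Read the output buffer back into host memory (blocking read).
        clEnqueueReadBuffer(queue, buffer, CL_TRUE, 0, result.size(),
                            result.data(), 0, nullptr, nullptr);
    });
}
```

The returned std::thread can then be joined, or stored, by whichever thread coordinates the rest of the application's work.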
The biggest challenge in terms of multithreading and GPGPU applications lies not with the host...