This chapter provided an introduction to using Julia code on the GPU. It showed you how to install the relevant libraries and compile Julia to the GPU. Copying data to the GPU and processing it there is one of the most efficient ways to accelerate your code. If your code is amenable to high parallelization and can fit within the GPU's memory, then nothing comes close in terms of making your code fast. The whole deep learning revolution is proof of its success.
In the next chapter, we will go back to the CPU, and see how the tasks and asynchronous I/O can help accelerate your code.