In this chapter, we will finally learn how to debug and profile our GPU code using several different methods and tools. While we can easily debug pure Python code using IDEs such as Spyder and PyCharm, we can't use these tools to debug the GPU code itself, since that code is written in CUDA-C, with PyCUDA only providing an interface to it. The first and easiest method for debugging a CUDA kernel is to use printf statements, which we can call directly in the middle of a CUDA kernel to print to standard output. We will see how to use printf in the context of CUDA and how to apply it effectively for debugging.
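As a rough preview of the printf technique, here is a minimal PyCUDA sketch in which every thread reports its own thread and block index, which is the basic pattern for tracing what individual threads are doing. It assumes PyCUDA is installed and a CUDA-capable GPU is available; the kernel name `hello_printf_ker` and the helper `run_hello` are hypothetical names chosen for illustration.

```python
# CUDA-C kernel source, compiled at runtime by PyCUDA's SourceModule.
# (Hypothetical example kernel: each thread prints its own indices.)
KERNEL_SRC = r'''
__global__ void hello_printf_ker()
{
    // Device-side printf: every thread executes this line independently.
    printf("Hello from thread %u in block %u\n", threadIdx.x, blockIdx.x);
}
'''


def run_hello(num_blocks=2, threads_per_block=4):
    """Hypothetical helper: compile and launch the kernel above."""
    # Imported lazily so this file can be loaded on a machine without a GPU.
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context on the default device)
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    mod = SourceModule(KERNEL_SRC)
    hello = mod.get_function("hello_printf_ker")

    # Launch num_blocks blocks of threads_per_block threads each.
    hello(block=(threads_per_block, 1, 1), grid=(num_blocks, 1, 1))

    # Device printf output is buffered on the GPU and flushed to stdout at
    # synchronization points, so wait for the kernel to finish here.
    drv.Context.synchronize()
```

Calling `run_hello()` on such a machine should print one line per thread (eight with the default arguments), though the order in which lines from different threads appear is not guaranteed. Deferring the PyCUDA imports into the function is a design choice for this sketch, so that the kernel source can still be inspected where no GPU is present.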
Next, we will fill in some of the gaps in our CUDA-C programming knowledge so that we can write CUDA programs directly within the NVIDIA Nsight IDE, which will allow us to write test cases in CUDA-C for some of the...