We will end with a brief overview of NVIDIA's command-line profiler, nvprof. Unlike the Nsight IDE, nvprof lets us freely profile any Python code that we have written, so we won't be compelled to write full-on, pure CUDA-C test functions here.
We can do basic profiling of a binary executable by running nvprof followed by the program's name (nvprof program). We can likewise profile a Python script by giving nvprof the python command as the first argument and the script as the second: nvprof python program.py. Let's profile the simple matrix-multiplication CUDA-C executable that we wrote earlier by running nvprof matrix_ker.
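The two invocation forms differ only in whether the python interpreter is placed between nvprof and the target. The following is a minimal sketch that wraps both forms in a hypothetical Python helper, nvprof_command (not part of nvprof or any library). It assumes that any target ending in .py is a Python script and that everything else is a binary executable. Actually running the command requires nvprof on the PATH and a GPU it supports.

```python
import shlex
import subprocess


def nvprof_command(target, *args):
    """Build the argument list for profiling `target` with nvprof.

    Hypothetical helper: a target ending in .py is assumed to be a Python
    script and is launched through the interpreter (nvprof python program.py);
    anything else is treated as a binary executable (nvprof program).
    """
    target = str(target)
    if target.endswith(".py"):
        return ["nvprof", "python", target, *args]
    return ["nvprof", target, *args]


def run_nvprof(target, *args):
    """Run nvprof on `target`; needs nvprof installed and a supported GPU."""
    cmd = nvprof_command(target, *args)
    print("Running:", shlex.join(cmd))
    # check=True raises CalledProcessError if the profiled program fails.
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the commands here, so this runs even without a GPU.
    print(shlex.join(nvprof_command("program.py")))  # nvprof python program.py
    print(shlex.join(nvprof_command("matrix_ker")))  # nvprof matrix_ker
```

Calling run_nvprof("matrix_ker") would then profile the matrix-multiplication binary. Note that on Linux you may need to pass "./matrix_ker" unless the binary's directory is on your PATH.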
The resulting output is very similar to that of Python's cProfile module, which we first used to analyze a Mandelbrot algorithm way back in Chapter 1 (Why GPU Programming?). Only now, this exclusively...