We saw in the "A kernel call" section that a kernel can be launched with multiple blocks and multiple threads running in parallel. In which order do these blocks and threads start and finish their execution? This matters whenever the output of one thread is to be used by other threads. To find out, we modify the kernel of the hello, PyCUDA! program from the earlier section by adding a print statement inside the kernel that prints the block number. The modified code is shown as follows:
import pycuda.driver as drv
import pycuda.autoinit
from pycuda.compiler import SourceModule
mod = SourceModule("""
#include <stdio.h>
__global__ void myfirst_kernel()
{
    // Each block prints its own index within the grid.
    printf("I am in block no: %d \\n", blockIdx.x);
}
""")
function = mod.get_function("myfirst_kernel")
function(grid=(4,1),block...