In this section, we test the interoperability of two different modules, CuPy and Numba, within the same Python program. To do so, we import the cuda module from Numba, along with CuPy:
from numba import cuda  # Using Numba
import cupy as cp  # Using CuPy
from timeit import default_timer as timer
N = 500000000
@cuda.jit
def multiply(p, q):
    # Thread id in a 1D block
    tx = cuda.threadIdx.x
    # Block id in a 1D grid
    ty = cuda.blockIdx.x
    # Number of threads per block
    bw = cuda.blockDim.x
    # Compute flattened index inside the array
    index = tx + ty * bw
As in our previous program, we compute the product only when the index lies within the array bounds, as shown in the following code:
    if index < N:  # Check array size limit
        q[index] = p[index] * q[index]
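To see why the `index < N` guard matters, the following pure-Python sketch replays the kernel's index arithmetic on the CPU. It is only an illustration under stated assumptions: `blocks_per_grid` and `simulate_multiply` are hypothetical helpers, the threads-per-block values are assumed (the launch configuration is not part of this excerpt), and the loops run sequentially rather than in parallel on a GPU. Because the number of blocks is rounded up, the last block can contain threads whose flattened index falls past the end of the array; the guard makes those threads do nothing.

```python
def blocks_per_grid(n, threads_per_block):
    # Ceiling division: enough blocks so that blocks * threads_per_block >= n.
    return (n + threads_per_block - 1) // threads_per_block


def simulate_multiply(p, q, threads_per_block):
    """Hypothetical CPU stand-in for the multiply kernel (illustration only).

    Reproduces the kernel's index arithmetic and bounds check one
    (block, thread) pair at a time. Unlike the kernel, which compares
    against the global N, this sketch uses len(q) as the array size.
    Returns the total number of threads "launched".
    """
    n = len(q)
    grid = blocks_per_grid(n, threads_per_block)
    launched = 0
    for block_idx in range(grid):                    # role of cuda.blockIdx.x
        for thread_idx in range(threads_per_block):  # role of cuda.threadIdx.x
            launched += 1
            # Same flattened index as tx + ty * bw in the kernel.
            index = thread_idx + block_idx * threads_per_block
            if index < n:  # surplus threads in the last block skip the write
                q[index] = p[index] * q[index]
    return launched


p = [2.0] * 10
q = [float(i) for i in range(10)]
launched = simulate_multiply(p, q, threads_per_block=4)
print(launched, q)
```

With 10 elements and an assumed 4 threads per block, 3 blocks (12 threads) are launched and the 2 surplus threads are skipped, while every element is multiplied exactly once. For the N = 500000000 used above, 256 threads per block divides N exactly (1953125 blocks), whereas 1024 threads per block would need 488282 blocks, leaving 768 idle threads in the last block.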
def main():
    a_source = cp.zeros(N, dtype=cp.double...