We first saw how to query our GPU from PyCUDA, and with this re-create the CUDA deviceQuery program in Python. We then learned how to transfer NumPy arrays to and from the GPU's memory with the PyCUDA gpuarray class and its to_gpu and get functions. We got a feel for using gpuarray objects by observing how to use them to do basic calculations on the GPU, and we learned to do a little investigative work using IPython's prun profiler. We saw there is sometimes some arbitrary slowdown when running GPU functions from PyCUDA for the first time in a session, due to PyCUDA launching NVIDIA's nvcc compiler to compile inline CUDA C code. We then saw how to use the ElementwiseKernel function to compile and launch element-wise operations, which are automatically parallelized onto the GPU from Python. We did a brief review of functional programming in Python (in particular...




















































