Writing massively parallel code for NVIDIA graphics cards (GPUs) with CUDA
Graphics Processing Units (GPUs) are powerful processors specialized in real-time rendering. GPUs are found in virtually every computer, laptop, video game console, tablet, and smartphone. Their massively parallel architecture comprises tens to thousands of cores. The video game industry has fostered the development of increasingly powerful GPUs over the last two decades.
GPUs are routinely used in modern supercomputers (for example, Cray's Titan at Oak Ridge National Laboratory: ~20 petaFLOPS, ~20,000 CPUs, and as many NVIDIA GPUs). A high-end $1000 GPU today is roughly as powerful as a $100 million supercomputer from 2000 (several teraFLOPS).
Note
FLOPS means FLoating-point Operations Per Second. A 1 teraFLOPS GPU can perform up to one trillion floating-point operations per second.
Since the mid-2000s, GPUs have no longer been limited to graphics processing. We can now implement scientific algorithms on a GPU. The only...