So far in this book, we have been taking the term thread for granted. Let's step back for a moment and see exactly what this means—a thread is a sequence of instructions that is executed on a single core of the GPU—cores and threads should not be thought of as synonymous! In fact, it is possible to launch kernels that use many more threads than there are cores on the GPU. This is because, similar to how an Intel chip may only have four cores and yet be running hundreds of processes and thousands of threads within Linux or Windows, the operating system's scheduler can switch between these tasks rapidly, giving the appearance that they are running simultaneously. The GPU handles threads in a similar way, allowing for seamless computation over tens of thousands of threads.
Multiple threads are executed on the GPU in abstract units...