At this point, we've covered much of the basic theory and practice of neural networks, but we haven't given much consideration to the processors running them. So let's take a break from coding and go into more depth about the little slices of silicon that are actually doing the work.
The 30,000-foot view is that CPUs were originally designed to favor scalar operations, which are performed one at a time in sequence, while GPUs are designed for vector operations, which are performed in parallel. Within a layer, a neural network performs a large number of independent calculations (say, each input multiplied by its corresponding weight for every neuron), so it's a processing workload amenable to a chip design that favors massive parallelism.
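As a quick sketch of why this workload vectorizes so naturally, here's a toy dense layer written two ways in NumPy (the shapes and values are purely illustrative): first as a scalar-style loop that does one multiply-add at a time, then as a single matrix-vector product whose individual multiplies are all independent and can be executed in parallel by hardware built for that.

```python
import numpy as np

# A toy layer: 4 inputs feeding 3 neurons (shapes chosen for illustration).
inputs = np.array([0.5, -1.0, 2.0, 0.1])   # shape (4,)
weights = np.random.randn(3, 4)             # one row of weights per neuron

# Scalar-style computation: one multiply-add at a time, in sequence.
outputs_scalar = np.zeros(3)
for neuron in range(3):
    for i in range(4):
        outputs_scalar[neuron] += weights[neuron, i] * inputs[i]

# Vectorized computation: every multiply within the layer is independent
# of the others, so the whole layer collapses into one matrix-vector
# product that parallel hardware can chew through all at once.
outputs_vector = weights @ inputs

# Both routes produce the same outputs; only the execution pattern differs.
assert np.allclose(outputs_scalar, outputs_vector)
```

The loop and the matrix product compute identical results; the difference is that the loop spells out a sequential ordering the math never required, while the vectorized form leaves the hardware free to do the independent multiplies simultaneously.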
Let's make this a little more concrete by walking through an example of the types of operations that take advantage of the performance characteristics...