In this section, we will learn how to build a CNN to classify images of the MNIST dataset. In the previous chapter, we saw that a simple softmax model provides about 92% classification accuracy for recognizing hand written digits in the MNIST.
Here we'll implement a CNN which has a classification accuracy of about 99%.
The following figure shows how the data flows in the first two convolutional layer. The input image is processed in the first convolutional layer using the filter-weights. This results in 32 new images, one for each filter in the convolutional layer. The images are also downsampled with the pooling operation so the image resolution is decreased from 28x28 to 14x14.
These 32 smaller images are then processed in the second convolutional layer. We need filter weights again for each of these 32 features, and we need filter-weights for each output channel of this layer. The...