In this chapter, we built a classifier that can predict whether an image contains a cat or a dog by using two different CNNs. We first went through the theory behind CNNs, and we understood that the fundamental building blocks of a CNN are the convolution, pooling, and fully connected layers. In particular, the front of the CNN consists of a block of convolution-pooling layers, repeated an arbitrary number of times. This block is responsible for identifying spatial characteristics in the images, which can be used to classify the images. The back of the CNN consists of fully connected layers, similar to an MLP. This block is responsible for making the final predictions.
In the first CNN, we used a basic architecture that achieved 80% accuracy on the testing set. This basic CNN consists of two convolutional-max pooling layers, followed by two fully connected layers. In the...