Summary
In this chapter, we learned about CNNs and their main components. We started with the convolution operation and looked at 1D and 2D implementations. Then, we covered another type of layer found in several common CNN architectures: the subsampling layers, commonly called pooling layers. We primarily focused on the two most common forms of pooling: max-pooling and average-pooling.
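As a quick recap of these mechanics, the following is a minimal sketch (in PyTorch, consistent with the rest of the chapter) of a 2D convolution followed by the two pooling variants; the tensor sizes and layer parameters are illustrative assumptions, not values from the chapter:

```python
import torch
import torch.nn as nn

# A batch containing one single-channel 28x28 image (sizes chosen for illustration)
x = torch.randn(1, 1, 28, 28)

# 2D convolution: 8 filters of size 3x3; padding=1 preserves the spatial size
conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
feature_maps = conv(x)                                   # shape: (1, 8, 28, 28)

# Max-pooling and average-pooling with a 2x2 window halve the spatial dimensions
max_pooled = nn.MaxPool2d(kernel_size=2)(feature_maps)   # shape: (1, 8, 14, 14)
avg_pooled = nn.AvgPool2d(kernel_size=2)(feature_maps)   # shape: (1, 8, 14, 14)
```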
Next, putting all these individual concepts together, we implemented deep CNNs using the torch.nn module. The first network we built was applied to the already familiar MNIST handwritten digit recognition problem.
Then, we implemented a second CNN on a more complex dataset consisting of face images and trained the CNN for smile classification. Along the way, you also learned about data augmentation and different transformations that we can apply to face images using the torchvision.transforms module.
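For reference, an augmentation pipeline of the kind discussed might look as follows; the particular transforms and sizes here are illustrative assumptions rather than the chapter's exact settings:

```python
import torchvision.transforms as transforms

# Illustrative training-time augmentation for face images:
# random crop and horizontal flip add variety; resize standardizes the input size
train_transform = transforms.Compose([
    transforms.RandomCrop(size=(178, 178)),
    transforms.RandomHorizontalFlip(),
    transforms.Resize(size=(64, 64)),
    transforms.ToTensor(),
])
```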
In the next chapter, we will move on to recurrent neural networks (RNNs). RNNs are used...