Image processing with CNNs and ResNet50
In the world of deep learning, specific architectures have been developed to handle specific modalities. CNNs have been incredibly successful in processing images and are the standard architecture for CV tasks. A good mental model for using a pre-trained model for extracting features from images is that of using pre-trained word embeddings like GloVe for text. In this particular case, we use a specific architecture called ResNet50. While a comprehensive treatment of CNNs is outside the scope of this book, a brief overview of CNNs and ResNet will be provided in this section. If you are already comfortable with these concepts, you may skip ahead to the section titled Image feature extraction with ResNet50.
CNNs
CNNs are an architecture designed to learn from the following key properties, which are relevant to image recognition:
- Data locality: The pixels in an image are highly correlated to the pixels around them. ...