Image feature extraction with ResNet50
The pre-trained ResNet50 models are trained on the ImageNet dataset. This dataset contains millions of images in over 20,000 categories. The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) focuses on a subset of 1,000 categories for models to compete on recognizing images. Consequently, the top layer of ResNet50, which performs classification, has an output dimension of 1,000. The idea behind using a pre-trained ResNet50 model is that it has already learned to pick out objects that may be useful in image captioning.
The tensorflow.keras.applications package provides pre-trained models like ResNet50. At the time of writing, all the pre-trained models provided are related to CV. Loading up the pre-trained model is quite easy. All the code for this section is in the feature-extraction.py file in this chapter's folder on GitHub. The main reason for using a separate file is that it gives us the ability to run feature extraction as a script.
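To make the loading step concrete, here is a minimal sketch of how a pre-trained ResNet50 could be loaded and used as a feature extractor. It is not necessarily what feature-extraction.py contains. The helper names build_feature_extractor and extract_features, and the 224 x 224 input size, are assumptions made for illustration. Passing include_top=False drops the 1,000-way classification layer, so the model returns feature maps instead of class scores.

```python
import numpy as np
import tensorflow as tf


def build_feature_extractor(weights="imagenet"):
    """Return ResNet50 without its 1,000-way classification head.

    Hypothetical helper: weights="imagenet" downloads the pre-trained
    weights on first use; weights=None gives a randomly initialized model.
    """
    return tf.keras.applications.ResNet50(
        include_top=False,          # drop the ImageNet classification layer
        weights=weights,
        input_shape=(224, 224, 3),  # assumed input size for this sketch
    )


def extract_features(model, images):
    """Map a batch of RGB images (pixel values 0-255) to feature maps."""
    # Copy into a float32 array so the caller's data is left untouched.
    batch = np.array(images, dtype="float32")
    # Apply the same preprocessing ResNet50 was trained with.
    batch = tf.keras.applications.resnet50.preprocess_input(batch)
    return model.predict(batch, verbose=0)
```

With a 224 x 224 input, each image yields a 7 x 7 x 2048 feature map. For example, extract_features(build_feature_extractor(), images) on a batch of four images returns an array of shape (4, 7, 7, 2048).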
Given that we will...