Understanding transfer learning and fine-tuning
In the previous chapters, we saw how we could leverage MXNet, GluonCV, and GluonNLP to retrieve models pre-trained on certain datasets (such as ImageNet, MS COCO, and IWSLT2015) and use them directly for our specific tasks and datasets.
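As a quick reminder of how such a model is retrieved, the following is a minimal sketch using the GluonCV model zoo; the model name and the dummy input are illustrative choices, not prescribed by this recipe:

```python
import mxnet as mx
from gluoncv.model_zoo import get_model

# Retrieve a ResNet-50 pre-trained on ImageNet from the GluonCV model zoo;
# pretrained=True downloads the learned weights instead of using random ones.
net = get_model('ResNet50_v2', pretrained=True, ctx=mx.cpu())

# The network can be used as-is for inference on the ImageNet classes.
dummy = mx.nd.random.uniform(shape=(1, 3, 224, 224))  # one 224x224 RGB image
logits = net(dummy)
print(logits.shape)  # (1, 1000): one score per ImageNet-1k class
```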
In this recipe, we will introduce a methodology called transfer learning, which allows us to combine the general knowledge captured by a pre-trained model with information from the new domain (the dataset of the task we want to solve). This approach has two significant advantages. On the one hand, pre-training datasets are typically large-scale (ImageNet-22k contains roughly 14 million images), and starting from a pre-trained model saves us that training time. On the other hand, we use our specific dataset not only for evaluation but also for training, improving the model's performance in the desired scenario. As we will discover, there is not always an easy way to achieve this, as it requires...
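To make the idea concrete before going further, here is a minimal sketch of a common transfer-learning setup in Gluon: we keep the pre-trained backbone and replace the ImageNet-specific output layer with one sized for the target task. The number of classes, learning rate, and optimizer are placeholder assumptions for illustration, not values taken from this recipe:

```python
import mxnet as mx
from mxnet import gluon, init
from mxnet.gluon import nn
from gluoncv.model_zoo import get_model

ctx = mx.cpu()  # or mx.gpu(0) if a GPU is available

# 1) Start from a model pre-trained on ImageNet (the general knowledge).
net = get_model('ResNet50_v2', pretrained=True, ctx=ctx)

# 2) Replace the 1000-way ImageNet output layer with one sized for our
#    new domain. NUM_CLASSES is a hypothetical value for illustration.
NUM_CLASSES = 10
with net.name_scope():
    net.output = nn.Dense(NUM_CLASSES)
net.output.initialize(init.Xavier(), ctx=ctx)  # only the new head is random

# 3) Fine-tune: all parameters are updated during training, but they start
#    from the pre-trained values rather than from a random initialization,
#    so a small learning rate is typically used.
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.001})
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
```

From here, the usual Gluon training loop over the new dataset applies; the key difference from training from scratch is simply the starting point of the weights and the smaller learning rate.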