Transfer learning is especially useful when you want to solve a specific task but do not have enough training samples to train a performant model from scratch, while a larger, similar training dataset is available.
The model can be pretrained on this larger dataset until convergence (or, if one is available and pertinent, an already pretrained model can be fetched). Its final layers should then be removed (when the target task differs from the pretraining task, that is, when its expected output is different) and replaced with layers adapted to the new task. For example, imagine that we want to train a model to distinguish pictures of bees from pictures of wasps. ImageNet contains images for both classes, which could serve as a training dataset, but there are too few of them for a deep CNN to learn the task from scratch without overfitting. We could, however, first train the network on the full ImageNet dataset to classify images across its 1,000 categories, and only then repurpose it for the bee-versus-wasp task.
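To make this concrete, here is a minimal sketch using tf.keras (an assumption, since the section does not name a framework): a ResNet50 backbone pretrained on ImageNet is loaded, its 1,000-class head is dropped, the pretrained weights are frozen, and a new binary classifier is attached for the bee-versus-wasp task. The backbone choice, layer sizes, and dataset names are illustrative.

```python
import tensorflow as tf

# Backbone pretrained on ImageNet; include_top=False removes the 1,000-class
# head, and pooling="avg" yields a flat feature vector per image.
base_model = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, pooling="avg",
    input_shape=(224, 224, 3))

# Freeze the pretrained layers so only the new head is trained at first.
base_model.trainable = False

# New task-specific head: a single sigmoid unit for bee vs. wasp.
inputs = tf.keras.Input(shape=(224, 224, 3))
x = tf.keras.applications.resnet50.preprocess_input(inputs)
x = base_model(x, training=False)
x = tf.keras.layers.Dense(256, activation="relu")(x)
x = tf.keras.layers.Dropout(0.5)(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="binary_crossentropy", metrics=["accuracy"])

# Hypothetical bee/wasp datasets would then be used for training:
# model.fit(bee_wasp_train_ds, validation_data=bee_wasp_val_ds, epochs=10)
```

Once the new head has converged, a common follow-up is to unfreeze some of the top backbone layers and fine-tune them at a lower learning rate, letting the pretrained features adapt slightly to the new domain.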