Deep learning architectures are data hungry, and training them on only a few samples will not get the best out of them. TL solves this problem by transferring the knowledge (learned representations) gained from solving a task with a large dataset to a different but similar task with a smaller dataset.
TL is not only useful for the case of small training sets; we can also use it to make the training process faster. Training large deep learning architectures from scratch can be very slow because these architectures have millions of weights that need to be learned. Instead, one can make use of TL by simply fine-tuning weights that were already learned on a problem similar to the one being solved.
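To make this concrete, here is a minimal sketch of the fine-tuning idea, assuming PyTorch and torchvision for illustration (the text itself does not prescribe a framework, and the 10-class target task is hypothetical): we load weights learned on a large dataset, freeze them, and train only a new output layer for the smaller task.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a network whose weights were already learned on a large dataset
# (ImageNet in this case).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the transferred weights so they are not updated during training.
for param in model.parameters():
    param.requires_grad = False

# Replace the final classification layer to match the smaller target task
# (a hypothetical 10-class problem here).
num_classes = 10
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Only the new layer's parameters are given to the optimizer, so training
# is much faster than learning all the weights from scratch.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```

In practice, one can also unfreeze some of the later layers and train them with a small learning rate when the new task differs more from the original one; the sketch above shows only the simplest variant.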