In the previous chapters, we looked at some interesting case studies on applying transfer learning to real-world problems involving image and text data, two common forms of unstructured data. We demonstrated various ways to apply transfer learning to build more robust and better-performing models, and to work around constraints such as limited training data. In this chapter, we will tackle a new real-world problem: identifying and classifying audio events.
Applying deep learning to audio data poses a significant challenge because we do not have the advantage of efficient, readily available pretrained models such as VGG or Inception (for image data) or word-embedding-based models such as Word2vec or GloVe (for text data). The question then arises: what should our strategy be...