We are now ready to start building our audio event classifier. We have our base feature maps, but we still need to do some more feature engineering. One option is to build a CNN from scratch that ingests these images and connect it to a fully connected deep multilayer perceptron (MLP) to form a classifier. Here, however, we will leverage transfer learning by using a pretrained model for feature extraction. Specifically, we will use the VGG-16 model as a feature extractor and then train a fully connected deep network on the extracted features.
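The two-stage approach described above (a frozen pretrained network as a feature extractor, followed by a trainable fully connected classifier) can be sketched with `tf.keras.applications.VGG16`. This is a minimal sketch, not the exact model used later. It assumes that each base feature map has been arranged as a 64x64 image with three channels. The helper names (`build_feature_extractor`, `extract_features`, `build_classifier`), the hidden-layer sizes, and the dropout rate are hypothetical choices made for illustration.

```python
import numpy as np
import tensorflow as tf


def build_feature_extractor(input_shape=(64, 64, 3), weights="imagenet"):
    """Frozen VGG-16 convolutional base mapping each image to a 512-d vector.

    input_shape=(64, 64, 3) is an assumption about the base feature maps.
    weights="imagenet" downloads the pretrained weights on first use.
    """
    base = tf.keras.applications.VGG16(
        include_top=False,   # drop VGG-16's own ImageNet classification head
        weights=weights,
        input_shape=input_shape,
        pooling="avg",       # global average pooling -> output shape (batch, 512)
    )
    base.trainable = False   # freeze all layers: used only for feature extraction
    return base


def extract_features(extractor, images):
    """Apply VGG-16's expected preprocessing, then run the frozen base."""
    x = tf.keras.applications.vgg16.preprocess_input(images.astype("float32"))
    return extractor.predict(x, verbose=0)


def build_classifier(n_features, n_classes, hidden_units=(1024, 512), dropout=0.4):
    """Fully connected deep network trained on the extracted features.

    hidden_units and dropout are hypothetical values, not tuned settings.
    """
    model = tf.keras.Sequential([tf.keras.Input(shape=(n_features,))])
    for units in hidden_units:
        model.add(tf.keras.layers.Dense(units, activation="relu"))
        model.add(tf.keras.layers.Dropout(dropout))
    model.add(tf.keras.layers.Dense(n_classes, activation="softmax"))
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # integer class labels
        metrics=["accuracy"],
    )
    return model
```

Because the VGG-16 base is frozen, you can compute the features once for the training and validation sets and cache them. Only the small classifier then needs repeated training, for example with `classifier.fit(train_features, train_labels, validation_data=(val_features, val_labels), epochs=...)`. This is much cheaper than back-propagating through the whole convolutional network on every epoch.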
Audio event classification with transfer learning
Building datasets from base features
The first step is to load our base features and...