We will use the Free Spoken Digit Dataset (FSDD), available at https://github.com/Jakobovski/free-spoken-digit-dataset/tree/master/recordings, for our basic model. Download the recordings to any directory on your system. In the example code, replace the path to the .wav file with the path to your copy of the data.
Note that we have split the data into a training set of 1,470 files and a test set of 30 files.
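The text does not say how the 30 test files were chosen, so the following is only a minimal sketch of one way to produce a split of that size: shuffle the file list with a fixed seed and hold out the first 30 entries. The function name `split_recordings`, the seed value, and the `recordings` directory are assumptions made for illustration.

```python
import random
from pathlib import Path


def split_recordings(paths, n_test=30, seed=0):
    """Return (train, test) lists, holding out n_test randomly chosen paths."""
    # Sort first so the result does not depend on directory listing order.
    paths = sorted(paths)
    if n_test > len(paths):
        raise ValueError("n_test is larger than the number of recordings")
    # A seeded generator makes the split reproducible (assumed choice).
    rng = random.Random(seed)
    rng.shuffle(paths)
    return paths[n_test:], paths[:n_test]


# Hypothetical location; replace with the directory you downloaded the data to.
data_dir = Path("recordings")
if data_dir.is_dir():
    train_files, test_files = split_recordings(data_dir.glob("*.wav"))
    print(len(train_files), "training files,", len(test_files), "test files")
```

A random hold-out this small may not contain every digit equally often; if that matters, a per-digit (stratified) selection is a reasonable alternative.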
Before we get into the details of the model itself, we will look at how to prepare the data for training. The most common preprocessing step used in practice is to transform the raw audio data into its frequency spectrum. The frequency spectrum, or power spectrum, is like a fingerprint for the data: the raw audio is broken down into its constituent frequencies. This representation helps in identifying which frequencies...
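To make the idea of a power spectrum concrete, here is a small sketch that uses NumPy's FFT on a synthetic tone rather than on a dataset file, so it runs without any download. The helper name `power_spectrum`, the 8 kHz sample rate, the 440 Hz test tone, and the division by the signal length (one of several common normalisation conventions) are all assumptions for illustration, not necessarily the preprocessing used later in this chapter.

```python
import numpy as np


def power_spectrum(signal, sample_rate):
    """Return (frequencies in Hz, power per frequency bin) for a 1-D signal."""
    signal = np.asarray(signal, dtype=np.float64)
    # rfft keeps only the non-negative frequencies of a real-valued signal.
    spectrum = np.fft.rfft(signal)
    # Squared magnitude, scaled by the signal length (assumed convention).
    power = np.abs(spectrum) ** 2 / len(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs, power


# Synthetic one-second tone standing in for a recording (assumed values).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

freqs, power = power_spectrum(tone, sample_rate)
peak_frequency = freqs[np.argmax(power)]
print("Strongest frequency (Hz):", peak_frequency)
```

For a pure tone, almost all of the power falls in the bin at the tone's frequency. A spoken digit would instead spread its power over many bins, and that pattern is what the fingerprint analogy refers to. To try this on a real recording, the samples and sample rate could be loaded with a WAV reader such as `scipy.io.wavfile.read` and passed to the same function.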