Data preprocessing is an essential step in a deep learning (DL) pipeline. The Speech Commands dataset consists of 1-second .wav files, one per short spoken command, so these files only need to be converted into spectrogram images. However, the audio files downloaded for the second use case vary in length, so they require two-step preprocessing:
- Converting .mp3 files to .wav files of uniform length (e.g., 5 seconds)
- Converting each .wav file to a spectrogram image
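The two steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's pipeline. It assumes the .mp3 files have already been decoded to mono 16-bit .wav by an external tool (for example `ffmpeg -i in.mp3 -ac 1 -ar 16000 out.wav`), a 16 kHz sample rate, and NumPy. The helpers `to_uniform_length`, `write_wav`, `read_wav`, `log_spectrogram`, and `to_uint8_image` are hypothetical names for illustration. Shorter clips are zero-padded and longer ones are truncated to the target length.

```python
import wave

import numpy as np

SAMPLE_RATE = 16_000  # assumed target sample rate (Hz)
TARGET_SECONDS = 5    # uniform clip length, following the 5-second example


def to_uniform_length(samples: np.ndarray, sample_rate: int = SAMPLE_RATE,
                      seconds: float = TARGET_SECONDS) -> np.ndarray:
    """Truncate or zero-pad a mono signal to exactly `seconds` long."""
    target = int(sample_rate * seconds)
    if len(samples) >= target:
        return samples[:target]
    return np.pad(samples, (0, target - len(samples)))  # pads with zeros


def write_wav(path: str, samples: np.ndarray, sample_rate: int = SAMPLE_RATE) -> None:
    """Write float samples in [-1, 1] as a 16-bit mono .wav file."""
    pcm = (np.clip(samples, -1.0, 1.0) * 32767).astype("<i2")
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes = 16-bit PCM
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())


def read_wav(path: str) -> tuple[np.ndarray, int]:
    """Read a 16-bit mono .wav file (assumed format) into floats in [-1, 1]."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        pcm = np.frombuffer(wf.readframes(wf.getnframes()), dtype="<i2")
    return pcm.astype(np.float32) / 32768.0, rate


def log_spectrogram(samples: np.ndarray, n_fft: int = 512, hop: int = 256) -> np.ndarray:
    """Log-magnitude STFT with a Hann window, shape (n_fft // 2 + 1, n_frames)."""
    if len(samples) < n_fft:
        raise ValueError("signal is shorter than one FFT window")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(samples) - n_fft) // hop
    frames = np.stack([samples[i * hop: i * hop + n_fft] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude).T  # frequency on rows, time on columns


def to_uint8_image(spec: np.ndarray) -> np.ndarray:
    """Min-max scale a spectrogram to an 8-bit grayscale image array."""
    lo, hi = spec.min(), spec.max()
    scaled = (spec - lo) / (hi - lo) if hi > lo else np.zeros_like(spec)
    return (scaled * 255).astype(np.uint8)
```

The resulting `uint8` array could then be saved as a grayscale image with an imaging library such as Pillow. Because every clip has the same length, every spectrogram has the same number of time frames, so all images share a fixed size that a CNN can accept directly.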
The preprocessing of the datasets is discussed in the data collection section. A few issues to note when preparing the training image set are as follows:
- Data Size: We need to collect at least a hundred images for each class in order to train a model that works well. The more we can gather, the better the accuracy of the trained model is likely to be. Each of the categories in the use case one dataset...