In this recipe, we'll look at a simple sound recognition problem using Google's Speech Commands dataset. We'll set up a deep learning model and train it to classify spoken commands into different classes.
Getting ready
For this recipe, we'll need the librosa library, as mentioned at the start of the chapter. We'll also need to download the Speech Commands dataset, and for that we'll install the wget Python library first:
!pip install wget
Alternatively, on Linux and macOS we could call the wget command-line tool with !wget, provided it is installed on the system. We'll create a new directory, download the dataset archive, and extract it:
import os
import wget
import tarfile
DATA_DIR = 'sound_commands'
DATASET_URL = 'http://download.tensorflow.org/data/speech_commands_v0.01.tar.gz'
ARCHIVE = os.path.basename(DATASET_URL)
# create the directory only if it doesn't exist yet, so the cell can be re-run
os.makedirs(DATA_DIR, exist_ok=True)
os.chdir(DATA_DIR)
wget.download(DATASET_URL)
with tarfile.open(ARCHIVE, 'r:gz') as tar:
tar...
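If installing the wget package is not an option, the same download-and-extract steps can be sketched with the standard library alone. This is a minimal sketch, not the recipe's own code: the helper names download_if_missing and extract_archive are hypothetical, and unlike the code above it works with explicit paths instead of changing the working directory.

```python
import os
import tarfile
import urllib.request


def download_if_missing(url, dest_dir):
    """Download url into dest_dir unless the file is already there (hypothetical helper)."""
    os.makedirs(dest_dir, exist_ok=True)
    archive_path = os.path.join(dest_dir, os.path.basename(url))
    if not os.path.exists(archive_path):
        # only hit the network when no local copy exists
        urllib.request.urlretrieve(url, archive_path)
    return archive_path


def extract_archive(archive_path, dest_dir):
    """Extract a gzipped tar archive into dest_dir and return dest_dir's sorted entries."""
    with tarfile.open(archive_path, 'r:gz') as tar:
        tar.extractall(path=dest_dir)
    return sorted(os.listdir(dest_dir))


# Example usage (needs network access; reuses DATASET_URL and DATA_DIR from above):
# archive = download_if_missing(DATASET_URL, DATA_DIR)
# extract_archive(archive, DATA_DIR)
```

Skipping the download when the archive already exists makes the cell safe to re-run, which helps in a notebook where cells are often executed more than once.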