In this section, we'll summarize the steps in the well-written TensorFlow Simple Audio Recognition tutorial (https://www.tensorflow.org/versions/master/tutorials/audio_recognition) and add a few tips that may help you while training the model.
The simple speech commands recognition model we'll build can recognize 10 words: "yes," "no," "up," "down," "left," "right," "on," "off," "stop," and "go"; it can also detect silence. If it detects neither silence nor any of the 10 words, it outputs "unknown." The speech commands dataset that we'll download and use for training, when we run the tensorflow/examples/speech_commands/train.py script later, actually contains 20 more words...
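To make the output classes concrete, here is a minimal sketch of how the 10 target words, silence, and "unknown" could be arranged into 12 labels. The names `SILENCE_LABEL`, `UNKNOWN_LABEL`, `build_labels`, and `to_label` are hypothetical and chosen for illustration only; the label strings and their ordering in the actual training script may differ.

```python
# Hypothetical label strings; the real training script may spell or order them differently.
SILENCE_LABEL = "silence"
UNKNOWN_LABEL = "unknown"

# The 10 words the model is trained to recognize (from the text above).
WANTED_WORDS = ["yes", "no", "up", "down", "left", "right",
                "on", "off", "stop", "go"]


def build_labels(wanted_words):
    """Return the full list of output classes: silence, unknown, then the target words."""
    return [SILENCE_LABEL, UNKNOWN_LABEL] + list(wanted_words)


def to_label(word, wanted_words=WANTED_WORDS):
    """Map a spoken word (or None for a silent clip) to one of the output classes."""
    if word is None:
        return SILENCE_LABEL
    # Any word outside the target list collapses into the single "unknown" class.
    return word if word in wanted_words else UNKNOWN_LABEL


if __name__ == "__main__":
    print(build_labels(WANTED_WORDS))
    print(to_label("yes"), to_label("some_other_word"), to_label(None))
```

Under this sketch, every other word in the dataset maps to the one "unknown" class, so the model has 12 outputs in total instead of one per dataset word.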