Collecting data for ASR is challenging for many reasons, including privacy, so open-source datasets are limited in number. Those that do exist may be hard to access, may contain too little data or too few speakers, or may be noisy. Given these constraints, we decided to use a different dataset for each of the two use cases. For the voice-controlled smart light, we use Google’s Speech Commands dataset; for use case two, we can scrape data from one of several popular open data sources: LibriVox, the LibriSpeech ASR corpus, VoxCeleb, or YouTube.
Google's Speech Commands dataset includes 65,000 one-second utterances of 30 short words, contributed by thousands of members of the public through the AIY website. The dataset offers basic audio data on common words such as "On", "Off", and "Yes", as well as digits and directions, but this can be...