Before discussing the useful DL models, it is important to understand how an Automatic Speech Recognition (ASR) system works.
DL for sound/audio recognition in IoT
ASR system model
An ASR system needs three main sources of knowledge: an acoustic model, a phonetic lexicon, and a language model [4]. The acoustic model deals with the sounds of the language, including the phonemes and extra sounds (such as pauses, breathing, and background noise). The phonetic lexicon, or dictionary, lists the words that the system can understand, along with their possible pronunciations. Finally, a language model includes knowledge about...
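As a rough illustration of how the three knowledge sources might work together, the following minimal Python sketch scores two candidate transcriptions. Everything in it is an assumption made for illustration: the words, the phoneme symbols, the probabilities, and the function names are hypothetical. The language model is taken to be a simple bigram table over word sequences, which is one common choice and not necessarily the one the source has in mind. A real system would compute acoustic scores from audio frames rather than look them up.

```python
import math

# Hypothetical phonetic lexicon: each word the system knows, with its
# possible pronunciations written as phoneme sequences.
LEXICON = {
    "light": [("L", "AY", "T")],
    "right": [("R", "AY", "T")],
    "on": [("AA", "N"), ("AO", "N")],
}

# Stand-in for an acoustic model: the log-probability that the observed
# audio matches each phoneme. The values are made up for this sketch.
PHONEME_LOGP = {
    phoneme: math.log(prob)
    for phoneme, prob in {
        "L": 0.4, "R": 0.5, "AY": 0.9, "T": 0.8,
        "AA": 0.7, "AO": 0.2, "N": 0.9,
    }.items()
}

# Assumed bigram language model: log-probability of a word given the
# previous word, with "<s>" marking the start of the utterance.
BIGRAM_LOGP = {
    ("<s>", "light"): math.log(0.6),
    ("<s>", "right"): math.log(0.1),
    ("light", "on"): math.log(0.5),
    ("right", "on"): math.log(0.2),
}
UNSEEN_LOGP = math.log(1e-4)  # floor for word pairs missing from the table


def acoustic_score(word):
    """Best acoustic log-score over all pronunciations listed in the lexicon."""
    return max(
        sum(PHONEME_LOGP[phoneme] for phoneme in pronunciation)
        for pronunciation in LEXICON[word]
    )


def language_score(words):
    """Bigram log-score of the whole word sequence."""
    history = ["<s>"] + list(words[:-1])
    return sum(
        BIGRAM_LOGP.get((prev, word), UNSEEN_LOGP)
        for prev, word in zip(history, words)
    )


def total_score(words):
    """Combine acoustic and language-model evidence in the log domain."""
    return sum(acoustic_score(word) for word in words) + language_score(words)


def decode(candidates):
    """Return the candidate word sequence with the highest combined score."""
    return max(candidates, key=total_score)


best = decode([("light", "on"), ("right", "on")])
print(best)
```

With these made-up numbers, the acoustic scores alone slightly favor "right" over "light", but the language-model scores make "light on" the preferred transcription overall. This shows why an ASR system benefits from combining all three sources rather than relying on acoustics alone.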