Automatic speech recognition has many potential applications, such as audio transcription, dictation, audio search, and virtual assistants. I am sure that by now everyone has interacted with at least one virtual assistant, be it Apple's Siri, Amazon's Alexa, or Google's Assistant. At the core of all these speech recognition systems is a set of statistical models representing the different words or sounds of a language. And since speech has a temporal structure, HMMs are the most natural framework for modeling it.
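To make the idea concrete, here is a minimal sketch of how an HMM assigns a probability to a sequence of acoustic observations using the forward algorithm. The states, symbols, and all probability values are invented for illustration; a real recognizer would have many more states (typically per-phone) and continuous acoustic features.

```python
import numpy as np

# Toy HMM (illustrative numbers only): two hidden "sound" states,
# each emitting one of three discrete acoustic symbols.
start = np.array([0.6, 0.4])          # initial state distribution
trans = np.array([[0.7, 0.3],         # state-transition probabilities
                  [0.4, 0.6]])
emit = np.array([[0.5, 0.4, 0.1],     # per-state emission probabilities
                 [0.1, 0.3, 0.6]])

def likelihood(obs):
    """Forward algorithm: P(observation sequence | model)."""
    alpha = start * emit[:, obs[0]]   # initialize with first observation
    for o in obs[1:]:
        # Propagate through transitions, then weight by emission prob.
        alpha = (alpha @ trans) * emit[:, o]
    return alpha.sum()

print(likelihood([0, 1, 2]))
```

A recognizer scores a candidate word or sound sequence by exactly this kind of computation, picking the hypothesis whose model assigns the observations the highest likelihood.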
HMMs sit at the core of virtually all speech recognition systems, and the core modeling concepts haven't changed much in a long time. But over the years, many sophisticated techniques have been developed on top of them to build better systems. In the following sections, we will try to cover the main concepts leading to the development...