The technology of speech recognition
The following are the two main stages in speech recognition:
Signal processing: This stage captures the words spoken into a microphone and uses an analogue-to-digital converter (ADC) to translate the analogue signal into digital data that the computer can process. The digital data is then processed to remove noise and to perform other operations, such as echo cancellation, so that the features relevant to speech recognition can be extracted.
Speech recognition: The signal is split into minute segments that are matched against the phonemes of the language to be recognized. A phoneme is the smallest unit of sound that distinguishes one word from another; phonemes are roughly comparable to the letters of the alphabet. For example, the phonemes in the word cat are /k/, /æ/, and /t/. English has around 40 phonemes, depending on the variety being spoken.
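The two stages can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not how a production recognizer works: the function names, the 16 kHz sample rate, the moving-average filter (standing in for real noise removal), the two features (energy and zero-crossing rate), and the phoneme templates are all hypothetical choices made for illustration. Real systems use much richer features and statistical models rather than nearest-template matching.

```python
import math


def sample_and_quantize(signal_fn, duration_s, sample_rate_hz=16000, bits=16):
    """Simulate an ADC: sample a continuous signal and round to signed integers."""
    max_level = 2 ** (bits - 1) - 1
    n_samples = int(round(duration_s * sample_rate_hz))
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz
        value = max(-1.0, min(1.0, signal_fn(t)))  # clip to the ADC input range
        samples.append(round(value * max_level))
    return samples


def moving_average(samples, window=5):
    """Crude stand-in for noise removal: average each sample with its predecessors."""
    smoothed = []
    for i in range(len(samples)):
        lo = max(0, i - window + 1)
        chunk = samples[lo:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def split_into_frames(samples, frame_len, hop):
    """Split the signal into short, possibly overlapping segments (frames)."""
    return [samples[s:s + frame_len]
            for s in range(0, len(samples) - frame_len + 1, hop)]


def frame_features(frame):
    """Two simple features per frame: mean energy and zero-crossing rate."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return (energy, crossings / (len(frame) - 1))


def nearest_phoneme(feature, templates):
    """Return the phoneme whose template is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(templates, key=lambda p: sq_dist(feature, templates[p]))


# Hypothetical feature templates for the phonemes of "cat" (made-up numbers).
TEMPLATES = {"k": (0.1, 0.9), "æ": (0.9, 0.1), "t": (0.2, 0.7)}

if __name__ == "__main__":
    tone = lambda t: math.sin(2 * math.pi * 440 * t)  # stand-in for microphone input
    digital = sample_and_quantize(tone, duration_s=0.5)
    cleaned = moving_average(digital)
    frames = split_into_frames(cleaned, frame_len=400, hop=160)  # 25 ms frames, 10 ms hop
    print(len(digital), len(frames))
    print(nearest_phoneme((0.8, 0.2), TEMPLATES))
```

The first two functions correspond to the signal-processing stage, and the last three to splitting the signal into segments and matching each one against phonemes. The template lookup is only a placeholder for the matching step.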
The most successful approach to speech recognition has been to model speech statistically so that the outcome...