Speech recognition is the task in which a machine or computer transforms spoken language into text. A good example is the voice typing feature in Google Docs which converts speech to text as you speak. In this chapter, we will look into how to build such systems using deep learning models. In particular, we will focus on using recurrent neural network (RNNs) models as these are found to be effective in practice for speech recognition. This is because RNNs can capture temporal dependencies in the speech data that is important in the task of converting it into text.
The following is an overview of the topics that will be covered in this chapter:
- Overview of speech recognition
- RNN models for isolated spoken word recognition
- Speech recognition using the DeepSpeech model for continuous speech