Speech to text with Amazon Transcribe
In the previous section, you learned about text to speech. In this section, you will learn about speech to text and the service that provides it: Amazon Transcribe. Amazon Transcribe is an automatic speech recognition (ASR) service that uses pre-trained deep learning models. This means you do not have to train a model on petabytes of data yourself, because Amazon has already done it for you. You simply call the available APIs to transcribe audio or video files. The service supports a number of languages and also supports custom vocabularies. Accuracy is key, and a custom vocabulary lets you improve it for a particular domain or industry:
Figure 8.10 – Block diagram of Amazon Transcribe’s input and output
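To make the batch workflow concrete, here is a minimal sketch that uses the AWS SDK for Python (boto3). It starts a transcription job for an audio file already stored in Amazon S3, optionally attaches a custom vocabulary, and polls until the job finishes. The job name, S3 URI, and vocabulary name are hypothetical placeholders. The sketch assumes that AWS credentials are configured and that the custom vocabulary was created beforehand, for example with the `create_vocabulary` API.

```python
import time


def build_job_request(job_name, media_uri, media_format="mp3",
                      language_code="en-US", vocabulary_name=None):
    """Build keyword arguments for Transcribe's StartTranscriptionJob call.

    All argument values used in examples are hypothetical placeholders.
    """
    request = {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},  # e.g. an s3:// URI
        "MediaFormat": media_format,
        "LanguageCode": language_code,
    }
    if vocabulary_name:
        # Assumes the custom vocabulary already exists in the account;
        # it biases recognition toward domain-specific terms.
        request["Settings"] = {"VocabularyName": vocabulary_name}
    return request


def transcribe(request, poll_seconds=10):
    """Start a batch transcription job and wait until it completes or fails."""
    import boto3  # imported lazily so build_job_request works without the SDK

    client = boto3.client("transcribe")
    client.start_transcription_job(**request)
    while True:
        job = client.get_transcription_job(
            TranscriptionJobName=request["TranscriptionJobName"]
        )["TranscriptionJob"]
        status = job["TranscriptionJobStatus"]
        if status == "COMPLETED":
            # The transcript is a JSON document stored at this URI.
            return job["Transcript"]["TranscriptFileUri"]
        if status == "FAILED":
            raise RuntimeError(job.get("FailureReason", "transcription failed"))
        time.sleep(poll_seconds)  # still QUEUED or IN_PROGRESS


# Example usage (requires AWS credentials and an existing S3 object):
# uri = transcribe(build_job_request("demo-job", "s3://my-bucket/call.mp3",
#                                    vocabulary_name="my-domain-vocabulary"))
```

Keeping the request construction separate from the SDK call makes it easy to check the parameters, including whether a custom vocabulary is attached, before any job is submitted.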
Some common uses of Amazon Transcribe include the following:
- Real-time audio streaming and transcription
- Transcribing pre-recorded audio files
- Enabling text searching from a media file by combining...