In Chapter 14, End-to-End Learning, we learned how to transcribe images of handwritten text into text. In this section, we will use a similar end-to-end model to transcribe voice recordings into text.
Transcribing audio into text
Getting ready
The strategy that we'll adopt to transcribe voices is as follows:
- Download a dataset that contains audio files and their corresponding transcriptions (ground truths)
- Specify a sampling rate while reading the audio files:
  - With a sampling rate of 16,000 Hz, we extract 16,000 data points per second of audio.
- Extract a Fast Fourier Transform (FFT) of the audio array:
  - An FFT converts the signal from the time domain to the frequency domain, which exposes the features of the signal that matter most for transcription.
  - By default, the FFT gives...
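
To make the sampling-rate and FFT steps concrete, here is a minimal sketch using NumPy only. It is not the recipe's own code: instead of reading a file from the dataset, it synthesizes a one-second test tone, and the names `SAMPLE_RATE`, `DURATION_S`, and `TONE_HZ` are hypothetical values chosen for illustration.

```python
import numpy as np

SAMPLE_RATE = 16_000  # samples per second, matching the example in the text
DURATION_S = 1.0      # hypothetical clip length (assumption)
TONE_HZ = 440.0       # hypothetical test tone standing in for real audio

# One second of audio sampled at 16 kHz yields 16,000 data points.
t = np.arange(int(SAMPLE_RATE * DURATION_S)) / SAMPLE_RATE
signal = np.sin(2 * np.pi * TONE_HZ * t)

# FFT of a real-valued signal: one value per frequency bin from 0 Hz
# up to half the sampling rate.
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1.0 / SAMPLE_RATE)

# The magnitude of each bin tells us how strongly that frequency is present.
magnitudes = np.abs(spectrum)
peak_hz = freqs[np.argmax(magnitudes)]

print(len(signal), spectrum.shape, peak_hz)
```

With a real recording, `signal` would come from an audio loader that accepts a target sampling rate; the FFT step itself stays the same.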