In this chapter, we will learn how to convert audio to text using the WaveNet model. We will then build an Android application that uses this model to take audio and convert it into text.
This chapter is based on the paper "WaveNet: A Generative Model for Raw Audio" by Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. You can find this paper at https://arxiv.org/abs/1609.03499.
In this chapter, we will cover the following topics:
- WaveNet and how it works
- The WaveNet architecture
- Building a model using WaveNet
- Preprocessing datasets
- Training the WaveNet network
- Transforming a speech WAV file into English text
- Building an Android application
Let's dig deeper into what WaveNet actually is.