Over the last few years, the field of TTS has been shaken up by several deep learning-based breakthroughs. Here, we present two of them: WaveNet (probably the best known) and Tacotron (which has the advantage of being an end-to-end approach).
TTS in deep learning
WaveNet, in brief
The WaveNet paper was published in 2016 and reported results that outperformed classical TTS approaches. At its core, WaveNet is a generative model for audio: it takes a sequence of audio samples as input and predicts the sample most likely to come next. By adding an extra input, it can be conditioned to accomplish more tasks. For instance, if the audio transcript is additionally provided during training, WaveNet can turn it into a...