In this chapter, we'll look at audio generation. We'll first provide an overview of WaveNet, an existing model for audio generation that is especially efficient in text-to-speech applications. In Magenta, we'll use NSynth, a WaveNet autoencoder model, to generate small audio clips that can serve as instruments for a backing MIDI score. NSynth also enables audio transformations such as scaling, time stretching, and interpolation. Finally, we'll use GANSynth, a faster approach based on Generative Adversarial Networks (GANs).
The following topics will be covered in this chapter:
- Learning about WaveNet and temporal structures for music
- Neural audio synthesis with NSynth
- Using GANSynth as a generative instrument