- You have to handle 16,000 samples per second (at least) and keep track of the general structure at a bigger time scale.
- NSynth is a WaveNet-style autoencoder that learns its own temporal embedding, making it possible to capture long term structure, and providing access to a useful hidden space.
- The colors in the rainbowgram are the 16 dimensions of the temporal embedding.
- Check the timestretch method in the audio_utils.py file in the chapter's code.
- GANSynth uses upsampling convolutions, making the training and generation processing in parallel possible for the entire audio sample.
-
You need to sample the random normal distribution using np.random.normal(size=[10, 256]), where 10 is the number of sampled instruments, and 256 is the size of the latent vector (given by the latent_vector_size configuration).