1. Principles of VAE
In a generative model, we're often interested in approximating the true distribution of our inputs using neural networks:

$$x \sim P_\theta(x)$$
In the preceding equation, $\theta$ represents the parameters determined during training. For example, in the context of the celebrity faces dataset, this is equivalent to finding a distribution that can draw faces. Similarly, in the MNIST dataset, this distribution can generate recognizable handwritten digits.
In machine learning, to perform a certain level of inference, we're interested in finding $P_\theta(x, z)$, a joint distribution between the inputs, $x$, and the latent variables, $z$. The latent variables are not part of the dataset but instead encode certain properties observable from the inputs. In the context of celebrity faces, these might be facial expressions, hairstyles, hair color, gender, and so on. In the MNIST dataset, the latent variables may represent the digit and the writing style.
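To make the role of the latent variables concrete, here is a minimal, self-contained Python sketch of this idea. It is only an illustration: the names `sample_latent` and `toy_decoder` are hypothetical, the latent code is assumed to be a digit identity plus a single continuous style value, and the decoder is a stand-in for what would really be a trained network mapping $z$ to pixels.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent():
    """Ancestral sampling of a latent code z from a simple prior P(z)."""
    digit = int(rng.integers(0, 10))   # which digit to draw
    style = rng.normal(0.0, 1.0)       # e.g., slant/thickness of the strokes
    return digit, style

def toy_decoder(digit, style):
    """Stand-in for P_theta(x | z): maps a latent code to a 28x28 'image'.

    A real decoder would be a trained neural network; here we return a
    deterministic placeholder pattern so the sketch is runnable on its own.
    """
    x = np.zeros((28, 28))
    x[digit:digit + 5, 10:18] = 1.0 + 0.1 * style  # placeholder pixels
    return x

# Drawing (z, x) pairs this way samples from the joint distribution P_theta(x, z):
# first an attribute vector z from its prior, then an input x conditioned on z.
z = sample_latent()
x = toy_decoder(*z)
print("latent z:", z, "image shape:", x.shape)
```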
$P_\theta(x, z)$ is, in practice, a distribution of input data points and their attributes.
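In standard VAE derivations, the input distribution $P_\theta(x)$ we set out to approximate is recovered from this joint distribution by marginalizing over the latent variables:

$$P_\theta(x) = \int P_\theta(x, z)\,dz$$

In other words, integrating out every possible attribute configuration $z$ yields the distribution over the inputs alone.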