The VAE is a particular type of autoencoder (Kingma, D. P., & Welling, M. (2013)). It learns specific statistical properties of the dataset derived from a Bayesian approach. First, let's define as the prior probability density function of a random latent variable, . Then, we can describe a conditional probability density function, , which can be interpreted as a model that can produce data—say, . It follows that we can approximate the posterior probability density function in terms of the conditional and prior distributions, as follows:
It turns out that an exact posterior is intractable, but this problem can be solved, approximately, by making a few assumptions and using an interesting idea to compute gradients. To begin with, the prior can be assumed to follow an isotropic Gaussian distribution, . We can also assume that the conditional distribution, , can be parametrized and modeled using a neural network; that is, given a latent vector , we...