The VAE is a particular type of autoencoder (Kingma & Welling, 2013). It learns specific statistical properties of the dataset, derived from a Bayesian approach. First, let's define p(z) as the prior probability density function of a random latent variable,
z. Then, we can describe a conditional probability density function,
p(x|z), which can be interpreted as a model that can produce data, say,
x. It follows that we can express the posterior probability density function in terms of the conditional and prior distributions, using Bayes' rule, as follows:
![](https://static.packt-cdn.com/products/9781838640859/graphics/assets/ab9cec77-1f80-41d4-96d0-67a90fd91a22.png)
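To see Bayes' rule at work before worrying about intractability, here is a minimal sketch with a toy *discrete* latent variable that takes only two states (the probability values are hypothetical, chosen only to illustrate the formula). With a discrete z, the evidence p(x) is just a sum; with a continuous, high-dimensional z, it becomes the integral that makes the exact posterior intractable:

```python
import numpy as np

# Toy discrete analogue of Bayes' rule with two latent states.
# (Hypothetical numbers, chosen only to illustrate the formula.)
p_z = np.array([0.6, 0.4])          # prior p(z)
p_x_given_z = np.array([0.2, 0.9])  # likelihood p(x|z) for one observed x

# Evidence p(x) = sum over z of p(x|z) * p(z). A simple sum here, but an
# intractable integral when z is continuous and high-dimensional.
p_x = np.sum(p_x_given_z * p_z)

# Posterior p(z|x) = p(x|z) * p(z) / p(x)
p_z_given_x = p_x_given_z * p_z / p_x
print(p_z_given_x)  # [0.25 0.75]
```

Note how the observation shifts belief toward the second latent state, whose likelihood of producing x is higher.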
It turns out that the exact posterior is intractable, but this problem can be solved, approximately, by making a few assumptions and using an interesting idea to compute gradients. To begin with, the prior can be assumed to follow an isotropic Gaussian distribution, p(z) = N(0, I). We can also assume that the conditional distribution,
p(x|z), can be parametrized and modeled using a neural network; that is, given a latent vector
z, we...