Inverse Autoregressive Flow
Earlier, we noted that we want q(z|x) to approximate the "true" posterior p(z|x), which would give us an ideal encoding of the data that we could sample from to generate new images. So far, we've assumed that q(z|x) has a relatively simple form: a vector of independent Gaussian random variables, that is, a multivariate Gaussian with a diagonal covariance matrix (0s in the off-diagonal elements). This sort of distribution has many benefits: because it is simple, we can easily generate new samples by drawing from random normal distributions, and because its elements are independent, we can separately tune each element of the latent vector z to influence parts of the output image.
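To make the "simple" case concrete, here is a minimal NumPy sketch of drawing samples from a diagonal Gaussian q(z|x) by shifting and scaling standard normal noise. The function name sample_diagonal_gaussian, the three-dimensional latent vector, and the values of mu and log_var are illustrative assumptions; in practice an encoder network would produce the mean and log-variance for each input x.

```python
import numpy as np


def sample_diagonal_gaussian(mu, log_var, n_samples, rng):
    """Draw n_samples from N(mu, diag(exp(log_var))).

    Each latent dimension gets its own independent standard normal
    draw, which is then scaled by that dimension's standard deviation
    and shifted by its mean.
    """
    eps = rng.standard_normal((n_samples, mu.shape[0]))
    return mu + np.exp(0.5 * log_var) * eps


# Hypothetical encoder outputs for a single input x (assumed values).
mu = np.array([0.5, -1.0, 2.0])
log_var = np.array([0.0, -2.0, 1.0])

rng = np.random.default_rng(0)
z = sample_diagonal_gaussian(mu, log_var, n_samples=20000, rng=rng)

# Empirical covariance of the samples: the diagonal should be close to
# exp(log_var) and the off-diagonal entries close to 0.
emp_cov = np.cov(z, rowvar=False)
print(np.round(emp_cov, 2))
```

Because the off-diagonal covariances are approximately zero, changing one element of z leaves the others untouched, which is the independence property described above. The same structure is also what limits how well this family can match a posterior whose dimensions are correlated.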
However, such a simple distribution may fit the true posterior p(z|x) poorly, which increases the KL divergence between p(z|x) and q(z|x). Is there a way we can keep the desirable properties of q(z|x) but "...