10. Conclusion
In this chapter, we discussed mutual information (MI) and the ways in which it can be used to solve unsupervised tasks. Various online resources provide additional background about MI [4]. When used for clustering, maximizing MI encourages the latent code vectors to form clusters that are easy to label, either by linear assignment or with a linear classifier.
We presented two methods of estimating and maximizing MI: IIC and MINE. IIC is suited to discrete random variables: by maximizing the MI between the class assignments of an input and its transformed copy, it closely approximates MI and yields a classifier with high accuracy. For continuous random variables, MINE uses the Donsker-Varadhan representation of the KL divergence to train a deep neural network that estimates MI. We demonstrated that MINE can closely approximate the MI of a bivariate Gaussian distribution and that, as an unsupervised method, it achieves acceptable accuracy on classifying MNIST digits. Minimal sketches of both objectives follow.
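As a reminder of the discrete case, the following is a minimal sketch of the IIC objective, not the chapter's exact implementation (the function name iic_mi, the clipping constant eps, and the input names z and z_bar are illustrative). Given the softmax outputs z of an input and z_bar of its transformed version, it builds the joint class-assignment matrix and returns the MI to be maximized:

```python
import tensorflow as tf

def iic_mi(z, z_bar, eps=1e-8):
    """MI between paired class assignments (IIC objective).

    z, z_bar: (batch, n_classes) softmax probabilities for an input
    and its transformed copy. The training loss is -iic_mi(z, z_bar).
    """
    n = tf.cast(tf.shape(z)[0], tf.float32)
    # Estimate the joint distribution over class pairs from the batch
    p = tf.matmul(z, z_bar, transpose_a=True) / n      # (C, C)
    p = (p + tf.transpose(p)) / 2.0                    # enforce symmetry
    p = tf.clip_by_value(p, eps, 1.0)                  # avoid log(0)
    pi = tf.reduce_sum(p, axis=1, keepdims=True)       # marginal of z
    pj = tf.reduce_sum(p, axis=0, keepdims=True)       # marginal of z_bar
    # MI = sum_ij P_ij * (log P_ij - log P_i - log P_j)
    return tf.reduce_sum(p * (tf.math.log(p)
                              - tf.math.log(pi)
                              - tf.math.log(pj)))
```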
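For the continuous case, below is a minimal sketch of MINE on the bivariate Gaussian experiment, assuming an illustrative statistics network, learning rate, batch size, and step count rather than the chapter's exact settings. It maximizes the Donsker-Varadhan lower bound, MI >= E_P[T(x, y)] - log E_{P_X x P_Y}[e^{T(x, y)}], where samples from the product of marginals are obtained by shuffling y within the batch; the estimate can be compared against the analytic MI, -0.5 log(1 - rho^2):

```python
import numpy as np
import tensorflow as tf

rho = 0.8  # correlation; analytic MI = -0.5 * log(1 - rho**2)
cov = [[1.0, rho], [rho, 1.0]]

def sample_batch(batch_size=256):
    """Draw joint samples (x, y) and marginal samples (x, y_shuffled)."""
    xy = np.random.multivariate_normal([0.0, 0.0], cov, batch_size)
    x, y = xy[:, :1], xy[:, 1:]
    y_bar = np.random.permutation(y)  # breaks the pairing -> product of marginals
    return (x.astype("float32"), y.astype("float32"), y_bar.astype("float32"))

# Statistics network T(x, y) -> scalar
t_net = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(2,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
optimizer = tf.keras.optimizers.Adam(1e-3)

for step in range(3000):
    x, y, y_bar = sample_batch()
    with tf.GradientTape() as tape:
        t_joint = t_net(tf.concat([x, y], axis=1))
        t_marginal = t_net(tf.concat([x, y_bar], axis=1))
        # Donsker-Varadhan lower bound on MI
        mi_lb = (tf.reduce_mean(t_joint)
                 - tf.math.log(tf.reduce_mean(tf.exp(t_marginal))))
        loss = -mi_lb  # gradient ascent on the bound
    grads = tape.gradient(loss, t_net.trainable_variables)
    optimizer.apply_gradients(zip(grads, t_net.trainable_variables))

print("MINE estimate:", float(mi_lb))          # single-batch estimate, noisy
print("Analytic MI :", -0.5 * np.log(1 - rho**2))
```

Because the bound is estimated from a single batch, the final value fluctuates around the analytic MI; averaging the estimate over the last few hundred steps gives a more stable figure.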