7. Unsupervised learning by maximizing the Mutual Information of continuous random variables
In previous sections, we saw how to construct a good estimator of the MI of discrete random variables. We also demonstrated that a network that performs clustering by maximizing MI, combined with a linear assignment algorithm that maps clusters to ground-truth labels, yields an accurate classifier.
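As a quick refresher, the cluster-to-label mapping can be recovered with a linear assignment solver. Below is a minimal sketch using `scipy.optimize.linear_sum_assignment`; the function and array names are illustrative, not taken from the original text:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def unsupervised_accuracy(cluster_ids, true_labels, n_classes):
    """Map each cluster to its best-overlapping class, then score.

    cluster_ids, true_labels: equal-length 1-D integer arrays
    (hypothetical names; any clustering output works).
    """
    # counts[i, j] = number of samples in cluster i with true label j
    counts = np.zeros((n_classes, n_classes), dtype=np.int64)
    for c, t in zip(cluster_ids, true_labels):
        counts[c, t] += 1
    # linear_sum_assignment minimizes total cost, so negate the counts
    # to obtain the cluster-to-label assignment with maximum overlap
    rows, cols = linear_sum_assignment(-counts)
    mapping = dict(zip(rows, cols))
    remapped = np.array([mapping[c] for c in cluster_ids])
    return (remapped == true_labels).mean()
```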
If IIC is a good estimator of the MI of discrete random variables, what about continuous random variables? In this section, we discuss Mutual Information Neural Estimation (MINE) by Belghazi et al. [3] as an estimator of the MI of continuous random variables.
MINE proposes an alternative expression of the KL-divergence in Equation 13.1.1 to implement an MI estimator using a neural network. In MINE, the Donsker-Varadhan (DV) representation of the KL-divergence is used:

$$D_{KL}\left(\mathbb{P} \parallel \mathbb{Q}\right) = \sup_{T: \Omega \to \mathbb{R}} \mathbb{E}_{\mathbb{P}}\left[T\right] - \log \mathbb{E}_{\mathbb{Q}}\left[e^{T}\right]$$

where the supremum is taken over the space of all functions $T$. Here, $T$ is an arbitrary function that maps samples from the input space to the real numbers, $\mathbb{R}$. Substituting the joint distribution for $\mathbb{P}$ and the product of the marginals for $\mathbb{Q}$, as in Equation 13.1.1, turns this representation into a lower bound on $I(X; Y)$ that can be tightened by optimizing $T$, implemented as a neural network.
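To make this concrete, here is a minimal sketch of the DV bound as a trainable estimator, written in PyTorch as an assumption (the framework, the network `TNet`, and the batch-shuffling trick for sampling the product of the marginals are illustrative choices, not taken from the text):

```python
import math

import torch
import torch.nn as nn

class TNet(nn.Module):
    """Statistics network T: maps an (x, y) pair to a scalar."""
    def __init__(self, x_dim=1, y_dim=1, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + y_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1))

def dv_lower_bound(t_net, x, y):
    """DV bound: E_{P(X,Y)}[T] - log E_{P(X)P(Y)}[exp(T)].

    Paired (x, y) rows are joint samples; shuffling y within the
    batch approximates samples from the product of the marginals.
    """
    joint = t_net(x, y).mean()
    y_shuffled = y[torch.randperm(y.size(0))]
    scores = t_net(x, y_shuffled).squeeze(-1)
    # log-mean-exp, computed stably as logsumexp - log(n)
    marginal = torch.logsumexp(scores, dim=0) - math.log(y.size(0))
    return joint - marginal

# Usage: estimate the MI of a correlated bivariate Gaussian, whose true
# MI, -0.5 * log(1 - rho**2), is about 0.34 nats for rho = 0.7.
rho = 0.7
t_net = TNet()
opt = torch.optim.Adam(t_net.parameters(), lr=1e-3)
for step in range(2000):
    x = torch.randn(512, 1)
    y = rho * x + (1 - rho**2) ** 0.5 * torch.randn(512, 1)
    loss = -dv_lower_bound(t_net, x, y)  # gradient ascent on the bound
    opt.zero_grad()
    loss.backward()
    opt.step()
print("MI estimate (nats):", -loss.item())
```

Maximizing this bound over the parameters of `TNet` tightens it toward the true MI, which is the core idea of MINE; note that the paper also corrects the gradient bias introduced by the stochastic log-mean-exp term, which this sketch omits.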