Mutual information
In this section, we’re going to look at information-theoretic concepts relating to multiple random variables. We’ll focus on the case of just two random variables, $X$ and $Y$, but you’ll soon realize that the new calculations and concepts we’ll introduce generalize easily to more than two random variables. As usual, we’ll start with the discrete case before introducing the continuous case later.
Because we’re looking at two discrete random variables, $X$ and $Y$, we’ll need their joint probability distribution, $p(x, y)$, which we’ll use as shorthand for $P(X = x, Y = y)$. The joint distribution is just a probability distribution, so we can easily measure its entropy, which we’ll denote by $H(X, Y)$. Applying the usual rule that entropy is expected information, $H(X, Y)$ is given by the following formula:
$$H(X, Y) = -\sum_{x}\sum_{y} p(x, y) \log p(x, y) \qquad \text{(Eq. 17)}$$
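To make Eq. 17 concrete, here is a minimal sketch in Python that computes the joint entropy of a small discrete joint distribution given as a table of probabilities. The table `p_xy`, the helper `joint_entropy`, and the specific numbers are illustrative choices, not from the text; the only assumption is that the joint distribution is supplied as a non-negative array summing to 1.

```python
import numpy as np

# An illustrative joint distribution p(x, y) for two discrete random
# variables X and Y, each taking two values. The numbers are made up
# for demonstration; any non-negative table that sums to 1 works.
p_xy = np.array([
    [0.25, 0.25],
    [0.25, 0.25],
])

def joint_entropy(p, base=2):
    """Compute H(X, Y) = -sum over x, y of p(x, y) * log p(x, y).

    Terms with p(x, y) = 0 contribute nothing, following the usual
    convention that 0 * log 0 = 0.
    """
    p = np.asarray(p, dtype=float)
    nonzero = p[p > 0]
    return -np.sum(nonzero * np.log(nonzero)) / np.log(base)

print(joint_entropy(p_xy))  # 2.0 bits for this uniform 2x2 table
```

Summing over every cell of the table is exactly the double sum in Eq. 17; choosing `base=2` reports the result in bits.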
To understand Eq. 17, let’s look at what would happen if $X$ and $Y$ were independent of each other. The joint distribution would...