Likelihood
To understand the probability distribution that the data follows, we’ll look at an explicit example of how a random component is incorporated into data.
A simple probabilistic model
We’ll start with the simplest way in which we can introduce a random component into our observations of the response (target) variable $y$, namely by adding noise to a deterministic quantity. In fact, we’ll just consider the observations $y_i$ in our dataset to be noise-corrupted versions of a model output $f(x_i)$. So, we have this relationship:
$$y_i = f(x_i) + \epsilon_i \qquad \text{(Eq. 1)}$$
Here, $\epsilon_i$ is the noise value that has been added to the model output $f(x_i)$ to get the observation $y_i$ for the $i^{\text{th}}$ datapoint. The value $\epsilon_i$ is a random variable. Without loss of generality, we can assume its expectation value is zero, so we have $\mathbb{E}[\epsilon_i] = 0$. We can make this assumption because if the expectation of $\epsilon_i$ was non-zero, it would mean we have a non-zero deterministic average contribution from $\epsilon_i$ that we could just absorb into the definition of $f$ ...
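To make the "absorb the mean into the model" argument concrete, here is a minimal simulation sketch. It rests on several assumptions that are not in the text: the model output is a hypothetical straight line (`model_output`), the noise is Gaussian, and the noise mean (`bias`) is deliberately set to a non-zero value. Observations are generated as model output plus noise. Shifting the model by the noise mean should then leave residuals that average roughly zero.

```python
import random
import statistics


def model_output(x):
    """Hypothetical deterministic model output; a straight line chosen purely for illustration."""
    return 2.0 * x + 1.0


def simulate_observations(xs, noise_mean, noise_sd, rng):
    """Generate observations as model output plus noise (Gaussian noise is an assumption)."""
    return [model_output(x) + rng.gauss(noise_mean, noise_sd) for x in xs]


rng = random.Random(0)                   # fixed seed so the run is repeatable
xs = [i / 100 for i in range(10_000)]    # illustrative inputs
bias = 0.5                               # deliberately non-zero noise mean
ys = simulate_observations(xs, bias, 1.0, rng)

# Residuals against the original model: their average estimates the noise mean.
residuals = [y - model_output(x) for x, y in zip(xs, ys)]
mean_residual = statistics.fmean(residuals)


def shifted_model_output(x):
    """Redefined model that absorbs the deterministic average contribution of the noise."""
    return model_output(x) + bias


# Residuals against the redefined model: the remaining noise averages about zero.
shifted_residuals = [y - shifted_model_output(x) for x, y in zip(xs, ys)]
mean_shifted_residual = statistics.fmean(shifted_residuals)

print(f"mean residual, original model: {mean_residual:.3f}")
print(f"mean residual, shifted model:  {mean_shifted_residual:.3f}")
```

The observations are identical in both cases. Only the split between "model" and "noise" changes, which is why assuming zero-mean noise costs no generality.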