Bayes’ theorem
When we learned about maximum likelihood for estimating the parameters of a model, it felt like an intuitively sensible thing to do. Who can argue with the idea of choosing the model parameters so that we have the highest possible probability of obtaining the data we have actually observed? But we didn’t really derive maximum likelihood in any formal way. Yes, choosing parameters by maximizing ) seems reasonable, but aren’t we really interested in the probability of the parameters given the data, that is | Data)? Working with the likelihood seems close to what we want, but not quite there. If only there was a way we could calculate from . There is. Enter Bayes’ theorem.
This section will be relatively short as we will only introduce Bayes’ theorem here. In the next two sections, we will explain how Bayes’ theorem is used in practice.
Conditional probability and Bayes’ theorem
Bayes’ theorem, named after...