Let's consider two probabilistic events, A and B. We can relate the marginal probabilities P(A) and P(B) to the conditional probabilities P(A|B) and P(B|A) using the product rule:
$$
P(A \cap B) = P(A|B)\,P(B) \qquad P(B \cap A) = P(B|A)\,P(A)
$$
Since the intersection is commutative, the two left-hand sides are equal; equating the right-hand sides and dividing by P(B), we obtain Bayes' theorem:
$$
P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}
$$
In the general discrete case, the formula can be re-expressed by summing over all possible outcomes of the random variable A:
$$
P(A|B) = \frac{P(B|A)\,P(A)}{\sum_{i} P(B|A_i)\,P(A_i)}
$$
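As a quick illustration of this expansion, consider a tiny spam-filtering sketch in Python; the events and every numeric value below are illustrative assumptions, not figures from the text:

```python
# Hypothetical spam example: all probabilities are made-up, illustrative values.
p_spam = 0.2                  # prior P(Spam)
p_not_spam = 1.0 - p_spam     # prior P(NotSpam)
p_free_given_spam = 0.6       # likelihood P("free" appears | Spam)
p_free_given_not_spam = 0.05  # likelihood P("free" appears | NotSpam)

# Denominator P("free") obtained by summing over all outcomes of A
p_free = p_free_given_spam * p_spam + p_free_given_not_spam * p_not_spam

# Bayes' theorem: posterior P(Spam | "free")
p_spam_given_free = (p_free_given_spam * p_spam) / p_free
print(p_spam_given_free)  # 0.75
```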
As the denominator is a normalization factor, the formula is often expressed as a proportionality relationship:
$$
P(A|B) \propto P(B|A)\,P(A)
$$
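The sketch below (again with made-up outcomes and numbers) shows why dropping the denominator is harmless: the unnormalized products P(B|A_i)P(A_i) already determine which outcome is most probable, and dividing by their sum, which is exactly P(B), recovers the exact posteriors:

```python
# Three hypothetical outcomes of A with illustrative priors and likelihoods
priors = {'A1': 0.5, 'A2': 0.3, 'A3': 0.2}          # P(A_i)
likelihoods = {'A1': 0.10, 'A2': 0.40, 'A3': 0.25}  # P(B | A_i)

# Proportionality: P(A_i | B) is proportional to P(B | A_i) * P(A_i)
unnormalized = {a: likelihoods[a] * priors[a] for a in priors}

# The normalization factor is the sum of the unnormalized terms, i.e. P(B)
evidence = sum(unnormalized.values())
posteriors = {a: v / evidence for a, v in unnormalized.items()}

print(max(unnormalized, key=unnormalized.get))  # 'A2': the most probable outcome,
                                                # already identifiable before normalizing
print(posteriors)                               # exact posteriors after normalization
```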
This formula has very deep philosophical implications, and it's a fundamental element of statistical learning. First of all, let's consider the marginal probability, P(A). This normally expresses how probable a target event is on its own, such as P(Spam) or P(Rain). As there are no other elements to consider, this kind of probability is called the prior (Apriori) probability.