Thompson Sampling
The algorithms we have seen so far make up a set of diverse insights: Greedy and its variants mostly focus on exploitation and might need to be explicitly forced to employ exploration; UCB, on the other hand, tends to be optimistic about the true expected reward of under-explored arms and therefore naturally, but also justifiably, focuses on exploration.
Thompson Sampling also uses a completely different intuition. However, before we can understand the idea behind the algorithm, we need to discuss one of its principal building blocks: the concept of Bayesian probability.
Introduction to Bayesian Probability
Generally speaking, the workflow of using Bayesian probability to describe a quantity consists of the following elements:
- A prior probability representing whatever prior knowledge or belief we have about the quantity.
- A likelihood probability that denotes, as the name of the term suggests, how likely the data that we have observed so far is...