Thompson sampling
Thompson sampling is a simple strategy, introduced by William R. Thompson in 1933, that has received renewed attention in recent years. It is widely used in advertising displays, marketing surveys, and financial analysis. Thompson sampling is a Bayesian strategy known as probability matching: the probability of selecting arm n equals the posterior probability that n is the arm with the maximum expected reward [14:4].
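The probability-matching rule can be stated compactly. In the notation below, which is introduced here for illustration, a_t is the arm selected at step t, \mu_k is the unknown mean reward of arm k, and \mathcal{D}_{t-1} is the history of selections and rewards observed so far:

```latex
P\left(a_t = n \mid \mathcal{D}_{t-1}\right)
  = P\left(\mu_n = \max_{k} \mu_k \;\middle|\; \mathcal{D}_{t-1}\right)
```

Thompson sampling realizes this without computing the right-hand side explicitly: drawing one sample of each \mu_k from its posterior and playing the arm with the largest sample selects arm n with exactly this probability.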
The strategy can be summarized as:
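- Assign a uniform prior distribution to the mean reward of each arm before any selection is made (Beta(1, 1) for Bernoulli rewards)
- Select arm n with a probability equal to the posterior probability that n is optimal (probability matching)
- Update the posterior distribution of the selected arm with the observed reward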
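The steps above can be sketched for Bernoulli bandits as follows. This is a minimal illustration, not the book's implementation. The function name `thompson_sampling` and the simulated success probabilities `true_probs` are hypothetical and are used only to generate rewards. With Bernoulli rewards, the uniform Beta(1, 1) prior is conjugate, so each posterior remains a Beta distribution whose parameters count successes and failures.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Bernoulli Thompson sampling with uniform Beta(1, 1) priors.

    true_probs: hypothetical success probability of each arm, used only to
    simulate rewards; the agent never observes these values directly.
    Returns the Beta posterior parameters (alpha, beta) and the pull counts.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    # Beta(1, 1) is the uniform prior on each arm's success probability.
    alpha = [1] * k  # 1 + number of observed successes
    beta = [1] * k   # 1 + number of observed failures
    pulls = [0] * k
    for _ in range(n_rounds):
        # Draw one sample from each arm's current Beta posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        # Playing the arm with the largest sample selects each arm with the
        # posterior probability that it is optimal (probability matching).
        arm = max(range(k), key=lambda i: samples[i])
        # Simulate a Bernoulli reward for the chosen arm.
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate update: a success increments alpha, a failure increments beta.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return alpha, beta, pulls


alpha, beta, pulls = thompson_sampling([0.1, 0.5, 0.8], 2000)
print(pulls)
```

As the posteriors concentrate, most of the pulls go to the arm with the highest success probability (index 2 here), while weaker arms are still sampled occasionally. Exploration therefore comes from posterior uncertainty rather than from an explicit exploration rate.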
Bandit context
So far, we have discussed K-armed bandits that do not maintain a state or context. These bandits assume that all the arms are identical and parameterized only by their mean reward (counts of successes and failures in the case of Bernoulli bandits). However, real-world applications, such as product recommendations or advertising targeting, require arms (a product or advertising...