Chapter 14. Multiarmed Bandits
This chapter is the first installment in our description of reinforcement learning techniques. Given a problem with multiple possible solutions, multiarmed bandit techniques attempt to acquire knowledge about the behavior of each solution (exploration) while, at the same time, applying the most rewarding solution found so far (exploitation) to maximize success. This balancing act between experimenting to acquire new knowledge and leveraging previously acquired knowledge is the core concept behind multiarmed bandit techniques.
This chapter covers the following topics:
- Exploration versus exploitation trade-off
- Minimization of cumulative regret
- Epsilon-greedy algorithm
- Upper confidence bound technique
- Context-free Thompson sampling
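To make the exploration versus exploitation trade-off concrete before the detailed treatment, here is a minimal sketch of the epsilon-greedy strategy covered later in this chapter. It is written in Python purely for illustration; the function names `epsilon_greedy` and `update_estimate`, and the Bernoulli reward simulation, are assumptions made for this sketch rather than definitions taken from the chapter.

```python
import random

def epsilon_greedy(values, epsilon=0.1):
    """Select an arm: explore with probability epsilon, otherwise exploit.

    values -- current estimated mean reward for each arm (assumption:
    rewards are tracked as running averages, one per arm).
    """
    if random.random() < epsilon:
        # Exploration: try a uniformly random arm to acquire new knowledge.
        return random.randrange(len(values))
    # Exploitation: apply the arm with the highest estimated reward so far.
    return max(range(len(values)), key=values.__getitem__)

def update_estimate(counts, values, arm, reward):
    """Incrementally update the running mean reward of the chosen arm."""
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]

# Hypothetical usage: three Bernoulli arms with unknown success rates.
true_means = [0.2, 0.5, 0.7]          # unknown to the agent
counts = [0, 0, 0]
values = [0.0, 0.0, 0.0]
for _ in range(10_000):
    arm = epsilon_greedy(values)
    reward = 1.0 if random.random() < true_means[arm] else 0.0
    update_estimate(counts, values, arm, reward)
# After many trials, values[2] should approach 0.7, the best arm.
```

With a small epsilon, most trials exploit the best-known arm while a fixed fraction continue to explore; the later sections of this chapter refine this idea through cumulative regret, upper confidence bounds, and Thompson sampling.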