In this chapter, we will dive deeper into the topic of multi-armed bandits. We covered the basics of how they work in Chapter 1, Brushing Up on Reinforcement Learning Concepts, and we'll revisit some of the conclusions we reached there. We'll extend the exploration-versus-exploitation trade-off that we studied in the context of Q-learning and apply it, using Q-values and exploration-based strategies, to other optimization problems.
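As a concrete preview of that trade-off, here is a minimal epsilon-greedy sketch on a three-armed bandit. This is an illustration only: the arm probabilities in `true_means`, the choice of `epsilon`, and the variable names are assumptions for this example, not values taken from the book.

```python
import random

random.seed(0)

true_means = [0.2, 0.5, 0.8]        # hidden reward probability of each arm
q_values = [0.0] * len(true_means)  # estimated Q-value per arm
counts = [0] * len(true_means)      # number of times each arm was pulled
epsilon = 0.1                       # exploration rate

for step in range(10_000):
    if random.random() < epsilon:
        # explore: pick a random arm
        arm = random.randrange(len(true_means))
    else:
        # exploit: pick the arm with the highest current estimate
        arm = max(range(len(true_means)), key=lambda a: q_values[a])
    reward = 1.0 if random.random() < true_means[arm] else 0.0
    counts[arm] += 1
    # incremental average update of the pulled arm's Q-value
    q_values[arm] += (reward - q_values[arm]) / counts[arm]

best_arm = max(range(len(true_means)), key=lambda a: q_values[a])
```

After enough pulls, the Q-value estimates converge toward the hidden means and the greedy choice settles on the best arm, while the epsilon fraction of random pulls keeps the other estimates from going stale.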
We will do the following in this chapter:
- Understand a simple bandit problem in greater detail
- Learn the effect of adding state to the description of the bandit's environment
- Become familiar with more advanced bandit-type research problems in various fields
- Understand the benefits of bandit-type implementations in experimental trials using A/B testing
- Discuss...