Summary
In this chapter, we concluded our discussion of bandit problems with contextual bandits. As mentioned, bandit problems have many practical applications, so it would not be surprising if you already have a problem in your business or research that can be modeled as a bandit problem. Now that you know how to formulate and solve one, go out and apply what you have learned! Bandit problems are also important for developing intuition about the exploration-exploitation dilemma, which exists in almost every RL setting.
Now that you have a solid understanding of how to solve one-step RL, it is time to move on to full-blown multi-step RL. In the next chapter, we will go into the theory behind multi-step RL with Markov decision processes (MDPs) and build the foundation for the modern deep RL methods that we will cover in subsequent chapters.