By now, you are capable of approaching RL problems in a systematic way. You are able to design and develop RL algorithms tailored to the problem at hand and get the most out of the environment. Moreover, in the previous two chapters, you learned about algorithms that go beyond RL but can be used to solve the same set of tasks.
At the beginning of this chapter, we'll present a dilemma that we have already encountered in many of the previous chapters; namely, the exploration-exploitation dilemma. We have already presented potential solutions to this dilemma throughout the book (such as the ε-greedy strategy), but we want to give you a more comprehensive outlook on the problem, and a more concise view of the algorithms that solve it. Many of them, such as the upper confidence bound (UCB) algorithm, are more sophisticated and...
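To recall how the ε-greedy strategy trades off exploration and exploitation before we move on to UCB, here is a minimal sketch of ε-greedy action selection on a toy multi-armed bandit. The bandit, the reward means, and the hyperparameters below are illustrative assumptions, not code from this book.

```python
# Minimal sketch: epsilon-greedy action selection on a toy 4-armed bandit.
# The reward means, epsilon value, and episode count are assumed for illustration.
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon explore a random action, otherwise exploit."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))   # explore: pick a random arm
    return int(np.argmax(q_values))               # exploit: current best estimate

rng = np.random.default_rng(0)
true_means = [0.2, 0.5, 0.1, 0.8]                 # hidden reward means (assumed)
q = np.zeros(4)                                   # running value estimates
counts = np.zeros(4)                              # pulls per arm

for _ in range(1000):
    a = epsilon_greedy(q, epsilon=0.1, rng=rng)
    reward = rng.normal(true_means[a], 1.0)       # noisy reward from arm a
    counts[a] += 1
    q[a] += (reward - q[a]) / counts[a]           # incremental mean update

print(q)                                          # estimates approach true_means
```

Note that ε-greedy explores uniformly at random, regardless of how uncertain each estimate is; UCB, discussed in this chapter, instead directs exploration toward the actions whose value estimates are least certain.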