Summary
In this chapter, we covered a lot of new and complex material. You became familiar with the limitations of value iteration in complex environments with large observation spaces, and we discussed how to overcome them with Q-learning. We tested the Q-learning algorithm on the FrozenLake environment, then discussed approximating Q-values with neural networks (NNs) and the extra complications that this approximation introduces.
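For reference, here is a minimal sketch of the tabular Q-learning update on FrozenLake. It assumes the Gymnasium API; the hyperparameters (ALPHA, GAMMA, EPSILON) and the episode count are illustrative rather than the chapter's exact settings:

import random
from collections import defaultdict

import gymnasium as gym

ALPHA = 0.1      # learning rate for the Bellman update (illustrative)
GAMMA = 0.99     # discount factor
EPSILON = 0.1    # probability of taking a random (exploratory) action

env = gym.make("FrozenLake-v1")
q_values = defaultdict(float)   # maps (state, action) -> Q(s, a)

def best_action(state):
    # Greedy action with respect to the current Q-table
    return max(range(env.action_space.n), key=lambda a: q_values[(state, a)])

for _ in range(5000):                       # training episodes
    state, _ = env.reset()
    done = False
    while not done:
        # Epsilon-greedy exploration
        if random.random() < EPSILON:
            action = env.action_space.sample()
        else:
            action = best_action(state)
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # Bellman update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        target = reward if terminated else \
            reward + GAMMA * q_values[(next_state, best_action(next_state))]
        q_values[(state, action)] += ALPHA * (target - q_values[(state, action)])
        state = next_state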
We covered several tricks with DQNs to improve their training stability and convergence, such as an experience replay buffer, target networks, and frame stacking. Finally, we combined those extensions into a single DQN implementation that solves the Pong environment from the Atari games suite.
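To make the replay buffer and target network ideas concrete, here is a minimal sketch in PyTorch-style Python; the names (Experience, ReplayBuffer, sync_target), the buffer interface, and the uniform-sampling strategy are illustrative assumptions, not the chapter's exact implementation:

import collections
import random

import numpy as np
import torch.nn as nn

Experience = collections.namedtuple(
    "Experience", ["state", "action", "reward", "done", "next_state"])

class ReplayBuffer:
    def __init__(self, capacity):
        # Fixed-size buffer; old transitions are discarded automatically
        self.buffer = collections.deque(maxlen=capacity)

    def append(self, experience):
        self.buffer.append(experience)

    def sample(self, batch_size):
        # Uniformly sample past transitions to break correlations
        # between consecutive environment steps
        indices = random.sample(range(len(self.buffer)), batch_size)
        batch = [self.buffer[i] for i in indices]
        states, actions, rewards, dones, next_states = zip(*batch)
        return (np.array(states), np.array(actions),
                np.array(rewards, dtype=np.float32),
                np.array(dones, dtype=np.bool_), np.array(next_states))

def sync_target(net: nn.Module, tgt_net: nn.Module):
    # Copy the online network's weights into the target network;
    # called every N training steps to keep the Bellman targets stable
    tgt_net.load_state_dict(net.state_dict())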
In the next chapter, we will take a quick look at higher-level RL libraries, and after that, we will examine a set of tricks that researchers have found since 2015 to improve DQN convergence and quality, which, when combined, can produce state-of-the-art results.