In this chapter, we looked at our very first deep RL algorithm, DQN, which is probably the most popular RL algorithm in use today. We learned the theory behind DQN and looked at the concept and use of target networks to stabilize training. We were also introduced to the Atari suite, the most popular collection of environments for RL. In fact, many of the RL papers published today apply their algorithms to games from the Atari suite and report their episodic rewards, comparing them with the values reported by other researchers using other algorithms. This makes Atari a natural benchmark on which to train RL agents and compare the robustness of different algorithms. We also looked at the use of a replay buffer and learned why it is used in off-policy algorithms.
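As a quick consolidation of the two mechanisms summarized above, here is a minimal, framework-free sketch of a uniform-sampling replay buffer and a periodic target-network sync. The names `ReplayBuffer` and `maybe_sync_target`, and the dict-of-NumPy-arrays stand-in for network parameters, are illustrative assumptions rather than the chapter's actual code.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-size store of past transitions. Sampling uniformly at random
    breaks the correlation between consecutive experiences, which is what
    lets an off-policy method such as DQN reuse old data for training."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)


def maybe_sync_target(online_params, target_params, step, sync_every=1_000):
    """Copy the online Q-network's parameters into the target network every
    `sync_every` steps; freezing the target in between keeps the bootstrapped
    Q-learning targets stable."""
    if step % sync_every == 0:
        for name, value in online_params.items():
            target_params[name] = value.copy()


if __name__ == "__main__":
    buffer = ReplayBuffer(capacity=1_000)
    online = {"w": np.zeros(4)}   # stand-in for the online Q-network's weights
    target = {"w": np.zeros(4)}   # stand-in for the target network's weights
    for step in range(5_000):
        # A dummy transition stands in for real environment interaction here.
        buffer.add(step, 0, 1.0, step + 1, False)
        online["w"] += 0.01       # pretend a gradient update happened
        maybe_sync_target(online, target, step, sync_every=1_000)
    batch = buffer.sample(32)
    print(len(buffer), len(batch), target["w"])
```

In a real DQN training loop, the batch sampled from the buffer would feed the Q-learning update, while the target network would only be touched by the periodic sync.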
This chapter has laid the foundation for us to delve deeper into deep RL in the chapters that follow.