Exploring curiosity-driven reinforcement learning
When we discussed the R2D2 agent, we mentioned that only a few Atari games remained in the benchmark set in which the agent could not exceed human performance. The remaining challenge was to solve hard-exploration problems, which have very sparse and/or misleading rewards. Later work from Google DeepMind addressed those challenges as well, with agents called Never Give Up (NGU) and Agent57, with Agent57 reaching super-human performance in all 57 games used in the benchmark. In this section, we discuss these agents and the methods they use for effective exploration.
Let's dive in by describing the concepts of hard exploration and curiosity-driven learning.
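Before going into the details, it may help to see the general shape of curiosity-driven learning in code: the agent's training reward is the environment's (extrinsic) reward augmented with an intrinsic novelty bonus. The sketch below uses a simple count-based bonus purely for illustration; the class name, the beta weight, and the inverse-square-root formula are our own assumptions for this sketch, not the mechanism used by NGU or Agent57, which we cover shortly.

```python
import math
from collections import defaultdict


class CountBasedCuriosity:
    """Minimal sketch of a curiosity signal: less-visited states
    yield a larger intrinsic bonus. Illustrative only; NGU and
    Agent57 use more sophisticated novelty estimates."""

    def __init__(self, beta=0.1):
        self.beta = beta  # weight of the intrinsic reward (assumed value)
        self.visit_counts = defaultdict(int)

    def intrinsic_reward(self, state):
        # The bonus decays as a state is visited more often.
        self.visit_counts[state] += 1
        return 1.0 / math.sqrt(self.visit_counts[state])

    def total_reward(self, state, extrinsic_reward):
        # Augment the (possibly sparse) environment reward
        # with the novelty bonus.
        return extrinsic_reward + self.beta * self.intrinsic_reward(state)
```

For example, in a grid world where the extrinsic reward is zero almost everywhere, calling CountBasedCuriosity().total_reward((2, 3), 0.0) on a rarely visited cell still returns a positive training signal, which is exactly what nudges the agent to keep exploring.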
Curiosity-driven learning for hard-exploration problems
Let's consider the simple grid world illustrated in Figure 13.7:
Assume the following...