The deep recurrent Q network
The deep recurrent Q network (DRQN) is the same as a DQN, but with recurrent layers added. But what is the use of recurrent layers in a DQN? To answer this question, let's first understand the problem called the Partially Observable Markov Decision Process (POMDP).
An environment is called a POMDP when we have only limited information about it. So far, in the previous chapters, we have seen fully observable MDPs, where we know all possible states and actions; although we might be unaware of the transition and reward probabilities, we have complete knowledge of the environment. For example, in the frozen lake environment, we had complete knowledge of all the states and actions of the environment.
But most real-world environments are only partially observable; we cannot see all of the states. For instance, consider an agent learning to walk in a real-world environment. In this case, the agent will not have complete knowledge of the environment; at each step, it can only perceive a limited view of its surroundings.
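To make this concrete, here is a minimal DRQN sketch. PyTorch is an assumption here (the framework, class name, and layer sizes are illustrative, not the book's exact code); it follows the original DRQN design of replacing the first fully connected layer of a DQN with an LSTM, whose hidden state lets the agent accumulate information across successive partial observations:

```python
import torch
import torch.nn as nn

class DRQN(nn.Module):
    def __init__(self, num_actions, hidden_size=512):
        super().__init__()
        # Convolutional layers extract features from a single 84 x 84
        # grayscale frame, exactly as in a standard DQN.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        # The recurrent layer replaces the DQN's first fully connected
        # layer; its hidden state carries information across frames.
        self.lstm = nn.LSTM(input_size=64 * 7 * 7,
                            hidden_size=hidden_size, batch_first=True)
        self.q_head = nn.Linear(hidden_size, num_actions)

    def forward(self, frames, hidden=None):
        # frames: (batch, seq_len, 1, 84, 84) -- a sequence of observations
        batch, seq_len = frames.shape[:2]
        feats = self.conv(frames.reshape(batch * seq_len, *frames.shape[2:]))
        out, hidden = self.lstm(feats.view(batch, seq_len, -1), hidden)
        return self.q_head(out), hidden   # Q-values per action at each step

if __name__ == "__main__":
    net = DRQN(num_actions=4)
    frames = torch.zeros(2, 8, 1, 84, 84)  # batch of 2, sequence of 8 frames
    q_values, hidden = net(frames)
    print(q_values.shape)                  # torch.Size([2, 8, 4])
```

Because the recurrent hidden state summarizes what the agent has seen so far, a DRQN can be fed one frame at a time instead of a stack of recent frames; this is what allows it to act sensibly when a single observation does not reveal the full state.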