The sample environments we have been running in this chapter use a form of recurrent memory by default to remember past sequences of events. This recurrent memory is built from Long Short-Term Memory (LSTM) layers, which allow the agent to remember beneficial sequences of events that may lead to future reward. Recall that we covered LSTM networks extensively in Chapter 2, Convolutional and Recurrent Networks. For example, an agent may see the same sequence of frames repeatedly, perhaps while moving toward the target goal, and then associate that sequence of states with an increased reward. A diagram showing the original form of this network, taken from the paper Training an Agent for FPS Doom Game using Visual Reinforcement Learning and VizDoom by Khan Adil et al., is as follows:
DQRN Architecture
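To make the general shape of such a network more concrete, the following is a minimal sketch of a recurrent Q-network: convolutional layers extract features from each frame, an LSTM layer carries memory across the sequence of frames, and a dense layer outputs one Q-value per action. This is not the authors' exact network from the paper; the framework (Keras/TensorFlow) and the frame size, sequence length, and action count are all assumptions chosen for illustration:

```python
# Minimal sketch of a convolution + LSTM Q-network (not the paper's exact model).
# The input is a short sequence of frames; the same convolutional feature
# extractor is applied to every frame, the LSTM remembers across the sequence,
# and the final dense layer outputs a Q-value estimate per discrete action.
from tensorflow.keras import layers, models

sequence_length = 4        # assumed number of consecutive frames per input
frame_shape = (84, 84, 3)  # assumed frame size and channels
num_actions = 8            # assumed size of the discrete action space

model = models.Sequential([
    # Apply the same convolutional layers to each frame in the sequence
    layers.TimeDistributed(
        layers.Conv2D(32, 8, strides=4, activation='relu'),
        input_shape=(sequence_length, *frame_shape)),
    layers.TimeDistributed(layers.Conv2D(64, 4, strides=2, activation='relu')),
    layers.TimeDistributed(layers.Flatten()),
    # The LSTM provides the recurrent memory over the frame features
    layers.LSTM(512),
    # One Q-value per action
    layers.Dense(num_actions, activation='linear'),
])

model.compile(optimizer='adam', loss='mse')
model.summary()
```

The key design point is the placement of the LSTM between the convolutional feature extractor and the Q-value output, which is what lets the agent carry information from earlier frames forward when estimating the value of its actions.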
The authors referred to the network...