Approximating Q-values with a neural network that is updated one sample at a time is not very stable. You will recall that, in FA, we incorporated experience replay to improve stability. Similarly, in this recipe, we will apply experience replay to DQNs.
With experience replay, we store the agent's experiences (an experience is composed of an old state, a new state, an action, and a reward) during the episodes of a training session in a memory queue. Every time we have gained sufficient experience, batches of experiences are randomly sampled from the memory and used to train the neural network. Learning with experience replay therefore consists of two phases: gaining experience, and updating the model based on randomly selected past experiences. Otherwise, the model would keep learning only from the most recent, highly correlated experiences, and the neural network could get stuck in a local optimum.
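As a minimal sketch of such a memory queue, the following class (a hypothetical helper, not part of any library) stores experience tuples in a fixed-size buffer and samples random batches from it; the capacity of 10,000 and the method names are assumptions for illustration:

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size buffer that stores experience tuples and samples them at random."""

    def __init__(self, capacity=10000):
        # deque discards the oldest experiences once capacity is reached
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, next_state, reward):
        # store one experience tuple (old state, action, new state, reward)
        self.memory.append((state, action, next_state, reward))

    def sample(self, batch_size):
        # randomly sample a batch of past experiences, which breaks the
        # correlation between consecutive samples
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)
```

In a typical training loop, the agent pushes one experience to the memory at every step and, once the buffer holds at least a batch's worth of experiences, samples a random batch to update the network, rather than training on the latest transition alone.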