Conclusion
In this chapter, we've been introduced to DRL, a powerful technique that many researchers believe is the most promising lead towards artificial intelligence. Together, we've gone over the principles of RL. RL is able to solve many toy problems, but the Q-Table cannot scale to more complex real-world problems because it needs an entry for every state-action pair. The solution is to approximate the Q-Table with a deep neural network. However, training deep neural networks for RL is highly unstable due to sample correlation and the non-stationary target: the target Q values are computed by the same Q-Network that is being trained.
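To recall what the Q-Table does, here is a minimal sketch of the standard tabular Q-learning update. It assumes the Q-Table is a NumPy array of shape (n_states, n_actions). The function name q_learning_update and the learning rate alpha are illustrative assumptions and may differ from the exact formulation used earlier in the chapter.

```python
import numpy as np

def q_learning_update(q_table, state, action, reward, next_state, done,
                      alpha=0.1, gamma=0.9):
    """One tabular Q-learning step (hypothetical helper, for illustration only).

    q_table is assumed to be a NumPy array of shape (n_states, n_actions).
    """
    # Bootstrapped target: r + gamma * max_a' Q(s', a'); no bootstrap at terminal states
    target = reward if done else reward + gamma * np.max(q_table[next_state])
    # Move Q(s, a) a fraction alpha of the way toward the target
    q_table[state, action] += alpha * (target - q_table[state, action])
    return q_table

# Toy table: 4 states x 2 actions. A real-world state space would need far more rows.
q = np.zeros((4, 2))
q[1] = [0.5, 1.0]
q = q_learning_update(q, state=0, action=1, reward=1.0, next_state=1, done=False)
print(q[0, 1])  # approximately 0.19, i.e. 0.1 * (1.0 + 0.9 * 1.0)
```

Every state needs its own row, which is exactly why this approach breaks down for large or continuous state spaces.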
DQN proposed a solution to these problems using experience replay and by separating the target network from the Q-Network under training. Double DQN (DDQN) further improved the algorithm by separating action selection from action evaluation to minimize the overestimation of Q values. Other improvements to DQN have also been proposed. Prioritized experience replay [6] argues that the experience buffer should not be sampled uniformly. Instead, experiences that...
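The difference between these ideas can be sketched in a few lines. This is a minimal illustration, not the chapter's implementation: ReplayBuffer, dqn_target and ddqn_target are hypothetical names, the NumPy arrays stand in for the Q values that the online Q-Network and the target network would predict for the next state, and done is assumed to be 0.0 or 1.0. The buffer samples uniformly, which is the baseline that prioritized experience replay departs from.

```python
import random
from collections import deque

import numpy as np

class ReplayBuffer:
    """Uniform experience replay (hypothetical helper, for illustration only)."""
    def __init__(self, capacity=10000):
        self.memory = deque(maxlen=capacity)  # oldest transitions are dropped first

    def remember(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between consecutive steps
        return random.sample(self.memory, batch_size)

def dqn_target(reward, done, q_target_next, gamma=0.99):
    # DQN: the target network both selects and evaluates the next action
    return reward + (1.0 - done) * gamma * np.max(q_target_next)

def ddqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    # DDQN: the online Q-Network selects the next action...
    best_action = np.argmax(q_online_next)
    # ...and the target network evaluates it
    return reward + (1.0 - done) * gamma * q_target_next[best_action]

# Stand-in Q values for the next state (assumed, not produced by a real network)
q_online_next = np.array([1.0, 3.0, 2.0])
q_target_next = np.array([4.0, 2.5, 1.0])
print(dqn_target(1.0, 0.0, q_target_next, gamma=0.5))                  # 1 + 0.5 * 4.0 = 3.0
print(ddqn_target(1.0, 0.0, q_online_next, q_target_next, gamma=0.5))  # 1 + 0.5 * 2.5 = 2.25
```

Here the online network prefers action 1 while the target network rates action 0 highest, so the DDQN target comes out lower than the DQN target. This illustrates how decoupling selection from evaluation can damp the upward bias of taking a max over noisy estimates.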