In the Q-learning algorithm, the maximum action value for the next state is evaluated with the same Q function that the current action selection policy uses. In some cases, this overestimates the action values, which slows down learning. A variation called Double Q-learning was proposed by DeepMind researchers in the following paper: Deep Reinforcement Learning with Double Q-learning, H. van Hasselt, A. Guez, and D. Silver, 2016, at the Thirtieth AAAI Conference on Artificial Intelligence. As a solution to this problem, the authors proposed decoupling action selection from action evaluation in the Bellman update.
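The modified update can be sketched in a few lines. The snippet below uses NumPy arrays of toy Q-values in place of full networks, and all numbers and variable names are illustrative, not from the recipe: the online estimator picks the greedy action for the next state, while a separate estimator evaluates it, in contrast to vanilla Q-learning, where one estimator does both.

```python
import numpy as np

# Toy Q-values for the next state s' under two estimators
# (in Double DQN these would be the online and target networks).
q_online_next = np.array([1.0, 3.0, 2.0])   # online estimator: Q(s', a)
q_target_next = np.array([0.5, 1.0, 4.0])   # target estimator: Q(s', a)
reward, gamma = 1.0, 0.9

# Vanilla Q-learning target: one estimator both selects and evaluates,
# so any overestimated value is maximized over.
vanilla_target = reward + gamma * q_target_next.max()

# Double Q-learning target: select the action with the online estimator,
# evaluate it with the other one.
a_star = int(np.argmax(q_online_next))
double_target = reward + gamma * q_target_next[a_star]
```

With these toy numbers the vanilla target is 1 + 0.9 * 4.0 = 4.6, while the double target is 1 + 0.9 * 1.0 = 1.9, illustrating how decoupling selection from evaluation tempers overestimated values.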
Deep reinforcement learning with double Q-learning
Getting ready
In this recipe, we will control an inverted pendulum system using the Double Q-learning algorithm...