Implementing the Dueling Double DQN algorithm and DDDQN agent
Dueling Double DQN (DDDQN) combines the benefits of Double Q-learning and the dueling network architecture. Double Q-learning corrects DQN's tendency to overestimate action values by decoupling action selection from action evaluation. The dueling architecture splits the network into two streams that separately learn the state-value function (V) and the advantage function (A), then combines them to produce the action values (Q). This explicit separation lets the algorithm learn faster, especially when there are many actions to choose from and when the actions have very similar values. Because the V stream is shared across all actions, an update from taking a single action in a state also improves the value estimates for the actions that were not taken, whereas a plain DQN agent only updates the value of the action it actually took. By the end of this recipe, you will have a complete implementation of the DDDQN agent.
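To make the two ideas concrete before the full implementation, here is a minimal sketch of both pieces, assuming a TensorFlow 2/Keras setup: a dueling network head that merges the V and A streams, and the Double DQN target computation, where the online network selects the next action and the target network evaluates it. The helper name build_dueling_q_network, the layer sizes, and variables such as online_net, target_net, rewards, dones, and gamma are illustrative assumptions, not code from this recipe.

import tensorflow as tf
from tensorflow.keras import layers

def build_dueling_q_network(obs_dim, num_actions):
    """Dueling head: shared trunk, then separate V(s) and A(s, a) streams."""
    inputs = layers.Input(shape=(obs_dim,))
    x = layers.Dense(128, activation="relu")(inputs)
    x = layers.Dense(128, activation="relu")(x)
    value = layers.Dense(1)(x)                # V(s): scalar state value
    advantage = layers.Dense(num_actions)(x)  # A(s, a): one value per action
    # Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a)); subtracting the mean
    # keeps the V/A decomposition identifiable
    q_values = value + (
        advantage - tf.reduce_mean(advantage, axis=1, keepdims=True)
    )
    return tf.keras.Model(inputs, q_values)

def double_dqn_targets(online_net, target_net, next_obs, rewards, dones, gamma):
    """Double DQN: online net picks the argmax action, target net scores it."""
    best_actions = tf.argmax(online_net(next_obs), axis=1)
    next_q = tf.gather(
        target_net(next_obs), best_actions, axis=1, batch_dims=1
    )
    return rewards + gamma * (1.0 - dones) * next_q

Subtracting the mean advantage is the standard way to resolve the ambiguity between V and A (any constant could otherwise be shifted between them), and using two networks for selection versus evaluation is what removes the maximization bias that plain DQN suffers from.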
Getting ready
To complete this recipe, you will first need to activate the tf2rl-cookbook
Conda Python virtual environment and...