DDQN is an extension of DQN in which we use the target network in the Bellman update. Specifically, in DDQN, we evaluate the target network's Q function at the action obtained by greedy maximization of the primary network's Q function. First, we will look at the vanilla DQN target for the Bellman equation update step; then, we will extend it to the DDQN target for the same update step, which is the crux of the DDQN algorithm. We will then code DDQN in TensorFlow to play Atari Breakout. Finally, we will compare and contrast the two algorithms: DQN and DDQN.
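To make this distinction concrete before diving into the equations, the following is a minimal sketch of the DDQN target computation on a small batch of made-up numbers. It uses plain NumPy rather than the chapter's TensorFlow code, and the names q_primary_next and q_target_next are placeholders standing in for the Q-values that the primary and target networks would output for the next states; none of these names or values come from the chapter's actual implementation.

import numpy as np

# Illustrative Q-values for a batch of two next states with three actions each.
# q_primary_next stands in for the primary network's output, q_target_next for
# the target network's output.
q_primary_next = np.array([[1.0, 2.5, 0.3],
                           [0.2, 0.1, 0.9]])
q_target_next = np.array([[2.2, 1.5, 0.5],
                          [0.4, 0.0, 1.2]])
rewards = np.array([1.0, 0.0])
dones = np.array([0.0, 1.0])  # 1.0 marks a terminal transition
gamma = 0.99

# Action selection: greedy maximization of the primary network's Q-values.
best_actions = q_primary_next.argmax(axis=1)

# Action evaluation: the target network's Q-value at the selected actions.
batch_idx = np.arange(len(best_actions))
ddqn_target = rewards + gamma * (1.0 - dones) * q_target_next[batch_idx, best_actions]

print(best_actions)  # [1 2]
print(ddqn_target)   # [2.485 0.]

Note how the action is chosen by one network and evaluated by the other: this decoupling is what reduces the overestimation of Q-values that vanilla DQN suffers from.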
Understanding Double DQN
Updating the Bellman equation
In vanilla DQN, the target for the Bellman update is this:

y_t = r_t + γ max_a Q(s_{t+1}, a; θ_t)
θ_t represents the model parameters...