Dueling network architecture
We know that the Q function specifies how good it is for an agent to perform an action a in a state s, and the value function specifies how good it is for an agent to be in a state s. Now we introduce a new function called the advantage function, which is defined as the difference between the Q function and the value function: A(s, a) = Q(s, a) - V(s). The advantage function specifies how good it is for an agent to perform an action a compared to the other actions. For example, if Q(s, a) = 10 and V(s) = 8, then A(s, a) = 2, meaning that taking action a yields 2 more expected return than the state's baseline value.
Thus, the value function specifies the goodness of a state and the advantage function specifies the goodness of an action in that state. What would happen if we were to combine the value function and the advantage function? Together they would tell us how good it is for an agent to perform an action a in a state s, which is exactly our Q function. So we can define our Q function as the sum of the value function and the advantage function, as in
Q(s, a) = V(s) + A(s, a).
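To make this decomposition concrete, here is a minimal sketch of a dueling head in PyTorch; the framework choice, layer sizes, and names are illustrative assumptions rather than the book's own code. A shared feature layer branches into a value stream and an advantage stream, and their outputs are summed to produce the Q-values:

```python
import torch
import torch.nn as nn

class DuelingNetwork(nn.Module):
    """Illustrative dueling head: two streams combined into Q-values."""

    def __init__(self, state_dim, num_actions, hidden_dim=64):
        super().__init__()
        # Shared feature layer that feeds both streams
        self.features = nn.Sequential(nn.Linear(state_dim, hidden_dim), nn.ReLU())
        # Value stream: a single scalar V(s) per state
        self.value_stream = nn.Linear(hidden_dim, 1)
        # Advantage stream: one A(s, a) per action
        self.advantage_stream = nn.Linear(hidden_dim, num_actions)

    def forward(self, state):
        x = self.features(state)
        value = self.value_stream(x)          # shape (batch, 1)
        advantage = self.advantage_stream(x)  # shape (batch, num_actions)
        # Q(s, a) = V(s) + A(s, a); value broadcasts across the action dimension
        return value + advantage


# Usage: Q-values for a batch of two 4-dimensional states (hypothetical sizes)
net = DuelingNetwork(state_dim=4, num_actions=2)
q_values = net(torch.randn(2, 4))  # shape (2, 2)
```

Note that in practice, dueling DQN implementations subtract the mean (or max) advantage from the advantage stream before adding it to the value stream, so that V and A remain identifiable from Q.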
Now we will see how the dueling network architecture works. The following diagram shows the architecture of the dueling DQN: