The dueling DQN
Before going ahead, let's learn about one of the most important functions in reinforcement learning: the advantage function. The advantage function is defined as the difference between the Q function and the value function, and it is expressed as:
A(s, a) = Q(s, a) - V(s)
Okay, but what's the use of an advantage function? What does it signify? First, let's recall the Q function and the value function:
- Q function: The Q function gives the expected return an agent would obtain starting from state s, performing action a, and following the policy π.
- Value function: The value function gives the expected return an agent would obtain starting from state s and following the policy π.
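To make the two definitions concrete, here is a minimal sketch for a single state with three actions. The Q values and the policy probabilities are made-up numbers chosen for illustration, and the names q_values, policy, state_value, and advantages are hypothetical. The sketch assumes a stochastic policy, in which case the value of a state is the policy-weighted average of its Q values, and the advantage of each action is its Q value minus that state value.

```python
import numpy as np

# Hypothetical Q values Q(s, a) for one state s and three actions (illustrative numbers).
q_values = np.array([1.0, 2.0, 4.0])

# Hypothetical policy: probability of choosing each action in state s.
policy = np.array([0.5, 0.25, 0.25])

# V(s): expected return from s under the policy, i.e. the
# policy-weighted average of Q(s, a) over the actions.
state_value = float(np.dot(policy, q_values))

# Advantage of each action: A(s, a) = Q(s, a) - V(s).
advantages = q_values - state_value

print("V(s)    =", state_value)
print("A(s, a) =", advantages)
```

With these numbers, the state value is 2.0 and the advantages are -1, 0, and 2: the first action is worse than what the policy achieves on average in this state, the second is exactly average, and the third is better. Weighted by the policy, the advantages average to zero.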
Now if we think intuitively, what's the difference between the Q function and the value function? The Q function gives us the value of a state-action pair, while the value function gives the value of a state irrespective of the action. Now, the difference...