Deep Q-Networks (DQN) are based on Q-learning. In this section, we explain both before implementing a DQN in Keras to play the PacMan game.
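- Q-learning: In Q-learning, the agent learns the action-value function, also known as the Q function. The Q function, denoted by $q(s,a)$, estimates the long-term value of taking action $a$ when the agent is in state $s$. It maps state-action pairs to estimates of long-term value, as shown in the following equation:
$$q : S \times A \rightarrow \mathbb{R}$$
Thus, under a policy $\pi$, the q-value function can be written in standard notation as follows, where $G_t$ is the discounted return from time step $t$:
$$q_\pi(s, a) = \mathbb{E}_\pi\left[ G_t \mid S_t = s, A_t = a \right]$$
The q function can be written recursively as follows, where $R_{t+1}$ is the next reward and $\gamma$ is the discount factor:
$$q_\pi(s, a) = \mathbb{E}_\pi\left[ R_{t+1} + \gamma\, q_\pi(S_{t+1}, A_{t+1}) \mid S_t = s, A_t = a \right]$$
The expectation can be expanded as follows, where $p(s', r \mid s, a)$ is the probability of reaching state $s'$ with reward $r$, and $\pi(a' \mid s')$ is the probability that the policy selects action $a'$ in state $s'$:
$$q_\pi(s, a) = \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma \sum_{a'} \pi(a' \mid s')\, q_\pi(s', a') \right]$$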
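The expansion is a weighted sum over next states and next actions, so on a small problem it can be evaluated directly. The following is a minimal sketch of that computation with NumPy, not the implementation used later in this section. The two-state, two-action MDP, the arrays `P`, `R`, and `pi`, the value of `gamma`, and the helper `evaluate_q` are all hypothetical names and numbers made up for illustration. As a simplification, the reward is folded into an expected immediate reward `R[s, a]` rather than summed over explicitly.

```python
import numpy as np

# Hypothetical MDP, invented for illustration: 2 states, 2 actions.
# P[s, a, s2] is the probability of moving from state s to state s2 under action a.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # transitions from state 0
    [[0.0, 1.0], [0.5, 0.5]],   # transitions from state 1
])
# R[s, a] is the expected immediate reward for taking action a in state s.
R = np.array([
    [0.0, 1.0],
    [2.0, 0.0],
])
gamma = 0.9  # discount factor (assumed value)

# pi[s, a] is the probability that the policy picks action a in state s.
pi = np.full((2, 2), 0.5)  # uniform random policy


def evaluate_q(P, R, pi, gamma, tol=1e-10, max_iters=10_000):
    """Repeat the Bellman expectation backup until q stops changing."""
    q = np.zeros_like(R)
    for _ in range(max_iters):
        # v[s2] = sum over a2 of pi(a2 | s2) * q(s2, a2)
        v = (pi * q).sum(axis=1)
        # q_new[s, a] = R[s, a] + gamma * sum over s2 of P[s, a, s2] * v[s2]
        q_new = R + gamma * (P @ v)
        if np.max(np.abs(q_new - q)) < tol:
            return q_new
        q = q_new
    return q


q_pi = evaluate_q(P, R, pi, gamma)
print(q_pi)
```

The returned array holds one estimate per state-action pair. Substituting it back into the right-hand side of the backup reproduces it, up to the tolerance, which is what the recursive form of the q function requires.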
The optimal q function is the one that returns the maximum value for every state-action pair over all policies, and an optimal policy is one that always selects the action that the optimal q function rates highest. The optimal q function can be...