Deep Q learning uses a deep neural network to learn the Q value function. Figure 9.3, shown in the following diagram, illustrates the architecture of a deep Q learning network:
The network in the diagram on the left maps each state-action pair (s, a) to a single output Q value, Q(s, a), while the network on the right takes only the state s as input and learns a Q value for every action a. If there are n possible actions in each state, the network produces n outputs: Q(s, a1), Q(s, a2), . . . , Q(s, an).
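The second architecture can be sketched as a small feedforward network that maps a state vector to one Q value per action. This is only an illustrative sketch, not the book's implementation: the layer sizes, the random initialization, and the plain NumPy forward pass are all assumptions chosen for clarity.

```python
import numpy as np

# Illustrative sketch: a network mapping a state s to n Q values,
# one per action: Q(s, a1), ..., Q(s, an).
# Layer sizes and initialization are assumptions, not the book's values.
rng = np.random.default_rng(0)

state_dim, hidden_dim, n_actions = 4, 16, 2
W1 = rng.normal(0.0, 0.1, (state_dim, hidden_dim))
b1 = np.zeros(hidden_dim)
W2 = rng.normal(0.0, 0.1, (hidden_dim, n_actions))
b2 = np.zeros(n_actions)

def q_values(s):
    """Forward pass: state vector s -> vector of n Q values (one per action)."""
    h = np.maximum(0.0, s @ W1 + b1)  # ReLU hidden layer
    return h @ W2 + b2

s = rng.normal(size=state_dim)
q = q_values(s)
print(q.shape)  # one Q value per action
```

With this layout, a greedy policy is simply `np.argmax(q_values(s))`, which is why the single-state-input design is the one usually trained in practice: one forward pass yields the Q values of all n actions at once.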
Deep Q learning networks are trained using a simple idea called experience replay: the RL agent interacts with the environment and stores each experience as a tuple of the form (s, a, r, s′) in a replay buffer. Mini-batches are then sampled from this replay buffer to train the network. In the...
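The replay buffer just described can be sketched with standard-library containers. This is a minimal illustration under assumed choices (a fixed capacity with oldest-first eviction and uniform random sampling), not the book's implementation.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s_next) experience tuples."""

    def __init__(self, capacity):
        # deque with maxlen evicts the oldest experience when full
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        # Uniformly sample a mini-batch of past experiences
        # to use as one training step for the network.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

# Fill the buffer with dummy transitions, then draw a mini-batch.
buf = ReplayBuffer(capacity=1000)
for t in range(100):
    buf.push(t, t % 4, 1.0, t + 1)
batch = buf.sample(32)
print(len(batch))  # mini-batch of 32 experience tuples
```

Sampling uniformly from a large buffer breaks the temporal correlation between consecutive transitions, which is the main reason experience replay stabilizes training.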