In this section, we will look at the theory behind DQNs, including the underlying math, and learn how neural networks can be used to approximate the value function.
Previously, we looked at Q-learning, where Q(s,a) was stored and evaluated as a multi-dimensional array, with one entry for each state-action pair. This worked well for the grid-world and cliff-walking problems, both of which are low-dimensional in both the state and action spaces. So, can we apply the same approach to higher-dimensional problems? No, because the curse of dimensionality makes it infeasible to store a very large number of states and actions. Moreover, in continuous control problems, the actions take real values in a bounded range, so there are infinitely many possible actions, which cannot be represented in a tabular Q array. This gave rise to function approximation in RL...
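To make the idea concrete, here is a minimal sketch (not from the original text) of replacing the tabular Q(s,a) array with a small neural network, assuming PyTorch is available; the state dimension, number of actions, and layer widths are purely illustrative:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state vector to one Q-value per discrete action."""
    def __init__(self, state_dim: int, num_actions: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Returns Q(s, a) for every action a in one forward pass
        return self.net(state)

# Usage: instead of indexing Q[s, a], we query the network.
q_net = QNetwork(state_dim=4, num_actions=2)        # illustrative sizes
state = torch.rand(1, 4)                             # a batch containing one state
q_values = q_net(state)                               # shape: (1, num_actions)
greedy_action = int(q_values.argmax(dim=1).item())    # argmax_a Q(s, a)
```

The key difference from the tabular case is that the network's weights are shared across all states, so nearby or similar states produce similar Q-value estimates, and memory no longer grows with the size of the state space.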