A glance at deep Q-learning
In the previous code, we saw an implementation of the popular Q-learning algorithm for the grid world example. That example had a discrete state space of size 30, so it was sufficient to store the Q-values in a Python dictionary, as sketched below.
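The following is a minimal sketch of what such a tabular update looks like; the function name q_learning_update and the hyperparameter values (alpha, gamma, n_actions) are illustrative placeholders, not taken from the earlier example:

```python
from collections import defaultdict

import numpy as np

# Q-table as a plain dictionary: state -> array of Q-values (one per action)
n_actions = 4
q_table = defaultdict(lambda: np.zeros(n_actions))

def q_learning_update(state, action, reward, next_state,
                      alpha=0.1, gamma=0.9):
    # Tabular Q-learning update:
    # Q(S_t, A_t) += alpha * (R + gamma * max_a Q(S_{t+1}, a) - Q(S_t, A_t))
    td_target = reward + gamma * np.max(q_table[next_state])
    td_error = td_target - q_table[state][action]
    q_table[state][action] += alpha * td_error
```

This works because every state the agent visits gets its own dictionary entry; the approach breaks down when the state space grows too large to enumerate, which is exactly the problem discussed next.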
However, the number of states can become very large, in some cases practically infinite. Furthermore, we may be dealing with a continuous state space rather than discrete states. Moreover, some states may never be visited during training, which is problematic when the agent later has to generalize to such unseen states.
To address these problems, instead of representing the value function in a tabular format, such as V(S_t) for the state-value function or Q(S_t, A_t) for the action-value function, we use a function approximation approach. Here, we define a parametric function, v_w(x_s), that can learn to approximate the true value function, that is, v_w(x_s) ≈ v_π(s), where x_s is a set of input features (or “featurized” states).
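To make this concrete, the following is a minimal sketch of such a parametric approximator as a small neural network; the class name QNetwork and all layer and feature dimensions are assumptions for illustration (here using PyTorch):

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    # A parametric function q_w(x_s) mapping state features to
    # one Q-value per action, replacing the tabular Q-table
    def __init__(self, n_features, n_actions, n_hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, n_hidden),
            nn.ReLU(),
            nn.Linear(n_hidden, n_actions),
        )

    def forward(self, x):
        return self.net(x)

# Usage: Q-values for a single featurized state x_s
q_net = QNetwork(n_features=8, n_actions=4)   # placeholder dimensions
x_s = torch.randn(1, 8)                       # placeholder feature vector
q_values = q_net(x_s)                         # shape: (1, 4)
```

Because the network's weights w are shared across all states, it can produce Q-value estimates even for states never seen during training, which is something the tabular approach cannot do.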