So far, we've worked under the assumption that the state- and action-value functions are tabular. However, in tasks with large state spaces, such as computer games, it's impossible to store all possible values in a table. Instead, we'll try to approximate the value functions. To formalize this, let's think of the tabular value functions, $V(s)$ and $Q(s, a)$, as actual functions with as many parameters as there are table cells. As the state space grows, so does the number of parameters, to the point where it becomes impossible to store them. Not only that, but with a large number of states, the agent is bound to enter situations it has never seen before.
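To get a feel for the scale, here is a back-of-the-envelope sketch; the Atari-like screen size and color depth are illustrative assumptions, not figures from the text.

```python
import math

# Rough size of a Q-table for an Atari-like game (illustrative numbers:
# a 210x160 screen with 128 possible colors per pixel).
pixels = 210 * 160
log10_states = pixels * math.log10(128)  # log10 of the number of distinct screens
print(f"distinct screens ~ 10^{log10_states:.0f}")  # ~ 10^70802
```

A table with one entry per state is out of the question at this scale, which is exactly why we turn to approximation.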
Our goal then is to find another pair of functions, $\hat{V}(s; \mathbf{w})$ and $\hat{Q}(s, a; \mathbf{w})$, parameterized by a weight vector $\mathbf{w}$, with the following properties (a minimal sketch follows the list):
- Approximate $V$ and $Q$ with significantly fewer parameters than the tabular versions
- Generalize to states the agent has never encountered before
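As a concrete illustration, the snippet below implements $\hat{Q}(s, a; \mathbf{w})$ as a small PyTorch network; the state dimension, hidden size, and action count are illustrative choices, not values from the text.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Parameterized approximator: maps a state vector to one estimated
    action value per action, i.e., Q_hat(s, a; w) for every action a."""

    def __init__(self, state_dim: int = 8, n_actions: int = 4, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

q_hat = QNetwork()
n_params = sum(p.numel() for p in q_hat.parameters())
print(n_params)  # 836: fixed by the architecture, not by the number of states

# The same weights produce an estimate for any state, including ones the
# agent has never visited: this is the generalization property.
unseen_state = torch.randn(1, 8)
print(q_hat(unseen_state))  # one estimated action value per action
```

Both properties show up directly: the parameter count is set by the architecture rather than by the size of the state space, and, because the network is a smooth function of its input, similar states map to similar value estimates.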