As the number of states in a Q-learning task increases, a simple Q-table is no longer a practical way to represent the action-value function. Instead, we will use a Q-network, a neural network designed to approximate Q-values.
Approximating Q-values lets us build a model of the task that maps states to actions. In this chapter, we will discuss how a neural network can take a state as input and produce a Q-value for each action, replacing the lookup table with a function approximator.
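To make this concrete, here is a minimal sketch of what such a Q-network might look like. It assumes PyTorch and a hypothetical environment with a 4-dimensional state and two discrete actions; the layer sizes and dimensions are illustrative, not the implementation we will build in this chapter.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state vector to one estimated Q-value per action."""
    def __init__(self, state_dim: int = 4, n_actions: int = 2, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),  # one output per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Greedy action selection: pick the action with the highest predicted Q-value.
q_net = QNetwork()
state = torch.rand(1, 4)           # stand-in for an observed state
q_values = q_net(state)            # shape: (1, n_actions)
action = q_values.argmax(dim=1)    # index of the best action
```

Instead of looking up a stored value for each state-action pair, the network generalizes: nearby states produce similar Q-value estimates, which is what makes large or continuous state spaces tractable.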
We'll look at how a policy agent differs from the value agent we implemented in the previous chapter. We'll also discuss Q-networks at a higher level and see how the network we build adjusts to model the problem we're working on.
We will cover...