Deep Q-Network (DQN)
Using a Q-Table to implement Q-Learning is fine in small, discrete environments. However, when the environment has a large number of states, or when the states are continuous as is usually the case, a Q-Table is no longer feasible or practical. For example, if the observed state is made of four continuous variables, the size of the table is infinite. Even if we attempt to discretize each of the four variables into 1000 values, the total number of rows in the table is a staggering 1000^4 = 1e12. Even after training, the table is sparse since most of its cells are never visited and remain zero.
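To make this concrete, the following back-of-the-envelope check shows why even a heavily discretized table would not fit in memory. The number of actions and the bytes per entry are illustrative assumptions, not values from the text:

```python
# Rough size of a discretized Q-Table for 4 continuous state variables,
# assuming (for illustration) 2 discrete actions and float32 Q values.
bins_per_variable = 1000
num_state_variables = 4
num_actions = 2          # assumed for illustration
bytes_per_entry = 4      # float32

rows = bins_per_variable ** num_state_variables   # 1e12 discretized states
entries = rows * num_actions                      # one Q(s, a) cell per action
print(f"{rows:.0e} rows, {entries:.0e} entries, "
      f"{entries * bytes_per_entry / 1e12:.0f} TB of memory")
# prints: 1e+12 rows, 2e+12 entries, 8 TB of memory
```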
A solution to this problem is called DQN [2], which uses a deep neural network to approximate the Q-Table, as shown in Figure 9.6.1. There are two approaches to building the Q-network:
- The input is the state-action pair, and the prediction is the Q value
- The input is the state, and the prediction is the Q value for each action
The first option is not optimal since the network will be called a number of times equal to the number of actions in order to select the action with the maximum Q value.
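The second option only needs a single forward pass to obtain the Q values of all actions. Below is a minimal Keras sketch of such a Q-network; the state size, action count, and layer widths are illustrative assumptions (a CartPole-like setup), not values prescribed by the text:

```python
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model

def build_q_network(state_size=4, action_size=2):
    # Input: the observed state vector; Output: Q(s, a) for every action a
    inputs = Input(shape=(state_size,), name='state')
    x = Dense(256, activation='relu')(inputs)
    x = Dense(256, activation='relu')(x)
    q_values = Dense(action_size, activation='linear', name='q_values')(x)
    model = Model(inputs, q_values)
    # MSE loss against the Q-Learning target, as in a basic DQN setup
    model.compile(loss='mse', optimizer='adam')
    return model

# One forward pass returns the Q values of all actions, so the greedy
# action is simply the argmax over the output vector:
#   q = model.predict(state[np.newaxis])   # shape (1, action_size)
#   action = np.argmax(q[0])
```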