Building further on DQNs
There are several improvements and additions to deep Q-networks that are worth exploring broadly here. We won't be working with these algorithms directly in this chapter, but this is a good starting point for finding ways to improve the performance of the DQNs you've built so far.
Calculating DQN loss
Calculating the prediction loss in a DQN comes down to the temporal difference (TD) error: the difference between the true Q-value of a state-action pair and the value estimated by the network. We turn this error into a loss (typically by squaring it) and backpropagate the loss to the earlier nodes in the network to update their weights.
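To make this concrete, here is a minimal PyTorch sketch of that step, assuming a toy network with 4-dimensional states and 2 discrete actions. The names (q_net, td_loss) and the batch of random placeholder transitions are illustrative stand-ins, not code from this chapter, and the target construction previews the issue raised next.

import torch
import torch.nn as nn

# Hypothetical toy Q-network: 4-dimensional states, 2 discrete actions
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99  # discount factor

def td_loss(states, actions, rewards, next_states, dones):
    # Q-values the network currently predicts for the actions actually taken
    q_pred = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    # Stand-in for the "true" Q-value: reward plus the discounted best
    # Q-value of the next state (no gradient flows through the target)
    with torch.no_grad():
        q_next = q_net(next_states).max(dim=1).values
        target = rewards + gamma * q_next * (1 - dones)
    # Squared TD error, averaged over the batch
    return nn.functional.mse_loss(q_pred, target)

# One gradient step on a batch of random placeholder transitions
states, next_states = torch.randn(32, 4), torch.randn(32, 4)
actions = torch.randint(0, 2, (32,))
rewards, dones = torch.randn(32), torch.zeros(32)

loss = td_loss(states, actions, rewards, next_states, dones)
optimizer.zero_grad()
loss.backward()  # backpropagate the loss to update the network's weights
optimizer.step()

Notice that the target computed inside torch.no_grad() is itself an estimate produced by the same network, which is exactly the problem we turn to now.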
The issue we run into is that we don't actually know the true Q-value of a state...