In this chapter, we looked into the background of RL and what a DQN is, including the Q-learning algorithm. We saw how DQNs offer a unique approach to solving problems, relative to the other architectures we've discussed so far. We are not supplying output labels in the traditional sense, as we did with, say, the CNN we used to process CIFAR image data. Indeed, our output label was the cumulative reward for a given action relative to the environment's state, so you can now see that we created our output labels dynamically. But instead of being an end goal for our network, these labels help a virtual agent make intelligent decisions within a discrete space of possibilities. We also looked at the types of predictions we can make around rewards or actions.
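To make the idea of a dynamically created label concrete, here is a minimal sketch in Go of how a Q-learning training target for a state-action pair is typically formed: the immediate reward plus the discounted best Q-value predicted for the next state. This is not the chapter's own code; the function and variable names (qTarget, nextQ, gamma) are illustrative assumptions.

package main

import "fmt"

// qTarget computes the "dynamically created label" described above:
// the target value that Q(s, a) is trained toward.
func qTarget(reward float64, nextQ []float64, gamma float64, done bool) float64 {
	if done || len(nextQ) == 0 {
		// Terminal state: the label is just the immediate reward.
		return reward
	}
	// Otherwise, bootstrap from the best predicted Q-value of the next state.
	best := nextQ[0]
	for _, q := range nextQ[1:] {
		if q > best {
			best = q
		}
	}
	return reward + gamma*best
}

func main() {
	// Example: immediate reward of 1.0, next-state Q-value estimates,
	// and a discount factor of 0.95.
	fmt.Println(qTarget(1.0, []float64{0.2, 0.7, 0.1}, 0.95, false))
}

During training, this value plays the role that a ground-truth label plays in supervised learning: the network's prediction for Q(s, a) is pushed toward it, even though no fixed label was ever supplied up front.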
Now you can think about other...