In this chapter, we looked at the Hello World of DRL, the DQN algorithm, and how to apply DL to RL. We first looked at why we need DL to tackle more complex environments with continuous observation spaces, such as CartPole and LunarLander. Then we surveyed the more common DL frameworks and the one we use in this book, PyTorch. From there, we installed PyTorch and set up an example that uses a computational graph as a low-level neural network. Following that, we built a second example with the PyTorch neural network interface in order to see the difference between a raw computational graph and a full neural network.
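To recap that difference, here is a minimal sketch contrasting the two styles. The layer sizes (4 inputs, 16 hidden units, 2 outputs, roughly CartPole-shaped) are illustrative assumptions rather than the chapter's exact numbers:

```python
import torch
import torch.nn as nn

# Low-level style: build the computational graph by hand with raw tensors.
x = torch.randn(1, 4)
w1 = torch.randn(4, 16, requires_grad=True)
w2 = torch.randn(16, 2, requires_grad=True)
hidden = torch.relu(x @ w1)      # each operation adds a node to the graph
out = hidden @ w2
out.sum().backward()             # autograd walks the graph to compute gradients

# Higher-level style: the same network expressed with the nn interface,
# which manages the weights and graph construction for us.
model = nn.Sequential(
    nn.Linear(4, 16),
    nn.ReLU(),
    nn.Linear(16, 2),
)
print(model(x))
```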
With that knowledge, we then jumped in and explored DQN in detail. We looked at how DQN uses experience replay, or a replay buffer, to store past transitions and replay them when training the network/policy. We also looked at how the TD loss was calculated from the difference between the network's predicted Q-value for the action taken and the TD target, that is, the reward plus the discounted maximum Q-value of the next state.
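As a rough sketch of those two pieces, the snippet below pairs a minimal replay buffer with a TD loss computed over a sampled batch. The class and parameter names are assumptions for illustration, and a single network is used for both the prediction and the target here for brevity (a separate target network is the more common arrangement):

```python
import random
from collections import deque

import torch
import torch.nn as nn

# Hypothetical minimal replay buffer storing (s, a, r, s', done) transitions.
class ReplayBuffer:
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return (torch.stack(states), torch.tensor(actions),
                torch.tensor(rewards, dtype=torch.float32),
                torch.stack(next_states),
                torch.tensor(dones, dtype=torch.float32))

# TD loss on a sampled batch: predicted Q(s, a) versus the TD target
# r + gamma * max_a' Q(s', a'), zeroed out on terminal transitions.
def td_loss(q_net, batch, gamma=0.99):
    states, actions, rewards, next_states, dones = batch
    q_pred = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = q_net(next_states).max(1).values
    target = rewards + gamma * q_next * (1 - dones)
    return nn.functional.mse_loss(q_pred, target)
```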