The computation graph in PyTorch
Our first examples won’t be about speeding up the baseline; instead, they will show one common, and not always obvious, situation that can cost you performance. In Chapter 3, we discussed the way PyTorch calculates gradients: it builds a graph of all the operations you perform on tensors, and when you call the backward() method of the final loss, the gradients of all the model parameters are calculated automatically.
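As a quick illustration (a minimal sketch with a hypothetical tiny network, not tied to our DQN code), every tensor operation is recorded in the graph, and backward() on the final loss fills in the gradients of the parameters involved:

```python
import torch
import torch.nn.functional as F

net = torch.nn.Linear(4, 2)      # hypothetical tiny network
x = torch.randn(8, 4)            # batch of 8 input vectors
target = torch.randn(8, 2)       # hypothetical reference values

out = net(x)                     # forward pass records operations in the graph
loss = F.mse_loss(out, target)   # final scalar loss
loss.backward()                  # gradients flow back through the recorded graph

print(net.weight.grad.shape)     # torch.Size([2, 4]) -- gradients are now populated
```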
This works well, but RL code is normally much more complex than traditional supervised learning: the model we are training is also being used to choose the actions that the agent performs in the environment, and the target network discussed in Chapter 6 makes things even trickier. So, in DQN, a neural network (NN) is normally used in three different situations:
- When we want to calculate the Q-values predicted by the network to get the loss with respect to reference Q...