Summary
We started the chapter by understanding how distributional reinforcement learning works. We learned that in distributional reinforcement learning, instead of selecting an action based only on the expected return, we select the action based on the distribution of returns, which is often called the value distribution or return distribution.
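To make this concrete, here is a minimal NumPy sketch of acting greedily with respect to a value distribution: the expected return of each action is recovered as the mean of its distribution over a fixed support. The support values and probabilities below are illustrative, not from the chapter; in practice the probabilities come from the network.

```python
import numpy as np

# Illustrative value distribution over 5 support values (atoms) for 2 actions
support = np.array([-10.0, -5.0, 0.0, 5.0, 10.0])
probs = np.array([
    [0.10, 0.20, 0.40, 0.20, 0.10],   # action 0
    [0.05, 0.10, 0.20, 0.30, 0.35],   # action 1
])

# The expected return of each action is the mean of its value distribution
q_values = (probs * support).sum(axis=1)
best_action = int(np.argmax(q_values))
print(q_values, best_action)   # action 1 has the higher expected return
```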
Next, we learned about the categorical DQN algorithm, also known as C51, where we feed the state and the support of the distribution as inputs, and the network returns the probabilities of the value distribution. We also learned how the projection step matches the supports of the target and predicted value distributions so that we can apply the cross-entropy loss.
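As a refresher, the projection step can be sketched as follows. This is a minimal NumPy sketch, not the chapter's implementation; the function name and the values of v_min, v_max, and n_atoms are illustrative choices.

```python
import numpy as np

def project_distribution(next_probs, rewards, dones, gamma=0.99,
                         v_min=-10.0, v_max=10.0, n_atoms=51):
    """Project the Bellman-updated target atoms back onto the fixed support,
    so the target and predicted distributions share the same support and
    the cross-entropy loss can be applied."""
    support = np.linspace(v_min, v_max, n_atoms)
    delta_z = (v_max - v_min) / (n_atoms - 1)
    batch_size = rewards.shape[0]
    projected = np.zeros((batch_size, n_atoms))

    for j in range(n_atoms):
        # Apply the Bellman update to atom j and clip it to the support range
        tz = np.clip(rewards + gamma * (1.0 - dones) * support[j], v_min, v_max)
        b = (tz - v_min) / delta_z                 # fractional atom index
        lower = np.floor(b).astype(int)
        upper = np.ceil(b).astype(int)
        # Avoid losing probability mass when b lands exactly on an atom
        lower[(upper > 0) & (lower == upper)] -= 1
        upper[(lower < n_atoms - 1) & (lower == upper)] += 1
        # Split the probability mass between the two neighbouring atoms
        rows = np.arange(batch_size)
        projected[rows, lower] += next_probs[:, j] * (upper - b)
        projected[rows, upper] += next_probs[:, j] * (b - lower)
    return projected
```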
Going ahead, we learned about the quantile regression DQN, where we feed the state along with equally divided cumulative probabilities (quantiles) as inputs to the network, and it returns the support values of the distribution.
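The quantile regression DQN is trained with the quantile Huber loss, which can be sketched as shown below. This is a minimal NumPy sketch under stated assumptions, not the chapter's code; the function name, the value of kappa, and the exact reduction over quantiles are illustrative choices.

```python
import numpy as np

def quantile_huber_loss(pred_quantiles, target_quantiles, kappa=1.0):
    """Quantile Huber loss for a quantile regression DQN.
    pred_quantiles, target_quantiles: arrays of shape (batch, n_quantiles)."""
    n_quantiles = pred_quantiles.shape[1]
    # Quantile midpoints: the equally divided cumulative probabilities
    tau = (np.arange(n_quantiles) + 0.5) / n_quantiles      # shape (N,)

    # Pairwise TD errors u[b, i, j] = target quantile j - predicted quantile i
    td = target_quantiles[:, None, :] - pred_quantiles[:, :, None]
    huber = np.where(np.abs(td) <= kappa,
                     0.5 * td ** 2,
                     kappa * (np.abs(td) - 0.5 * kappa))
    # Asymmetric quantile weighting |tau_i - 1{u < 0}|
    weight = np.abs(tau[None, :, None] - (td < 0).astype(float))
    # Mean over target quantiles, sum over predicted quantiles, mean over batch
    return (weight * huber).mean(axis=2).sum(axis=1).mean()
```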
At the end of the chapter, we learned how D4PG, the distributed distributional deep deterministic policy gradient algorithm, extends DDPG by using a distributional critic in a distributed setting.