Chapter 14 – Distributional Reinforcement Learning
- In distributional RL, instead of selecting an action based on the expected return, we select the action based on the distribution of the return, which is often called the value distribution or return distribution.
- In categorical DQN, we feed the state and the support of the distribution as input, and the network returns the probabilities of the value distribution.
- The authors of categorical DQN suggest that choosing the number of supports N as 51 works efficiently, and so categorical DQN is also known as the C51 algorithm (a minimal sketch follows this list).
- The inverse CDF is also known as the quantile function. As the name suggests, it is the inverse of the cumulative distribution function. That is, with the CDF, given the support x, we obtain the cumulative probability τ = F(x), whereas with the inverse CDF, given the cumulative probability τ, we obtain the support x = F⁻¹(τ) (see the example after this list).
- In categorical DQN, along with the state, we feed the fixed support...
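A minimal NumPy sketch of the C51 idea summarized above. The "network" here is a random placeholder standing in for a trained deep network, the support is treated as a fixed constant, and the bounds `V_MIN = -10` and `V_MAX = 10` and names like `value_distribution` are illustrative assumptions, not from the text:

```python
import numpy as np

# Fixed support of the value distribution: 51 atoms between V_MIN and V_MAX
# (the bounds are illustrative; N = 51 is the value that gives C51 its name).
N_ATOMS, V_MIN, V_MAX = 51, -10.0, 10.0
support = np.linspace(V_MIN, V_MAX, N_ATOMS)          # shape: (51,)

def value_distribution(state, n_actions=4, rng=np.random.default_rng(0)):
    """Placeholder for the categorical DQN network: given a state, return
    one probability distribution over the support per action."""
    logits = rng.normal(size=(n_actions, N_ATOMS))    # stand-in network output
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)       # softmax -> valid probabilities

state = np.zeros(8)                                   # dummy state
probs = value_distribution(state)                     # shape: (n_actions, 51)

# Action selection: take the expectation of each action's value distribution,
# Q(s, a) = sum_i z_i * p_i(s, a), then act greedily with respect to it.
q_values = (probs * support).sum(axis=1)
action = int(np.argmax(q_values))
print(q_values, action)
```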
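To make the CDF/inverse-CDF relationship concrete, here is a small check using SciPy's standard normal distribution (the normal distribution is just an illustrative choice; any continuous distribution behaves the same way):

```python
from scipy.stats import norm

x = 1.5                      # a point on the support
tau = norm.cdf(x)            # CDF: support x -> cumulative probability tau
x_back = norm.ppf(tau)       # inverse CDF (quantile function): tau -> support x

print(tau)                   # ~0.9332
print(x_back)                # ~1.5, recovering the original support value
```

In SciPy, `ppf` stands for percent point function, which is the library's name for the quantile function.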