The name distributional RL can be a bit misleading and may conjure up images of many distributed DQN networks all working together. That may indeed be a description of distributed RL, but distributional RL is something different: here we try to find the distribution of values behind the estimate that DQN predicts. That is, we do not just find the maximum or mean value; we try to understand the distribution of returns that generated it. This is quite similar in both intuition and purpose to PG methods. We do this by projecting our known or previously predicted distribution onto a future, predicted distribution.
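To make the difference between a mean value and a value distribution concrete, here is a minimal sketch in plain Python. It is not taken from Chapter_10_QRDQN.py; the function names, the toy return samples, and the brute-force search are all assumptions chosen for illustration. It describes two actions with the same mean return by a handful of quantiles, found by minimizing the quantile regression (pinball) loss. A real QR-DQN agent would learn such quantiles with a neural network and gradient descent rather than by searching over samples.

```python
from statistics import mean


def pinball_loss(theta, samples, tau):
    """Mean quantile regression (pinball) loss of estimate theta at level tau."""
    total = 0.0
    for z in samples:
        u = z - theta  # positive when theta underestimates the sample
        total += u * (tau - (1.0 if u < 0 else 0.0))
    return total / len(samples)


def quantile_estimates(samples, n_quantiles):
    """Hypothetical helper: pick, for each quantile midpoint, the sample value
    that minimizes the pinball loss (the loss is piecewise linear and convex,
    so a minimizer always exists among the sample values)."""
    taus = [(2 * i + 1) / (2 * n_quantiles) for i in range(n_quantiles)]
    candidates = sorted(set(samples))
    return [
        min(candidates, key=lambda c: pinball_loss(c, samples, tau))
        for tau in taus
    ]


# Toy data (assumed): a risky action pays 0 or 10, a safe action always pays 5.
risky_returns = [0.0] * 4 + [10.0] * 4
safe_returns = [5.0] * 8

risky_quantiles = quantile_estimates(risky_returns, 4)
safe_quantiles = quantile_estimates(safe_returns, 4)

print("means:", mean(risky_returns), mean(safe_returns))
print("risky quantiles:", risky_quantiles)
print("safe quantiles:", safe_quantiles)
```

Both actions look identical to an agent that only tracks the mean, while the quantile estimates expose that one outcome is spread over two extremes and the other is certain. That extra information is what a distributional method tries to capture.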
This is best understood by reviewing a code example, so open Chapter_10_QRDQN.py and follow the next exercise:
- The entire code listing is too big to reproduce here, so we will look at the important sections. We will start with QRDQN, or Quantile Regression DQN. Quantile regression...