Combining everything
You have now seen all the DQN improvements described in the paper Rainbow: Combining Improvements in Deep Reinforcement Learning. We went through them one at a time, which (I hope) made the idea and implementation of each improvement easier to understand. The main point of the paper, however, was to combine those improvements and check the results. In the final example, I’ve decided to exclude categorical DQN and double DQN from the combined system, as they didn’t show much improvement on our guinea pig environment. If you want, you can add them and try a different game. The complete example is available in Chapter08/08_dqn_rainbow.py.
First of all, we need to define our network architecture and the methods that have contributed to it:
- Dueling DQN: Our network will have two separate paths for the value of the state distribution and advantage distribution. On the output, both paths will be summed together, providing...
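To make the "two paths summed together" idea concrete, here is a minimal, framework-free sketch of how a dueling head might combine its outputs. It is not the code from Chapter08/08_dqn_rainbow.py: the function name dueling_q_values is hypothetical, the sketch works on plain scalar values for a single state rather than on distributions or network tensors, and subtracting the mean advantage is the common dueling formulation, assumed here rather than taken from the text above.

```python
def dueling_q_values(value, advantages):
    """Combine a state value V(s) with per-action advantages A(s, a).

    Hypothetical helper for illustration only. It assumes the common
    dueling formulation Q(s, a) = V(s) + A(s, a) - mean(A(s, .)).
    """
    if not advantages:
        raise ValueError("advantages must contain at least one action")
    # Centering the advantages keeps the value path responsible for the
    # overall level of the Q-values.
    mean_advantage = sum(advantages) / len(advantages)
    return [value + adv - mean_advantage for adv in advantages]


# One state with value 2.0 and three actions.
q_values = dueling_q_values(value=2.0, advantages=[1.0, 0.0, -1.0])
print(q_values)  # [3.0, 2.0, 1.0]
```

With the advantages centered this way, the average of the resulting Q-values equals the state value, so the two paths keep distinct roles: one estimates how good the state is, the other how much each action differs from the average.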