Exercise
- Implement standard Q-learning with a different policy (Boltzmann) on an Atari environment and examine the difference in performance metrics
- Implement a Double DQN on the same problem and compare its performance
- Implement a Dueling DQN for the same problem and compare its performance
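As a starting point for these exercises, the three ideas can be sketched in plain NumPy, independent of any particular reinforcement learning library. This is a minimal sketch under stated assumptions, not a full agent: the function names (`boltzmann_action`, `double_dqn_target`, `dueling_q`), the temperature parameter `tau`, and the discount `gamma` are hypothetical names chosen here for illustration, and the Q-values are assumed to arrive as one-dimensional arrays with one entry per action.

```python
import numpy as np

def boltzmann_action(q_values, tau=1.0, rng=None):
    """Sample an action with probability proportional to exp(Q / tau)."""
    rng = np.random.default_rng() if rng is None else rng
    q = np.asarray(q_values, dtype=np.float64)
    # Subtract the max before exponentiating for numerical stability.
    logits = (q - q.max()) / tau
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(q), p=probs)), probs

def double_dqn_target(reward, done, next_q_online, next_q_target, gamma=0.99):
    """Double DQN target: the online network picks the next action,
    the target network supplies the value of that action."""
    best_action = int(np.argmax(next_q_online))
    bootstrap = 0.0 if done else gamma * float(next_q_target[best_action])
    return reward + bootstrap

def dueling_q(state_value, advantages):
    """Dueling aggregation (mean-subtracted variant): Q = V + A - mean(A)."""
    adv = np.asarray(advantages, dtype=np.float64)
    return state_value + adv - adv.mean()

# Example usage with made-up Q-values for three actions.
action, probs = boltzmann_action([1.0, 2.0, 3.0], tau=1.0)
```

A lower `tau` makes the Boltzmann policy behave more greedily, while a higher `tau` pushes it toward uniform random exploration, so the temperature is a natural setting to vary when comparing performance metrics.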
Limits of Q-learning
It is truly remarkable that such a relatively simple algorithm can give rise to the complex strategies these agents come up with, given enough training time. Notably, researchers (and now, you too) are able to show how expert strategies may be learned through enough interaction with the environment. In the classic game of Breakout, for example (included as an environment in the Atari...