References
- Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver, 2017, Rainbow: Combining Improvements in Deep Reinforcement Learning. arXiv:1710.02298
- Richard S. Sutton, 1988, Learning to Predict by the Methods of Temporal Differences. Machine Learning 3(1):9-44
- Hado van Hasselt, Arthur Guez, David Silver, 2015, Deep Reinforcement Learning with Double Q-Learning. arXiv:1509.06461v3
- Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Ian Osband, Alex Graves, Vlad Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell, Shane Legg, 2017, Noisy Networks for Exploration. arXiv:1706.10295v1
- Marc Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, Remi Munos, 2016, Unifying Count-Based Exploration and Intrinsic Motivation. arXiv:1606.01868v2
- Jarryd Martin, Suraj Narayanan Sasikumar, Tom Everitt, Marcus Hutter, 2017, Count-Based Exploration in Feature Space for Reinforcement...