- Sutton, R. S., Barto, A. G. (2018). Reinforcement Learning: An Introduction. The MIT Press.
- Tesauro, G. (1992). Practical issues in temporal difference learning. Machine Learning, 8, 257–277.
- Tesauro, G. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3), 58–68.
- Silver, D. (2018). Success Stories of Deep RL. Retrieved from
- Crites, R. H., Barto, A. G. (1995). Improving elevator performance using reinforcement learning. In Proceedings of the 8th International Conference on Neural Information Processing Systems (NIPS'95).
- Mnih, V. et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533.
- Silver, D. et al. (2018). A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 362(6419), 1140–1144.
- Vinyals, O. et al. (2019). Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 575, 350–354.
- OpenAI. (2018). OpenAI Five. Retrieved from
- Heess, N. et al. (2017). Emergence of Locomotion Behaviours in Rich Environments. ArXiv, abs/1707.02286.
- OpenAI et al. (2018). Learning Dexterous In-Hand Manipulation. ArXiv, abs/1808.00177.
- OpenAI et al. (2019). Solving Rubik's Cube with a Robot Hand. ArXiv, abs/1910.07113.
- OpenAI Blog (2019). Solving Rubik's Cube with a Robot Hand. URL:
- Zheng, G. et al. (2018). DRN: A Deep Reinforcement Learning Framework for News Recommendation. In Proceedings of the 2018 World Wide Web Conference (WWW '18). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 167–176. DOI:
- Chandrashekar, A. et al. (2017). Artwork Personalization at Netflix. The Netflix Tech Blog. URL:
- McKinney, S. M. et al. (2020). International evaluation of an AI system for breast cancer screening. Nature, 577, 89–94.
- Agrawal, R. (2018, March 8). Microsoft News Center India. Retrieved from