In this chapter, we covered quite a lot. Not only did we explore a whole new branch of machine learning, reinforcement learning, but we also implemented some state-of-the-art algorithms that have been shown to give rise to complex autonomous agents. We saw how we can model an environment as a Markov decision process and assess optimal rewards using the Bellman equation. We also saw how the credit assignment problem can be addressed by approximating a quality (Q) function with a deep neural network. Along the way, we explored a whole bag of tricks, such as reward discounting, reward clipping, and experience replay memory (to name a few), that help an agent learn from high-dimensional inputs like game screen images and navigate simulated environments while optimizing toward a goal.
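To make two of those tricks concrete, here is a minimal sketch (not the chapter's exact implementation) of reward discounting with a discount factor gamma, and a fixed-size experience replay memory from which random minibatches of transitions are sampled:

```python
import random
from collections import deque

def discounted_return(rewards, gamma=0.99):
    """Sum rewards back to front so each step is discounted by gamma per step."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

class ReplayBuffer:
    """Fixed-size experience replay memory: old transitions are evicted,
    and training minibatches are drawn uniformly at random to break the
    correlation between consecutive game frames."""
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

# Discounted return of [1, 0, 1] with gamma=0.5: 1 + 0.5*0 + 0.25*1 = 1.25
print(discounted_return([1, 0, 1], gamma=0.5))
```

In practice, the sampled minibatch is used to regress the Q-network toward the Bellman target, reward plus gamma times the maximum Q-value of the next state.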
Finally, we explored some of the advances in the field of deep Q-learning, overviewing architectures like...