In this chapter, we introduced a different class of ML problems, which focus on automating decisions by agents that interact with an environment. We covered the key features required to define an RL problem and various solution methods.
We saw how to frame and analyze an RL problem as a finite MDP, and how to compute a solution using value and policy iteration. We then moved on to more realistic situations where the transition probabilities and rewards are unknown to the agent, and saw how Q-learning builds on the key recursive relationship defined by the Bellman optimality equation in the MDP case. We saw how to solve RL problems using Python for simple MDPs and more complex environments with Q-learning.
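As a compact reminder of that recursion, the following is a minimal sketch rather than the chapter's own implementation. It assumes a hypothetical two-state, two-action MDP invented for illustration; the names `value_iteration`, `q_learning_update`, `P`, and `R` are likewise assumptions. Value iteration applies the Bellman optimality backup with known transition probabilities and rewards, while the Q-learning update uses the same target but estimates it from a single sampled transition.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Solve a finite MDP with known dynamics.

    P: array of shape (S, A, S) with transition probabilities.
    R: array of shape (S, A) with expected immediate rewards.
    Returns the state values and a greedy policy.
    """
    V = np.zeros(P.shape[0])
    for _ in range(max_iter):
        # Bellman optimality backup: Q(s, a) = R(s, a) + gamma * sum_s' P(s, a, s') V(s')
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = R + gamma * (P @ V)
    return V, Q.argmax(axis=1)


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9, done=False):
    """One tabular Q-learning step from a sampled transition (s, a, r, s_next)."""
    # Same target as the backup above, but from one sample instead of the full model.
    target = r if done else r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
    return Q


# Hypothetical toy MDP (not from the chapter): deterministic transitions.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0  # state 0, action 0: stay in state 0
P[0, 1, 1] = 1.0  # state 0, action 1: move to state 1
P[1, 0, 1] = 1.0  # state 1, action 0: stay in state 1
P[1, 1, 0] = 1.0  # state 1, action 1: move to state 0
R = np.array([[0.0, 1.0],
              [2.0, 0.0]])

V, policy = value_iteration(P, R, gamma=0.5)

# Model-free counterpart: one update from an observed transition.
Q_table = np.zeros((2, 2))
q_learning_update(Q_table, s=0, a=1, r=1.0, s_next=1, alpha=0.5, gamma=0.5)
```

In this toy setting, the greedy policy moves from state 0 to state 1 and then stays there. The Q-learning step needs neither `P` nor `R`, which is what makes it usable when the agent does not know the environment's dynamics.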
Finally, we expanded our scope to continuous states and actions and applied the deep Q-learning algorithm to the more complex Lunar Lander environment.
...