Summary
In this chapter, we explored a new learning technique called Reinforcement Learning and saw how it differs from traditional supervised and unsupervised learning. The goal of Reinforcement Learning is decision making, and at its heart is the Markov Decision Process (MDP). We explored the elements of an MDP and worked through them using an example. We then covered some fundamental Reinforcement Learning techniques, distinguishing on-policy from off-policy methods and indirect from direct methods of learning. We covered dynamic programming (DP) methods, Monte Carlo methods, and some key temporal difference (TD) methods such as Q-learning, Sarsa, R-learning, and actor-critic methods. Finally, we worked through hands-on implementations of some of these algorithms using the standard technology stack identified for this book. In the next chapter, we will cover ensemble learning methods.