Summary
Monte Carlo methods learn from experience in the form of sample episodes. Without a model of the environment, the agent can learn a policy simply by interacting with the environment, and in many cases it is feasible to generate such episodes through simulation or sampling. We learned about first-visit and every-visit evaluation. We also learned about the balance between exploration and exploitation, which is achieved by using an epsilon-soft policy. We then learned about on-policy and off-policy learning, and how importance sampling plays a key role in off-policy methods. Finally, we applied Monte Carlo methods to the Blackjack and Frozen Lake environments available in the OpenAI Gym framework.
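To tie these ideas together, the following is a minimal sketch of first-visit Monte Carlo prediction on the Frozen Lake environment using a uniformly random policy. The episode count, undiscounted return, and the Gym >= 0.26 API (reset() returning (observation, info) and step() returning five values) are illustrative assumptions rather than the chapter's exact setup.

```python
# A minimal first-visit Monte Carlo prediction sketch on FrozenLake.
# Assumes Gym >= 0.26; the random policy and episode count are illustrative.
import gym
from collections import defaultdict

env = gym.make("FrozenLake-v1")
gamma = 1.0                      # undiscounted returns
num_episodes = 5000

V = defaultdict(float)           # state-value estimates
visit_count = defaultdict(int)   # number of first visits per state

for _ in range(num_episodes):
    # Generate one episode with a uniformly random policy.
    state, _ = env.reset()
    episode = []
    done = False
    while not done:
        action = env.action_space.sample()
        next_state, reward, terminated, truncated, _ = env.step(action)
        episode.append((state, reward))
        state = next_state
        done = terminated or truncated

    # Compute the return following every time step, working backwards.
    G = 0.0
    returns = [0.0] * len(episode)
    for t in reversed(range(len(episode))):
        _, reward = episode[t]
        G = gamma * G + reward
        returns[t] = G

    # First-visit update: use only the first occurrence of each state.
    seen = set()
    for t, (state, _) in enumerate(episode):
        if state in seen:
            continue
        seen.add(state)
        visit_count[state] += 1
        # Incremental average of the observed returns for this state.
        V[state] += (returns[t] - V[state]) / visit_count[state]

print({s: round(v, 3) for s, v in sorted(V.items())})
```

An every-visit variant would simply drop the `seen` check and update the value estimate at every occurrence of a state in the episode.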
In the next chapter, we will learn about temporal-difference learning and its applications. Temporal-difference learning combines the best of dynamic programming and the Monte Carlo methods: like the Monte Carlo methods, it can work when the model is not known, but it learns incrementally from each step instead of waiting until the end of an episode.