In this chapter, we extended our exploration of RL and returned to trial-and-error methods, focusing on how the Monte Carlo method can be used to learn from experimentation. We began with an example experiment that uses the Monte Carlo method to estimate π, and then visualized the output of that experiment with matplotlib. Next, we worked through a code example that uses the Monte Carlo method to solve a version of the FrozenLake problem. Exploring that example in detail, we saw how the agent plays the game and, through that exploration, learns to improve a policy. Finally, we examined how the agent improves this policy using an incremental sample mean.
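As a compact reminder of the two ideas recapped above, the following is a minimal sketch rather than the chapter's own listing. It estimates π by random sampling and keeps its running estimate with an incremental sample mean, so no list of past samples is stored. The function names `incremental_mean` and `estimate_pi` and the `seed` parameter are illustrative assumptions, not names taken from the chapter's code.

```python
import random


def incremental_mean(values):
    """Return the mean of values, updated one sample at a time."""
    mean = 0.0
    for n, value in enumerate(values, start=1):
        # Incremental update: new_mean = old_mean + (sample - old_mean) / n
        mean += (value - mean) / n
    return mean


def estimate_pi(num_samples, seed=0):
    """Estimate pi by sampling points uniformly in the unit square.

    The fraction of points that fall inside the quarter circle of
    radius 1 approaches pi / 4 as the number of samples grows.
    `seed` is a hypothetical parameter added here for repeatability.
    """
    rng = random.Random(seed)
    hit_rate = 0.0
    for n in range(1, num_samples + 1):
        x, y = rng.random(), rng.random()
        hit = 1.0 if x * x + y * y <= 1.0 else 0.0
        # Same incremental update, applied to the hit indicator.
        hit_rate += (hit - hit_rate) / n
    return 4.0 * hit_rate


if __name__ == "__main__":
    print(incremental_mean([1, 2, 3, 4]))
    print(estimate_pi(100_000))
```

The same update rule is what lets an agent refine a value estimate after every episode: each new return nudges the current estimate by a step of size 1/n instead of recomputing the average from scratch.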
The Monte Carlo method is powerful but, as we learned, it requires episodic gameplay while, in the real world...