In this chapter, we return to the trial-and-error thread of reinforcement learning (RL) and look at Monte Carlo methods: a class of methods that learns by playing through an environment episode by episode rather than by planning. We will see how this improves our search for the best policy, and we will begin to think of our algorithm as an actual agent, one that explores the game environment rather than preplanning a policy. This shift, in turn, helps us understand the trade-offs of using a model for planning versus going without one. From there, we will look at the Monte Carlo method and how to implement it in code. Finally, we will revisit a larger version of the FrozenLake environment with our new Monte Carlo agent.
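To make the idea concrete before we dive in, the following is a minimal sketch of first-visit Monte Carlo prediction on FrozenLake under a random policy. It is an illustration rather than the chapter's full agent: it assumes the gymnasium package, and the hyperparameter values (`gamma`, `num_episodes`) are placeholder choices.

```python
# A minimal sketch of first-visit Monte Carlo prediction on FrozenLake.
# Assumes the gymnasium package; the chapter's own implementation may differ.
from collections import defaultdict

import gymnasium as gym
import numpy as np

env = gym.make("FrozenLake-v1")  # the 4x4 map; the chapter later uses a larger one
gamma = 0.99          # discount factor (assumed value)
num_episodes = 5000   # number of sampled episodes (assumed value)

returns = defaultdict(list)  # state -> list of observed returns
V = defaultdict(float)       # state -> estimated value

for _ in range(num_episodes):
    # Play one full episode with a uniformly random policy: no model, no planning.
    state, _ = env.reset()
    episode = []
    done = False
    while not done:
        action = env.action_space.sample()
        next_state, reward, terminated, truncated, _ = env.step(action)
        episode.append((state, reward))
        state = next_state
        done = terminated or truncated

    # Work backwards through the episode, accumulating the discounted return G.
    G = 0.0
    for t in range(len(episode) - 1, -1, -1):
        s, r = episode[t]
        G = gamma * G + r
        # First-visit: only record G the first time s appears in the episode.
        if s not in {s2 for s2, _ in episode[:t]}:
            returns[s].append(G)
            V[s] = np.mean(returns[s])

print("Estimated start-state value:", V[0])
```

Notice that the agent never consults a model of the environment's dynamics; it simply averages the returns it actually observes, which is the defining trait of Monte Carlo methods.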
In this chapter, we will continue looking at how RL has evolved and, in particular, focus on the trial-and-error thread with the Monte Carlo method...