The Workings of Monte Carlo Methods
Monte Carlo methods solve reinforcement learning problems by averaging the sample returns observed for each state-action pair. They work only for episodic tasks: the experience is divided into episodes, and every episode eventually terminates. The value estimates are updated only after an episode is complete, so Monte Carlo methods are incremental on an episode-by-episode basis, not a step-by-step one.
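The update described above can be sketched in a few lines of Python. This is a minimal sketch, not the book's implementation. It assumes that an episode is a list of `(state, action, reward)` tuples, that the discount factor `gamma` defaults to 1, and that the *first-visit* variant is used (only the first occurrence of a state-action pair in an episode contributes a return). The function name `first_visit_mc_update` and the states `"s0"`, `"s1"` are hypothetical. Note that nothing is updated until the whole episode is available, because each return depends on every reward that follows it.

```python
from collections import defaultdict


def first_visit_mc_update(Q, N, episode, gamma=1.0):
    """Update action-value estimates from one *complete* episode.

    episode: list of (state, action, reward) tuples, where reward is the
             reward received after taking `action` in `state`.
    Q: dict (state, action) -> running mean of observed returns.
    N: dict (state, action) -> number of returns averaged so far.
    """
    # Walk the episode backwards so each return G_t = r_t + gamma * G_{t+1}
    # can be computed in a single pass.
    G = 0.0
    returns = []
    for state, action, reward in reversed(episode):
        G = reward + gamma * G
        returns.append((state, action, G))
    returns.reverse()  # restore chronological order

    seen = set()
    for state, action, G in returns:
        if (state, action) in seen:
            continue  # first-visit: ignore later occurrences in this episode
        seen.add((state, action))
        N[(state, action)] += 1
        # Incremental mean: Q <- Q + (G - Q) / N
        Q[(state, action)] += (G - Q[(state, action)]) / N[(state, action)]


Q = defaultdict(float)
N = defaultdict(int)
# Two hypothetical episodes that both start with ("s0", "a").
first_visit_mc_update(Q, N, [("s0", "a", 0.0), ("s1", "b", 1.0)])
first_visit_mc_update(Q, N, [("s0", "a", 0.0), ("s1", "a", -1.0)])
print(dict(Q))  # {('s0', 'a'): 0.0, ('s1', 'b'): 1.0, ('s1', 'a'): -1.0}
```

The incremental-mean form avoids storing every return: after the two episodes, `("s0", "a")` has seen returns of 1 and -1, so its estimate is their average, 0.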
Take a game like Go as an example. Go has far more than millions of states (on the order of 10^170 legal board positions), so it is impractical to know all of them and their transition probabilities in advance. The alternative is to play the game repeatedly and assign a positive reward for a win and a negative reward for a loss.
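The play-and-average idea can be illustrated on a much smaller game. The sketch below is a hypothetical stand-in for Go, not part of the original text: a race on positions 0 to 4 that starts at 2, where reaching 4 is a win (+1) and reaching 0 is a loss (-1). The move probabilities (0.8 intended direction, 0.2 slip), the uniformly random policy, and the names `play_episode` and `estimate_q` are all assumptions made for illustration. The agent is never told these probabilities; it only plays complete games and averages the final outcomes.

```python
import random

# Hypothetical toy game standing in for Go: positions 0..4, start at 2.
# Reaching WIN gives +1, reaching LOSE gives -1; no other rewards.
START, LOSE, WIN = 2, 0, 4
ACTIONS = ("left", "right")


def play_episode(rng):
    """Play one full game with a uniformly random policy."""
    state, trajectory = START, []
    while state not in (LOSE, WIN):
        action = rng.choice(ACTIONS)
        step = 1 if action == "right" else -1
        if rng.random() >= 0.8:
            step = -step  # assumed: the move "slips" 20% of the time
        trajectory.append((state, action))
        state += step
    reward = 1.0 if state == WIN else -1.0  # only the outcome is rewarded
    return trajectory, reward


def estimate_q(num_episodes=5000, seed=0):
    rng = random.Random(seed)
    total, count = {}, {}
    for _ in range(num_episodes):
        trajectory, reward = play_episode(rng)
        # With no intermediate rewards and no discounting, the return from
        # every step equals the final reward; count each pair once per game.
        for pair in set(trajectory):
            total[pair] = total.get(pair, 0.0) + reward
            count[pair] = count.get(pair, 0) + 1
    return {pair: total[pair] / count[pair] for pair in total}


Q = estimate_q()
# Under the random policy, the true values at the start are 0.3 ("right")
# and -0.3 ("left"), so the sampled averages should land close to these.
print(round(Q[(2, "right")], 2), round(Q[(2, "left")], 2))
```

With only a win/loss signal at the end of each game, the averages still separate the better move from the worse one, which is exactly the kind of learning from repeated play that the Go example describes.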
As we don't have a model of the environment (its transition probabilities and rewards), we need to learn from sampled experience. This makes Monte Carlo a sample-based, or model-free, technique. We call this direct sampling of episodes in Monte...