Introduction
In the previous chapter, we learned about dynamic programming, a family of reinforcement learning methods in which the model of the environment is known beforehand. Agents in reinforcement learning can learn a policy, a value function, and/or a model. Dynamic programming solves a known Markov Decision Process (MDP): the probability distribution over all possible transitions is given, and dynamic programming requires it.
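To make this concrete, here is a minimal sketch of dynamic programming (value iteration) on a tiny, fully specified MDP. The two states, two actions, transition probabilities, and rewards in the table P below are illustrative assumptions, not taken from the text; the point is that every Bellman backup reads directly from the known model.

# A minimal sketch of value iteration on a hypothetical two-state MDP
# whose transition model P and rewards are fully known in advance.
# P[s][a] is a list of (probability, next_state, reward) tuples.
P = {
    0: {0: [(0.9, 0, 0.0), (0.1, 1, 1.0)],
        1: [(0.5, 0, 0.0), (0.5, 1, 1.0)]},
    1: {0: [(1.0, 1, 0.0)],
        1: [(0.8, 0, 2.0), (0.2, 1, 0.0)]},
}
gamma = 0.9           # discount factor
V = {0: 0.0, 1: 0.0}  # initial value estimates

for _ in range(100):  # sweep until approximately converged
    for s in P:
        # Bellman optimality backup: needs the full model P.
        V[s] = max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
            for a in P[s]
        )

print(V)  # optimal state values for the known MDP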
But what happens when the model of the environment is not known? In many real-life situations, it is not available beforehand. Can an algorithm learn the model of the environment? Can reinforcement learning agents still learn to make good decisions?
Monte Carlo methods learn from experience when the model of the environment is not known; for this reason they are called model-free methods. With model-free prediction, we can estimate the value function of an unknown MDP. We can also use model-free control, which optimizes the value function of an unknown MDP.
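As a contrast to the value iteration sketch above, the following sketch estimates state values by first-visit Monte Carlo prediction, using only sampled episodes and no transition probabilities. The step function, its chain dynamics, the fixed policy, and the discount factor are all assumptions made for illustration, not the book's environment.

import random
from collections import defaultdict

def step(state, action):
    # Hypothetical black-box environment: a short chain 0 -> 1 -> 2,
    # where 2 is terminal. The agent moves right with probability 0.8
    # and stays put otherwise; reaching state 2 yields reward 1.0.
    next_state = state + 1 if random.random() < 0.8 else state
    reward = 1.0 if next_state == 2 else 0.0
    return next_state, reward, next_state == 2

def generate_episode(policy, start=0):
    # Sample one episode of (state, reward) pairs by interacting
    # with the environment; no model of its dynamics is used.
    episode, state, done = [], start, False
    while not done:
        next_state, reward, done = step(state, policy(state))
        episode.append((state, reward))
        state = next_state
    return episode

def mc_prediction(policy, num_episodes=1000, gamma=0.9):
    returns = defaultdict(list)
    for _ in range(num_episodes):
        episode = generate_episode(policy)
        G = 0.0
        # Walk the episode backwards, accumulating discounted returns.
        for t in reversed(range(len(episode))):
            state, reward = episode[t]
            G = gamma * G + reward
            # First-visit check: record G only at the first occurrence.
            if state not in [s for s, _ in episode[:t]]:
                returns[state].append(G)
    # The value estimate is the sample average of observed returns.
    return {s: sum(g) / len(g) for s, g in returns.items()}

V = mc_prediction(policy=lambda s: 0)
print(V)  # sample-average estimates of the state values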