Importance Sampling
Monte Carlo methods can be on-policy or off-policy. In on-policy learning, the agent learns from the experience of following its own policy. In off-policy learning, we estimate the value of a target policy from experience generated by following a different behavior policy. Importance sampling is a key technique for off-policy learning: it reweights returns collected under the behavior policy so that they form an unbiased estimate of values under the target policy.
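To make this concrete, here is a minimal sketch of ordinary importance sampling for a single episode. The tabular policies `pi` and `b` and the helper `importance_weighted_return` are illustrative assumptions, not part of any particular library: each policy is just an array of action probabilities per state, and the episode's return is scaled by the product of per-step probability ratios.

```python
import numpy as np

# Hypothetical target policy pi (the one we want to evaluate) and
# behavior policy b (the one that generated the data), given as
# (n_states, n_actions) arrays of action probabilities.
pi = np.array([[0.9, 0.1],
               [0.2, 0.8]])
b = np.array([[0.5, 0.5],
              [0.5, 0.5]])

def importance_weighted_return(states, actions, rewards, gamma=1.0):
    """Weight an episode's return by the importance sampling ratio
    rho = prod_t pi(a_t | s_t) / b(a_t | s_t), so that returns
    collected while following b estimate values under pi."""
    rho = 1.0  # running product of probability ratios
    G = 0.0    # discounted return of the episode
    for t, (s, a) in enumerate(zip(states, actions)):
        rho *= pi[s, a] / b[s, a]
        G += gamma ** t * rewards[t]
    return rho * G

# Example episode generated while following b:
print(importance_weighted_return(states=[0, 1], actions=[0, 1], rewards=[1.0, 2.0]))
```

Averaging such weighted returns over many episodes gives the ordinary importance sampling estimate of the target policy's value; weighted (normalized) variants trade a little bias for much lower variance.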
You might think of on-policy learning as learning while playing, and off-policy learning as learning by watching someone else play. You could improve your cricket game by playing cricket yourself, learning from your own mistakes and best actions; that would be on-policy learning. You could also improve by watching others play cricket and learning from their mistakes and best actions; that would be off-policy learning.
Human beings typically do both on-policy and off-policy...