Training your agent with Monte Carlo methods
Let's say you would like to learn the probability of getting heads when flipping a particular, possibly biased, coin:
- One way of obtaining this is through a careful analysis of the physical properties of the coin. Although this could give you the precise probability distribution of the outcomes, it is far from a practical approach.
- Alternatively, you can simply flip the coin many times and look at the distribution of outcomes in your sample. Your estimate could be a bit off if the sample is small, but it will do the job for most practical purposes. The math involved in this approach is also incomparably simpler.
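The sampling approach above can be sketched in a few lines of Python. Note that the true heads probability of 0.6 below is a hypothetical value used only to simulate the coin; in a real experiment it is exactly the unknown quantity we are estimating:

```python
import random

def estimate_heads_probability(true_p, n_flips, seed=0):
    """Estimate a coin's heads probability by repeated sampling (Monte Carlo)."""
    rng = random.Random(seed)
    # Each flip is a Bernoulli trial: heads with probability true_p.
    heads = sum(rng.random() < true_p for _ in range(n_flips))
    # The sample mean is our Monte Carlo estimate of the true probability.
    return heads / n_flips

# With a large sample, the estimate gets close to the (hypothetical) true value of 0.6.
print(estimate_heads_probability(true_p=0.6, n_flips=100_000))
```

By the law of large numbers, the sample mean converges to the true probability as the number of flips grows, which is exactly why this crude procedure works.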
Just like in the coin example, we can estimate the state values and action values of an MDP from random samples. Monte Carlo (MC) estimation is a general concept that refers to estimating a quantity through repeated random sampling. In the context of RL, it refers to a collection of methods...