Monte Carlo methods
Using Monte Carlo (MC) methods, we first compute the value functions and then determine the optimal policies. These methods do not assume complete knowledge of the environment: MC methods require only experience, i.e., sample sequences of states, actions, and rewards from actual or simulated interaction with the environment. Learning from actual experience is striking because it requires no prior knowledge of the environment's dynamics, yet still attains optimal behavior. This is very similar to how humans and animals learn from experience rather than from a mathematical model. Surprisingly, in many cases it is easy to generate experience sampled according to the desired probability distributions, but infeasible to obtain those distributions in explicit form.
Monte Carlo methods solve the reinforcement learning problem by averaging sample returns over episodes. This means we assume that experience is divided into episodes, and that all episodes...
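The idea of averaging sample returns can be sketched as first-visit MC prediction: for each sampled episode, compute the return following the first visit to each state and average those returns across episodes. The toy environment (`toy_episode`, states `A` and `B`) below is a hypothetical example introduced only for illustration:

```python
import random
from collections import defaultdict

def first_visit_mc_prediction(generate_episode, gamma=1.0, num_episodes=5000):
    """Estimate V(s) by averaging the return that follows each state's
    first visit, over many sampled episodes (first-visit MC)."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for _ in range(num_episodes):
        episode = generate_episode()          # [(state, reward), ...]
        G = 0.0
        returns_at = [0.0] * len(episode)
        # Sweep backwards so G accumulates the discounted return.
        for t in reversed(range(len(episode))):
            _, reward = episode[t]
            G = gamma * G + reward
            returns_at[t] = G
        # Credit the return only at each state's first occurrence.
        seen = set()
        for t, (state, _) in enumerate(episode):
            if state not in seen:
                seen.add(state)
                returns_sum[state] += returns_at[t]
                returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}

# Hypothetical toy episode: start in 'A' (reward 0), move to 'B', then
# terminate with reward 1 half the time and 0 otherwise.
def toy_episode():
    final = 1.0 if random.random() < 0.5 else 0.0
    return [("A", 0.0), ("B", final)]

random.seed(0)
V = first_visit_mc_prediction(toy_episode, gamma=1.0)
# With undiscounted returns, V["A"] and V["B"] both approach 0.5.
```

No model of the environment's transition probabilities appears anywhere: the estimates come purely from sampled interaction, which is exactly the point of the method.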