Chapter 4 – Monte Carlo Methods
- In the Monte Carlo method, we approximate the value of a state as the average return observed for that state across N sampled episodes, instead of computing the expected return directly.
- To compute the value function using the dynamic programming method, we need to know the model dynamics; when we don't know the model dynamics, we use model-free methods. The Monte Carlo method is a model-free method, meaning that it doesn't require the model dynamics (the transition probabilities) to compute the value function.
- In a prediction task, we evaluate a given policy by predicting its value function or Q function, which tells us the expected return an agent would obtain by following that policy. In a control task, however, we are not given any policy as input; our goal is to find the optimal policy, so we start by initializing a random policy and improve it iteratively (a minimal control sketch appears after this list).
- In the MC prediction method, the value of a state is estimated by generating episodes with the given policy and averaging the returns obtained for that state, as in the first sketch below.
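
To make the averaging concrete, here is a minimal sketch of first-visit MC prediction, assuming a toy five-state random walk whose right exit pays +1 and an equiprobable left/right policy; the environment, constants, and function names are illustrative assumptions, not from the chapter.

```python
import random
from collections import defaultdict

N_STATES = 5    # non-terminal states 0..4; stepping off either end terminates
GAMMA = 1.0     # undiscounted, episodic task

def generate_episode(start=2):
    """Roll out one episode under the equiprobable random policy.

    Returns a list of (state, reward) pairs, where reward is +1 only
    when the walk exits off the right edge.
    """
    episode, state = [], start
    while True:
        next_state = state + random.choice([-1, 1])
        reward = 1 if next_state == N_STATES else 0
        episode.append((state, reward))
        if next_state < 0 or next_state >= N_STATES:   # terminal
            return episode
        state = next_state

def first_visit_mc_prediction(n_episodes=10_000):
    """Estimate V(s) as the average first-visit return across episodes."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for _ in range(n_episodes):
        G, first_return = 0.0, {}
        # Walk the episode backwards, accumulating G = r + gamma * G;
        # repeated states get overwritten until only the first visit remains.
        for state, reward in reversed(generate_episode()):
            G = reward + GAMMA * G
            first_return[state] = G
        for state, G_first in first_return.items():
            returns_sum[state] += G_first
            returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in sorted(returns_sum)}

if __name__ == "__main__":
    for s, v in first_visit_mc_prediction().items():
        print(f"V({s}) = {v:.3f}")   # true values are 1/6, 2/6, ..., 5/6
```

With enough episodes, the sample averages approach the true values even though the transition probabilities are never used, which is exactly the model-free point made above.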
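And here is a companion sketch of a control task: on-policy first-visit MC control with an epsilon-greedy policy on the same toy corridor, again with illustrative names. We start from an arbitrary (all-zero) Q function and improve the implied policy after every episode.

```python
import random
from collections import defaultdict

N_STATES, GAMMA, EPSILON = 5, 1.0, 0.1
ACTIONS = (-1, +1)    # step left or step right

def epsilon_greedy(Q, state):
    """Explore with probability epsilon, otherwise act greedily w.r.t. Q."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

def run_episode(Q, start=2):
    """Roll out one episode under the current epsilon-greedy policy."""
    trajectory, state = [], start
    while True:
        action = epsilon_greedy(Q, state)
        next_state = state + action
        reward = 1 if next_state == N_STATES else 0
        trajectory.append((state, action, reward))
        if next_state < 0 or next_state >= N_STATES:   # terminal
            return trajectory
        state = next_state

def mc_control(n_episodes=20_000):
    """Average first-visit returns into Q; acting epsilon-greedily with
    respect to the latest Q is the iterative policy-improvement step."""
    Q, counts = defaultdict(float), defaultdict(int)
    for _ in range(n_episodes):
        G, first_return = 0.0, {}
        for state, action, reward in reversed(run_episode(Q)):
            G = reward + GAMMA * G
            first_return[(state, action)] = G
        for sa, G_first in first_return.items():
            counts[sa] += 1
            Q[sa] += (G_first - Q[sa]) / counts[sa]   # incremental mean
    return Q

if __name__ == "__main__":
    Q = mc_control()
    for s in range(N_STATES):
        best = max(ACTIONS, key=lambda a: Q[(s, a)])
        print(f"state {s}: best action = {'right' if best > 0 else 'left'}")
```

No policy is supplied as input here: the random tie-breaking over an all-zero Q plays the role of the initial random policy, and repeated evaluation-then-improvement drives it toward always stepping right, which is optimal in this corridor.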