Is the MC method applicable to all tasks?
We learned that Monte Carlo is a model-free method, so it doesn't require the model dynamics of the environment to compute the value function and Q function and find the optimal policy. The Monte Carlo method computes the value function and Q function by simply averaging the returns of a state and the returns of a state-action pair, respectively.
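To make this averaging concrete, here is a minimal sketch of first-visit Monte Carlo prediction (not code from this chapter); the function name mc_value_estimate and the assumption that each episode is already collected as a list of (state, reward) pairs are purely illustrative:

```python
from collections import defaultdict

def mc_value_estimate(episodes, gamma=1.0):
    """First-visit Monte Carlo prediction: estimate V(s) by averaging
    the returns observed from each state's first visit in every episode.

    `episodes` is assumed to be a list of episodes, where each episode
    is a list of (state, reward) pairs collected by following some policy.
    """
    returns_sum = defaultdict(float)   # total return accumulated per state
    returns_count = defaultdict(int)   # number of returns averaged per state

    for episode in episodes:
        G = 0.0
        # Walk the episode backwards, accumulating the discounted return G
        for t in reversed(range(len(episode))):
            state, reward = episode[t]
            G = reward + gamma * G
            # First-visit check: only record G for the earliest visit to the state
            if state not in [s for s, _ in episode[:t]]:
                returns_sum[state] += G
                returns_count[state] += 1

    # The value of each state is just the average of its observed returns
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}
```

Notice that a return G can only be computed once an episode has actually ended, which is exactly the limitation discussed next.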
But one issue with the Monte Carlo method is that it is applicable only to episodic tasks. We learned that in the Monte Carlo method, we compute the value of a state by averaging its returns, and the return is the sum of the rewards obtained in an episode. But when there is no episode, that is, if our task is a continuous task (non-episodic task), then we cannot apply the Monte Carlo method.
Okay, so how do we compute the value of a state when we have a continuous task and we also don't know the model dynamics of the environment? Here is where...