Now that we understand the Monte Carlo method, we need to see how to apply it to RL. Recall that we now assume the environment is unknown, that is, we have no model of it. Instead, we need an algorithm that explores the environment by trial and error. We can then take all of those trials and, using Monte Carlo, average their returns to estimate a better policy. That improved policy in turn drives further exploration of the environment, and the cycle repeats. Essentially, our algorithm becomes an explorer rather than a planner, and this is why we now refer to it as an agent.
The term agent reminds us that our algorithm is both an explorer and a learner: it not only gathers experience through exploration but also learns from that experience and improves on it. Now, this is real artificial intelligence.
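To make this explore-and-improve loop concrete, here is a minimal sketch of first-visit Monte Carlo control with an epsilon-greedy policy. The five-state chain environment and its rewards are hypothetical, invented purely for illustration; any episodic environment exposing the same step interface would work:

```python
import random
from collections import defaultdict

# A hypothetical five-state "chain" world, invented for illustration:
# the agent starts in state 0; action 1 moves right, action 0 moves
# left; reaching state 4 ends the episode with a reward of +1, and
# every other step costs -0.1.
GOAL = 4
ACTIONS = (0, 1)

def step(state, action):
    """One environment transition: returns (next_state, reward, done)."""
    nxt = min(GOAL, state + 1) if action == 1 else max(0, state - 1)
    done = nxt == GOAL
    return nxt, (1.0 if done else -0.1), done

def greedy(Q, state):
    """Pick the highest-valued action, breaking ties at random so an
    all-zero Q-table still behaves like a random explorer."""
    return max(ACTIONS, key=lambda a: (Q[(state, a)], random.random()))

def run_episode(Q, epsilon):
    """Explore by trial and error: roll out one full episode under the
    current epsilon-greedy policy, recording (state, action, reward)."""
    trajectory, state, done = [], 0, False
    while not done:
        if random.random() < epsilon:
            action = random.choice(ACTIONS)    # explore
        else:
            action = greedy(Q, state)          # exploit
        nxt, reward, done = step(state, action)
        trajectory.append((state, action, reward))
        state = nxt
    return trajectory

def mc_control(episodes=2000, gamma=0.9, epsilon=0.1):
    """Alternate exploration and improvement: sample episodes, then
    average the sampled returns (first-visit Monte Carlo) into Q."""
    Q, counts = defaultdict(float), defaultdict(int)
    for _ in range(episodes):
        trajectory = run_episode(Q, epsilon)
        G, first_visit_return = 0.0, {}
        for state, action, reward in reversed(trajectory):
            G = reward + gamma * G                     # return from this step
            first_visit_return[(state, action)] = G    # earliest visit wins
        for sa, ret in first_visit_return.items():
            counts[sa] += 1
            Q[sa] += (ret - Q[sa]) / counts[sa]        # incremental average
    return Q

Q = mc_control()
print({s: greedy(Q, s) for s in range(GOAL)})  # learned greedy actions
```

Each pass through mc_control plays out the two roles described above: run_episode is the explorer, gathering trials, and the averaging step is the learner, folding those sampled returns back into a better policy that guides the next round of exploration.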
Aside from...