If you recall from our very first chapter, Chapter 1, Understanding Rewards-Based Learning, we explored the primary elements of RL. We learned that RL comprises a policy, a value function, a reward function, and, optionally, a model. We use the word model in this context to refer to a detailed plan of the environment: a full description of its states, transitions, and rewards. Looking back at the previous chapter, where we used the FrozenLake environment, we had a perfect model of that environment:
Model of the FrozenLake environment
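To make this concrete, here is a minimal sketch of how that model can be read directly from the environment itself. It assumes the OpenAI Gym implementation of FrozenLake; the "FrozenLake-v1" ID and the unwrapped.P transition table are features of Gym's toy-text environments and may differ depending on your Gym version:

```python
# A minimal sketch (assuming the Gym FrozenLake environment): the model is
# exposed as a transition table P[state][action], where each entry is a list
# of (probability, next_state, reward, done) tuples.
import gym

env = gym.make("FrozenLake-v1")   # older Gym releases may use "FrozenLake-v0"
model = env.unwrapped.P           # the full model: dynamics and rewards

# Inspect the outcomes of taking action 2 (RIGHT) from state 0
for prob, next_state, reward, done in model[0][2]:
    print(f"p={prob:.2f} -> state {next_state}, reward {reward}, done={done}")
```

Because FrozenLake is slippery by default, each action lists several possible next states with their probabilities, which is exactly the kind of complete model a finite MDP assumes.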
Of course, working with a fully described model of a finite MDP is all well and good for learning. When it comes to the real world, however, having a complete and fully understood model of any environment is highly improbable, if not impossible. This is because there are far too many states to account for or model in any real-world problem.