Model imperfections
There is a serious issue with the model-based approach: when our model makes mistakes, or is simply inaccurate in some regimes of the environment, the policy learned from it can be totally wrong in real-life situations. To deal with this, we have several options. The most obvious is to "make the model better." Unfortunately, this usually means gathering more observations from the environment, which is exactly what we were trying to avoid. The more complicated and nonlinear the environment's behavior, the harder it is to model properly.
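To make the problem concrete, here is a minimal sketch of a model that is accurate in one regime and badly wrong in another. Everything in it is an assumption chosen for illustration: the one-dimensional dynamics `true_step`, the linear model, and the two state ranges are hypothetical and do not come from any specific environment or method.

```python
import math


def true_step(x):
    # Hypothetical nonlinear environment dynamics (assumed for illustration).
    return math.sin(x)


def fit_linear_model(xs, ys):
    # Ordinary least squares fit of y = a * x + b.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b


def mean_abs_error(a, b, xs):
    # Average one-step prediction error of the linear model on states xs.
    return sum(abs((a * x + b) - true_step(x)) for x in xs) / len(xs)


# The model only ever sees states from a narrow regime around zero.
train_xs = [-0.5 + i * 0.01 for i in range(101)]
train_ys = [true_step(x) for x in train_xs]
a, b = fit_linear_model(train_xs, train_ys)

# Evaluate inside the training regime and in a regime never observed.
in_regime = [-0.5 + i * 0.05 for i in range(21)]
out_regime = [2.0 + i * 0.05 for i in range(21)]

err_in = mean_abs_error(a, b, in_regime)
err_out = mean_abs_error(a, b, out_regime)

print(f"error inside the training regime:  {err_in:.4f}")
print(f"error outside the training regime: {err_out:.4f}")
```

A policy optimized against this model would be trusted in states near zero, but any plan that drives the system toward the unobserved regime would be built on predictions that have little to do with the real dynamics.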
Several ways have been discovered to tackle this issue. One example is the local models family of methods, in which we replace one large environment model with a small set of regime-based models and train them using trust-region tricks, in the same way that Trust Region Policy Optimization (TRPO) does. Another interesting way of looking at environment models is to augment...