Focusing on generalization in reinforcement learning
The core goal of most machine learning projects is to obtain models that work beyond training, under a broad set of conditions at test time. Yet when you start learning about RL, efforts to prevent overfitting and achieve generalization are not always at the forefront of the discussion, unlike in supervised learning. In this section, we discuss what causes this discrepancy, describe how generalization is closely related to partial observability in RL, and present a general recipe for handling these challenges.
Generalization and overfitting in supervised learning
When we train an image recognition or forecasting model, what we really want is high accuracy on unseen data. After all, we already know the labels for the data at hand. We use various methods to this end:
- We use separate training, dev, and test sets for model training, hyperparameter selection, and assessing model performance, respectively.