Quantifying generalization via CoinRun
There are several ways to test whether certain algorithms or approaches generalize better than others to unseen environment conditions, such as:
- Creating validation and test environments with separate sets of environment parameters,
- Assessing policy performance in real-life deployment.
The latter is not always practical, since real-life deployment may not be an option. The challenge with the former is to maintain consistency and to ensure that the validation/test environments are truly excluded from training. It is also possible to overfit to the validation environment when too many models are selected based on validation performance. One approach to overcoming these challenges is to use procedurally generated environments. To this end, OpenAI created the CoinRun environment to benchmark algorithms on their generalization capabilities. Let's look into it in more detail.
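To make the train/validation split concrete, here is a minimal sketch of how disjoint sets of procedurally generated CoinRun levels could be created. It assumes CoinRun is accessed through OpenAI's procgen package (pip install procgen), which registers CoinRun as a Gym environment; the exact options and environment id may differ depending on your installation and version.

```python
# A minimal sketch, assuming the procgen package is installed and that
# CoinRun is registered under the Gym id "procgen:procgen-coinrun-v0".
import gym

# Training environment: levels are procedurally generated from a fixed
# pool of 200 level seeds, so the agent only ever trains on these levels.
train_env = gym.make(
    "procgen:procgen-coinrun-v0",
    num_levels=200,            # number of unique levels in the training set
    start_level=0,             # first level seed of the training set
    distribution_mode="easy",  # level difficulty setting
)

# Validation environment: a disjoint block of level seeds, never seen
# during training, used to measure generalization to unseen levels.
val_env = gym.make(
    "procgen:procgen-coinrun-v0",
    num_levels=50,
    start_level=10_000,        # offset so the seed ranges do not overlap
    distribution_mode="easy",
)

# Standard Gym interaction loop (old 4-tuple step API).
obs = train_env.reset()
obs, reward, done, info = train_env.step(train_env.action_space.sample())
```

Keeping the training and validation seed ranges disjoint mirrors the split described in the first bullet above, while procedural generation removes the need to hand-craft separate validation environments.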
CoinRun environment
In the CoinRun environment, we have...