Enriching agent experience via domain randomization
Domain randomization (DR) simply means randomizing the parameters that define (part of) the environment during training in order to enrich the training data. It is a useful technique for obtaining policies that are robust and generalizable, in both fully and partially observable environments. In this section, we first present a classification of these parameters, in other words, the different dimensions of randomization. Then, we discuss two curriculum learning approaches that guide RL training along those dimensions.
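To make the idea concrete, here is a minimal sketch that assumes a hypothetical toy environment: a 1-D sliding block whose friction and mass are the randomized parameters. The names `EnvParams`, `PARAM_RANGES`, `ToyEnv`, and `collect_randomized_episodes`, as well as the parameter ranges, are illustrative assumptions rather than part of any library. A fresh configuration is sampled at the start of every episode, so the agent experiences a variety of dynamics instead of a single fixed environment.

```python
import random
from dataclasses import dataclass


@dataclass
class EnvParams:
    # Hypothetical parameters defining (part of) a toy environment.
    friction: float
    mass: float


# Hypothetical sampling ranges; in practice these are chosen per problem.
PARAM_RANGES = {"friction": (0.5, 1.5), "mass": (0.8, 1.2)}


def sample_params(rng: random.Random) -> EnvParams:
    """Draw one environment configuration uniformly from PARAM_RANGES."""
    return EnvParams(**{name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()})


class ToyEnv:
    """Hypothetical 1-D sliding block whose dynamics depend on EnvParams."""

    def __init__(self, params: EnvParams, dt: float = 0.1):
        self.params = params
        self.dt = dt
        self.position = 0.0
        self.velocity = 0.0

    def step(self, force: float) -> float:
        # Acceleration depends on mass and friction, so each sampled
        # configuration yields different transitions for the same action.
        accel = (force - self.params.friction * self.velocity) / self.params.mass
        self.velocity += accel * self.dt
        self.position += self.velocity * self.dt
        return self.position


def collect_randomized_episodes(num_episodes: int, steps: int, seed: int = 0):
    """Roll out a fixed action in a newly randomized environment each episode."""
    rng = random.Random(seed)
    episodes = []
    for _ in range(num_episodes):
        env = ToyEnv(sample_params(rng))  # new randomized environment per episode
        trajectory = [env.step(force=1.0) for _ in range(steps)]
        episodes.append((env.params, trajectory))
    return episodes
```

Sampling uniformly from fixed ranges is the simplest option; a real setup would choose which parameters to randomize, and over what ranges, to cover the variation expected at deployment.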
Dimensions of randomization
The following categorization, borrowed from (Rivlin, 2019), is a useful way to describe how two environments belonging to the same problem class (e.g., autonomous driving) can differ.
Different observations for the same/similar states
In this case, two environments emit different observations even though the underlying state and transition functions are the same or very similar. An example of this is the same Atari game scene but with different...