One of the most successful algorithms that learns from demonstrations is Dataset Aggregation (DAgger). It is an iterative meta-algorithm that produces a policy which performs well under the distribution of states induced by that policy itself. DAgger's most notable feature is that it addresses the distribution mismatch with an active method in which the expert teaches the learner how to recover from the learner's own mistakes.
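The iterative idea described above can be sketched on a toy problem. Everything in the block below is an assumption made for illustration, not the reference implementation: the 1-D world whose goal is position 0, the hypothetical `expert_action`, the nearest-neighbor learner `fit_1nn`, and the choice to train first on a single demonstration that starts at position 3. The original formulation can also mix expert and learner actions while collecting data; that step is left out here to keep the sketch short.

```python
def expert_action(state):
    """Hypothetical expert for a toy 1-D world: step toward position 0."""
    return -1 if state > 0 else (1 if state < 0 else 0)


def rollout(policy, start, horizon=6):
    """Run a policy for `horizon` steps and return the states it visits."""
    states, state = [], start
    for _ in range(horizon):
        states.append(state)
        state += policy(state)
    return states


def fit_1nn(dataset):
    """Learner: copy the action of the nearest labelled state."""
    def policy(state):
        return min(dataset, key=lambda pair: abs(pair[0] - state))[1]
    return policy


def dagger(start_states, n_iterations=3):
    # Iteration 0: plain imitation on one expert demonstration (assumed start: 3).
    dataset = [(s, expert_action(s)) for s in rollout(expert_action, start=3)]
    learner = fit_1nn(dataset)
    history = []
    for _ in range(n_iterations):
        # 1. Let the learner act, so states come from its own distribution.
        visited = [s for start in start_states for s in rollout(learner, start)]
        # 2. Ask the expert which action it would have taken in those states.
        new_data = [(s, expert_action(s)) for s in visited]
        # 3. Aggregate with all earlier data and retrain.
        dataset = dataset + new_data
        learner = fit_1nn(dataset)
        # Track where the retrained learner ends up when started at -3.
        history.append(rollout(learner, start=-3)[-1])
    return learner, history


learner, history = dagger(start_states=[3, -3])
print(history)  # [-1, 0, 0]
```

In this toy run the learner initially stalls at position -3, a state the expert demonstration never covered. After one round of expert relabelling it reaches -1, and after two rounds it reaches the goal, because each round adds expert labels for exactly the states where the learner went wrong.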
A classic IL algorithm learns a classifier that predicts the expert's behavior. In other words, the model is fit to a dataset of training examples collected from an expert: the observations are the inputs, and the expert's actions are the target outputs. However, following the previous reasoning, the learner's predictions affect which states or observations are visited next, violating the i.i.d. assumption.
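As a minimal sketch of this supervised view, the block below fits a nearest-neighbor classifier to (observation, action) pairs from one expert demonstration. The toy 1-D world, the hypothetical `expert_action`, and the helper names are assumptions for illustration only; any classifier could take the place of `fit_1nn`.

```python
def expert_action(state):
    """Hypothetical expert: step toward position 0, stay once there."""
    if state > 0:
        return -1
    if state < 0:
        return 1
    return 0


def collect_expert_demo(start, max_steps=20):
    """Roll out the expert and record (observation, action) pairs."""
    data, state = [], start
    for _ in range(max_steps):
        action = expert_action(state)
        data.append((state, action))
        if action == 0:  # goal reached
            break
        state += action
    return data


def fit_1nn(dataset):
    """Return a policy that copies the action of the nearest stored observation."""
    def policy(state):
        _, action = min(dataset, key=lambda pair: abs(pair[0] - state))
        return action
    return policy


dataset = collect_expert_demo(start=3)  # observations 3, 2, 1, 0
learner = fit_1nn(dataset)

# On observations the expert visited, the learner agrees with the expert.
on_distribution = [learner(s) == expert_action(s) for s, _ in dataset]

# On a state the expert never visited, the nearest example is position 0,
# so the learner copies "stay" while the expert would step right.
unseen_state = -2
print(all(on_distribution), learner(unseen_state), expert_action(unseen_state))
```

The classifier is perfect on the training observations, yet it has no information about states outside the expert's trajectory. Once the learner's own actions decide which states come next, the training and test inputs no longer come from the same distribution.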
DAgger deals with the change...