In this chapter, we took a break from reinforcement learning algorithms and explored a new type of learning called imitation learning. The novelty of this paradigm lies in how the learning takes place: the resulting policy imitates the behavior of an expert. Imitation learning differs from reinforcement learning in two ways. It does not rely on a reward signal, and it can leverage the rich source of information provided by the expert.
We saw that the dataset the learner is trained on can be expanded with additional state-action pairs to increase the learner's confidence in new situations. This process is called data aggregation. Moreover, the new data can come from the newly learned policy. In this case, we talked about on-policy data, because it comes from the same policy that is being learned. This integration of on-policy states with expert...