Chapter 15 – Imitation Learning and Inverse RL
- One of the simplest (and most naive) ways to perform imitation learning is to treat it as a supervised learning task. First, we collect a set of expert demonstrations; then we train a classifier to predict the action the expert performed in each state. That is, we frame imitation as one big multiclass classification problem in which the states are inputs and the expert's actions are the labels.
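As a minimal sketch of this supervised approach, the snippet below trains a lookup-table "classifier" that maps each discrete state to the action the expert chose most often there. The demonstration data and state/action names are hypothetical, and a real implementation would use a function approximator (e.g. a neural network) instead of a table:

```python
from collections import Counter, defaultdict

# Hypothetical expert demonstrations as (state, action) pairs.
demos = [(0, "left"), (0, "left"), (0, "right"),
         (1, "right"), (1, "right"), (2, "left")]

def train_policy(demonstrations):
    """Imitation as multiclass classification: for each state,
    predict the action the expert took most frequently."""
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

policy = train_policy(demos)
# State 0: expert chose "left" twice and "right" once, so we imitate "left".
print(policy[0])
```

The limitation of this naive approach is that the classifier is only ever trained on states the expert visited; small mistakes at test time can drift the agent into states it has never seen, which is exactly what DAgger addresses.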
- In DAgger (Dataset Aggregation), we repeat the following over a series of iterations: roll out the current policy, ask the expert to label the states the policy visits, aggregate these newly labeled states into the dataset, and retrain the classifier on the aggregated dataset.
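The iteration above can be sketched as follows. The expert, the state space, and the rollout (a random walk standing in for real environment dynamics) are all hypothetical placeholders:

```python
import random
from collections import Counter, defaultdict

random.seed(0)
STATES = [0, 1, 2, 3]

def expert(state):
    # Hypothetical expert policy: "a" in even states, "b" in odd states.
    return "a" if state % 2 == 0 else "b"

def fit(dataset):
    """Retrain the classifier on the aggregated dataset (majority vote per state)."""
    counts = defaultdict(Counter)
    for s, a in dataset:
        counts[s][a] += 1
    return lambda s: counts[s].most_common(1)[0][0] if s in counts else "a"

def dagger(n_iters=3, rollout_len=5):
    # Start from an initial set of expert demonstrations.
    dataset = [(s, expert(s)) for s in STATES]
    policy = fit(dataset)
    for _ in range(n_iters):
        # Roll out the current learner policy and record the states it visits
        # (a random walk here, as a stand-in for real environment dynamics).
        visited = [random.choice(STATES) for _ in range(rollout_len)]
        # Query the expert for labels on those states and aggregate.
        dataset += [(s, expert(s)) for s in visited]
        policy = fit(dataset)
    return policy

pi = dagger()
```

The key point is that the expert labels states from the *learner's* own state distribution, so the classifier is trained on the states it will actually encounter.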
- In DQfD (Deep Q-learning from Demonstrations), we fill the replay buffer with expert demonstrations and pre-train the agent on them. Note that these expert demonstrations are used only for pre-training. Once pre-trained, the agent interacts with the environment, gathers more experience, and learns from that as well. Thus, DQfD consists of two phases: pre-training and training.
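The two phases can be sketched with a toy tabular Q-learner on a hypothetical chain MDP (states 0–3, reward at state 3); the "expert" that always moves right is likewise an assumption for illustration, and real DQfD adds extra ingredients (a margin loss, prioritized replay) omitted here:

```python
import random
random.seed(1)

class ReplayBuffer:
    def __init__(self):
        self.data = []
    def add(self, transition):
        self.data.append(transition)
    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))

def step(s, a):
    # Toy chain MDP: actions -1/+1 move along states 0..3; reward 1 at state 3.
    s2 = max(0, min(3, s + a))
    return s2, (1.0 if s2 == 3 else 0.0)

def q_update(Q, batch, alpha=0.5, gamma=0.9):
    for s, a, r, s2 in batch:
        target = r + gamma * max(Q[(s2, -1)], Q[(s2, 1)])
        Q[(s, a)] += alpha * (target - Q[(s, a)])

Q = {(s, a): 0.0 for s in range(4) for a in (-1, 1)}
buf = ReplayBuffer()

# Phase 1 (pre-training): the buffer holds only expert demonstrations.
for s in range(3):
    s2, r = step(s, +1)              # hypothetical expert: always move right
    buf.add((s, +1, r, s2))
for _ in range(200):
    q_update(Q, buf.sample(3))

# Phase 2 (training): the pre-trained agent now interacts with the
# environment, adds its own experience to the buffer, and keeps learning.
s = 0
for _ in range(200):
    a = random.choice((-1, 1))       # exploratory action, for the sketch
    s2, r = step(s, a)
    buf.add((s, a, r, s2))
    q_update(Q, buf.sample(3))
    s = s2
```

Because pre-training sees only expert transitions, the agent already prefers the rewarding direction before it ever touches the environment, which is the whole point of seeding the replay buffer with demonstrations.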