Deep Q-learning from demonstrations
We learned that in imitation learning, the agent tries to learn from expert demonstrations. Can we make use of expert demonstrations in DQN to perform better? Yes! In this section, we will learn how to make use of expert demonstrations in DQN using an algorithm called deep Q-learning from demonstrations (DQfD).
In the previous chapters, we learned about several types of DQN. We started off with the vanilla DQN and then explored various improvements, such as the double DQN, the dueling DQN, prioritized experience replay, and more. In all these methods, the agent learns from scratch by interacting with the environment: it stores its interaction experience in a buffer called the replay buffer and learns based on that experience.
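To make the replay buffer idea concrete, the following is a minimal sketch in Python; the class and method names here are illustrative, not the exact implementation used elsewhere in the book. The agent's transitions (state, action, reward, next state, done) are appended to a fixed-size buffer, and random mini-batches are sampled from it for training:

```python
# A minimal, illustrative replay buffer sketch (names are hypothetical).
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=10000):
        # Once the buffer is full, the oldest transitions are discarded.
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        # Each interaction with the environment is stored as one transition.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # The agent trains on randomly sampled mini-batches of past experience.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

Because mini-batches are sampled at random, consecutive (and therefore highly correlated) transitions rarely end up in the same batch, which is one of the main reasons DQN uses a replay buffer in the first place.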
In order for the agent to perform better, it has to gather a lot of experience from the environment, add it to the replay buffer, and train itself. However, this method costs us a lot of training...