Summary
We started the chapter by understanding what imitation learning is and how supervised imitation learning works. Next, we learned about the DAgger algorithm, where, over a series of iterations, we collect the states visited by our current policy, label them with the expert's actions, aggregate them into a single dataset, and retrain the policy on that aggregated dataset to learn the optimal policy.
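To make the DAgger loop concrete, here is a minimal sketch of dataset aggregation, assuming a hypothetical Gym-style environment with `reset()`/`step()`, a discrete action space, and an `expert_policy(state)` function that returns the expert's action for a given state:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def dagger(env, expert_policy, n_iterations=5, episode_len=200):
    # Aggregated dataset of (state, expert action) pairs
    states, actions = [], []
    policy = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)

    for it in range(n_iterations):
        state = env.reset()
        for _ in range(episode_len):
            # In the first iteration we have no trained policy yet, so follow
            # the expert; afterwards, run our current policy so that we collect
            # the states it actually visits
            if it == 0:
                action = expert_policy(state)
            else:
                action = policy.predict(np.array(state).reshape(1, -1))[0]

            # Label every visited state with the expert's action and aggregate it
            states.append(state)
            actions.append(expert_policy(state))

            state, _, done, _ = env.step(action)
            if done:
                state = env.reset()

        # Retrain the policy from scratch on the aggregated dataset
        policy.fit(np.array(states), np.array(actions))

    return policy
```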
After looking at DAgger, we learned about DQfD, where we prefill the replay buffer with expert demonstrations and pre-train the agent on those demonstrations before the actual training phase begins.
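The following is a minimal sketch of just this setup, prefilling the buffer and running a pre-training phase on expert data. Here, `expert_transitions` and `agent.update(batch)` are hypothetical placeholders for the demonstration data and the agent's Q-learning update:

```python
import random
from collections import deque

def pretrain_with_demonstrations(agent, expert_transitions,
                                 buffer_size=100000, pretrain_steps=1000,
                                 batch_size=32):
    replay_buffer = deque(maxlen=buffer_size)

    # Phase 1: prefill the replay buffer with expert demonstrations,
    # where each transition is (state, action, reward, next_state, done)
    for transition in expert_transitions:
        replay_buffer.append(transition)

    # Phase 2: pre-train the agent purely on expert data before the usual
    # environment-interaction training phase begins
    for _ in range(pretrain_steps):
        batch = random.sample(replay_buffer, min(batch_size, len(replay_buffer)))
        agent.update(batch)

    return replay_buffer
```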
Moving on, we learned about IRL. We understood that in reinforcement learning, we try to find the optimal policy given the reward function, whereas in IRL, we try to learn the reward function from the expert demonstrations. Once we have derived the reward function using IRL, we can use it to train our agent to learn the optimal policy with any reinforcement learning algorithm. We then explored how to learn the reward function using the maximum entropy IRL algorithm.
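As a quick refresher, the following is a minimal maximum entropy IRL sketch for a small, known MDP, assuming the reward is linear in state features, that is, R(s) = theta^T phi(s). Here, `feature_matrix[s]` gives phi(s), `expert_trajectories` is a list of state sequences, and `compute_state_visitation` is a hypothetical helper that returns the expected state-visitation frequencies under the soft-optimal policy for the current reward:

```python
import numpy as np

def maxent_irl(feature_matrix, expert_trajectories, compute_state_visitation,
               n_iterations=100, lr=0.01):
    n_states, n_features = feature_matrix.shape
    theta = np.random.uniform(size=n_features)

    # Empirical feature expectations of the expert demonstrations
    expert_features = np.zeros(n_features)
    for traj in expert_trajectories:
        for s in traj:
            expert_features += feature_matrix[s]
    expert_features /= len(expert_trajectories)

    for _ in range(n_iterations):
        reward = feature_matrix.dot(theta)                   # R(s) for every state
        svf = compute_state_visitation(reward)               # expected visitation under current reward
        grad = expert_features - feature_matrix.T.dot(svf)   # maximum entropy IRL gradient
        theta += lr * grad                                    # gradient ascent on the log-likelihood

    return feature_matrix.dot(theta)                          # recovered reward for each state
```

The gradient is simply the difference between the expert's feature expectations and the feature expectations induced by the current reward, which is what drives the reward parameters toward explaining the expert's behavior.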
At the end...