DAgger
DAgger is one of the most widely used imitation learning algorithms. Let's understand how it works by revisiting our example of training an agent to drive a car. First, we initialize an empty dataset $\mathcal{D}$.
In the first iteration, we start off with some policy $\pi_1$ to drive the car and generate a trajectory $\tau$ using $\pi_1$. The trajectory consists of a sequence of states and actions: the states visited by $\pi_1$ and the actions $\pi_1$ took in those states. Now, we create a new dataset $\mathcal{D}_1$ by taking only the states visited by $\pi_1$ and asking an expert to provide the actions for those states. That is, we keep every state from the trajectory but replace the policy's actions with the expert's.
Now, we combine the new dataset $\mathcal{D}_1$ with our initially empty dataset $\mathcal{D}$ and update $\mathcal{D}$ as:
![](https://static.packt-cdn.com/products/9781839210686/graphics/Images/B15558_15_017.png)
Next, we train a classifier on this updated dataset $\mathcal{D}$ and learn a new policy $\pi_2$.
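The iteration just described can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a reference implementation: the one-dimensional "lane offset" environment, the rule-based `expert_action`, the tabular majority-vote classifier, and all function names are hypothetical stand-ins chosen to keep the example self-contained. The dataset is kept as a list, so aggregation is plain concatenation.

```python
import random
from collections import Counter, defaultdict

ACTIONS = (-1, 0, 1)  # steer left, go straight, steer right


def expert_action(state):
    """Hypothetical expert: steer back toward the lane center (state 0)."""
    if state > 0:
        return -1
    if state < 0:
        return 1
    return 0


def rollout(policy, rng, horizon=20):
    """Drive with `policy` and return the trajectory as (state, action) pairs."""
    state = rng.randint(-3, 3)
    trajectory = []
    for _ in range(horizon):
        action = policy(state)
        trajectory.append((state, action))
        drift = rng.choice((-1, 0, 1))  # random disturbance (toy dynamics)
        state = max(-5, min(5, state + action + drift))
    return trajectory


def train_classifier(dataset):
    """Tabular classifier: map each state to its most common labeled action."""
    votes = defaultdict(Counter)
    for state, action in dataset:
        votes[state][action] += 1
    table = {s: c.most_common(1)[0][0] for s, c in votes.items()}
    # States never seen in the dataset fall back to "go straight".
    return lambda state: table.get(state, 0)


def dagger_iteration(policy, dataset, rng):
    """One iteration: roll out, relabel with the expert, aggregate, retrain."""
    trajectory = rollout(policy, rng)
    # Keep only the visited states; the expert supplies the actions.
    new_data = [(state, expert_action(state)) for state, _ in trajectory]
    dataset = dataset + new_data  # aggregate the new data into the dataset
    return train_classifier(dataset), dataset


rng = random.Random(0)
dataset = []                                 # the initially empty dataset
policy = lambda state: rng.choice(ACTIONS)   # an arbitrary starting policy
for _ in range(5):
    policy, dataset = dagger_iteration(policy, dataset, rng)
```

Because every label in the aggregated dataset comes from the expert, the learned policy agrees with the expert on every state it has visited so far; what changes between iterations is which states the current policy visits and therefore which states get labeled.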
In the second iteration, we use the...