DAgger
The algorithm for DAgger is given as follows:
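- Initialize an empty dataset $\mathcal{D} \leftarrow \emptyset$.
- Initialize a policy $\hat{\pi}_1$ (any policy in the policy class $\Pi$).
- For iterations $i = 1$ to $N$:
  - Create a policy $\pi_i = \beta_i \pi^* + (1 - \beta_i)\,\hat{\pi}_i$, a mixture of the expert policy $\pi^*$ and the current learned policy $\hat{\pi}_i$, where $\beta_i \in [0, 1]$ is the probability that the expert chooses the action.
  - Generate a trajectory using the policy $\pi_i$.
  - Create a dataset $\mathcal{D}_i$ by collecting the states $s$ visited by the policy $\pi_i$ and the actions $\pi^*(s)$ the expert provides for those states. Thus, $\mathcal{D}_i = \{(s, \pi^*(s))\}$.
  - Aggregate the dataset as $\mathcal{D} \leftarrow \mathcal{D} \cup \mathcal{D}_i$.
  - Train a classifier on the updated dataset $\mathcal{D}$ and extract a new policy $\hat{\pi}_{i+1}$.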
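The DAgger loop can be sketched in Python on a toy problem. This is a minimal illustration under loudly stated assumptions, not a reference implementation: the environment (an agent on the integer line 0..10), the `expert` that steps toward a hypothetical `GOAL` state, and the helpers `step`, `rollout`, `train_classifier` and `dagger` are all hypothetical. The "classifier" is a per-state majority-vote lookup table standing in for any supervised learner. The mixture $\pi_i$ is realized per step (the expert acts with probability $\beta_i$), and $\beta_i = p^{i-1}$ is one decaying schedule, so $\beta_1 = 1$ and the first iteration simply follows the expert.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy task: states are integers 0..10, actions move left or right.
GOAL = 5
ACTIONS = (-1, +1)


def expert(state):
    # Hypothetical expert pi*: always step toward GOAL.
    return +1 if state < GOAL else -1


def step(state, action):
    # Deterministic transition, clamped to the range 0..10.
    return max(0, min(10, state + action))


def rollout(policy, horizon, rng):
    # Generate one trajectory and return the states visited by `policy`.
    state = rng.randint(0, 10)
    states = []
    for _ in range(horizon):
        states.append(state)
        state = step(state, policy(state))
    return states


def train_classifier(dataset):
    # Stand-in classifier: majority vote of expert actions per state.
    votes = defaultdict(Counter)
    for s, a in dataset:
        votes[s][a] += 1
    table = {s: c.most_common(1)[0][0] for s, c in votes.items()}

    def policy(state):
        # Unseen states fall back to an arbitrary default action.
        return table.get(state, ACTIONS[0])

    return policy


def dagger(n_iters=10, horizon=20, p=0.5, seed=0):
    rng = random.Random(seed)
    dataset = []  # D <- empty (a list, so repeated visits are kept)

    def learned(state):
        # pi_hat_1: an arbitrary initial policy.
        return ACTIONS[0]

    for i in range(1, n_iters + 1):
        beta = p ** (i - 1)  # beta_1 = 1, then decays geometrically

        def mixed(s, learned=learned, beta=beta):
            # pi_i: the expert acts with probability beta, else pi_hat_i.
            return expert(s) if rng.random() < beta else learned(s)

        states = rollout(mixed, horizon, rng)
        d_i = [(s, expert(s)) for s in states]  # D_i = {(s, pi*(s))}
        dataset.extend(d_i)                     # D <- D U D_i
        learned = train_classifier(dataset)     # pi_hat_{i+1}
    return learned, dataset


policy, data = dagger()
print(len(data))  # 200: 10 iterations x 20 visited states each
```

For simplicity the sketch returns the last trained policy. The key DAgger property is visible in `d_i`: states come from the learner's own (mixed) rollouts, but labels always come from the expert, so the training data covers the states the learned policy actually reaches.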