DAgger
The algorithm for DAgger is given as follows:
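- Initialize an empty dataset $\mathcal{D} \leftarrow \emptyset$.
- Initialize a policy $\hat{\pi}_1$.
- For iterations $i = 1$ to $N$:
  - Create a policy $\pi_i = \beta_i \pi^* + (1 - \beta_i) \hat{\pi}_i$, where $\pi^*$ is the expert policy and $\beta_i \in [0, 1]$ is a mixing coefficient.
  - Generate a trajectory using the policy $\pi_i$.
  - Create a dataset $\mathcal{D}_i$ by collecting the states visited by the policy $\pi_i$ and the actions for those states provided by the expert $\pi^*$. Thus, $\mathcal{D}_i = \{(s, \pi^*(s))\}$.
  - Aggregate the dataset as $\mathcal{D} \leftarrow \mathcal{D} \cup \mathcal{D}_i$.
  - Train a classifier on the updated dataset $\mathcal{D}$ and extract a new policy $\hat{\pi}_{i+1}$.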
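
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the 1-D toy world, the `expert_policy`, the tabular "classifier", the deterministic start states, and the mixing schedule `beta = 0.5 ** (i - 1)` are all assumptions made up for the example. The mixed policy is interpreted as following the expert with probability `beta` at each step and the learner otherwise.

```python
import random
from collections import Counter, defaultdict

LOW, HIGH, GOAL = 0, 10, 5  # hypothetical 1-D toy world


def expert_policy(state):
    """Hypothetical expert: step toward GOAL, stay once there."""
    if state < GOAL:
        return 1
    if state > GOAL:
        return -1
    return 0


def step(state, action):
    """Toy dynamics: move along the line, clamped to [LOW, HIGH]."""
    return max(LOW, min(HIGH, state + action))


def train_classifier(dataset, default_action=0):
    """Tabular stand-in for a classifier: majority label per seen state."""
    votes = defaultdict(Counter)
    for state, action in dataset:
        votes[state][action] += 1
    table = {s: c.most_common(1)[0][0] for s, c in votes.items()}

    def policy(state):
        # Unseen states fall back to the default action.
        return table.get(state, default_action)

    return policy


def dagger(n_iters=12, horizon=10, seed=0):
    rng = random.Random(seed)
    dataset = []                         # initialize an empty dataset
    learner = train_classifier(dataset)  # initial policy (untrained)
    for i in range(1, n_iters + 1):
        beta = 0.5 ** (i - 1)            # assumed schedule; beta = 1 at i = 1
        # Assumption: cycle start states so the sketch is reproducible.
        state = LOW + (i - 1) % (HIGH - LOW + 1)
        visited = []
        for _ in range(horizon):
            visited.append(state)
            # Mixed policy: expert with probability beta, else learner.
            if rng.random() < beta:
                action = expert_policy(state)
            else:
                action = learner(state)
            state = step(state, action)
        # Label every visited state with the expert's action.
        d_i = [(s, expert_policy(s)) for s in visited]
        dataset.extend(d_i)                  # aggregate the dataset
        learner = train_classifier(dataset)  # retrain on all data so far
    return learner, dataset


if __name__ == "__main__":
    learner, dataset = dagger()
    print(len(dataset), [learner(s) for s in range(LOW, HIGH + 1)])
```

The key point the sketch tries to show is that the states come from the mixed policy (and, increasingly, from the learner itself), while the labels always come from the expert, so the learner is trained on the states it actually reaches.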