MAML in Reinforcement Learning
The algorithm for MAML in the reinforcement learning setting is given as follows:
- Say we have a model $f_\theta$ parameterized by a parameter $\theta$ and we have a distribution over tasks $p(T)$. First, we randomly initialize the model parameter $\theta$.
- Sample a batch of tasks $T_i$ from the distribution of tasks, that is, $T_i \sim p(T)$.
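- For each task $T_i$:
  - Sample $k$ trajectories using $f_\theta$ and prepare the training dataset: $D_i^{train} = \{(x_1, a_1, \ldots, x_H)\}$
  - Train the model on the training dataset and compute the loss $L_{T_i}(f_\theta)$, which in the reinforcement learning setting is the negative expected reward over the sampled trajectories.
  - Minimize the loss using gradient descent and get the adapted parameter $\theta'_i$ as $\theta'_i = \theta - \alpha \nabla_\theta L_{T_i}(f_\theta)$, where $\alpha$ is the inner-loop step size.
  - Sample $k$ trajectories using $f_{\theta'_i}$ and prepare the test dataset: $D_i^{test} = \{(x_1, a_1, \ldots, x_H)\}$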
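- Now, we minimize the loss on the test dataset $D_i^{test}$. Parameterize the model $f$ with the adapted parameter $\theta'_i$ calculated in the previous step and compute the loss $L_{T_i}(f_{\theta'_i})$. Calculate the gradients of this loss with respect to $\theta$ and update the initial parameter $\theta$ using the test (meta-training) dataset: $\theta \leftarrow \theta - \beta \nabla_\theta \sum_{T_i \sim p(T)} L_{T_i}(f_{\theta'_i})$, where $\beta$ is the meta step size.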
- Repeat...
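
The steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch under several assumptions that are not part of the algorithm description: the task family is a hypothetical five-armed bandit in which the rewarding arm is always arm 0 or arm 1, every trajectory is a single step, the policy is a softmax over one logit per arm, and the loss gradient is estimated with REINFORCE. All names (`policy`, `adapt`, `meta_train`, `adapted_return`, `GOOD_ARMS`) are made up for this example. The meta-update uses the first-order approximation, which evaluates the gradient at $\theta'_i$ and ignores how $\theta'_i$ depends on $\theta$, so it is a simplification of the full update rather than an exact implementation of it.

```python
import math
import random

N_ARMS = 5          # actions available in every task
GOOD_ARMS = (0, 1)  # hypothetical task family: the rewarding arm is one of these


def policy(theta):
    """Softmax policy f_theta over arms; theta holds one logit per arm."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def sample_trajectories(theta, good_arm, k, rng):
    """Sample k one-step trajectories (action, reward) from f_theta on one task."""
    actions = rng.choices(range(N_ARMS), weights=policy(theta), k=k)
    return [(a, 1.0 if a == good_arm else 0.0) for a in actions]


def loss_gradient(theta, trajectories):
    """REINFORCE estimate of the gradient of L = -E[reward] with respect to theta."""
    probs = policy(theta)
    grad = [0.0] * N_ARMS
    for action, reward in trajectories:
        for j in range(N_ARMS):
            # d log pi(action) / d theta_j = 1[j == action] - pi(j)
            grad[j] -= reward * ((j == action) - probs[j]) / len(trajectories)
    return grad


def adapt(theta, good_arm, k, alpha, rng):
    """Inner step: theta'_i = theta - alpha * grad L_Ti(f_theta), using D_i^train."""
    d_train = sample_trajectories(theta, good_arm, k, rng)
    grad = loss_gradient(theta, d_train)
    return [t - alpha * g for t, g in zip(theta, grad)]


def meta_train(iterations=300, batch_size=4, k=20, alpha=1.0, beta=0.5, seed=0):
    rng = random.Random(seed)
    # The text initializes theta randomly; zeros keep this sketch reproducible.
    theta = [0.0] * N_ARMS
    for _ in range(iterations):
        meta_grad = [0.0] * N_ARMS
        for _ in range(batch_size):
            good_arm = rng.choice(GOOD_ARMS)  # T_i ~ p(T)
            theta_i = adapt(theta, good_arm, k, alpha, rng)
            d_test = sample_trajectories(theta_i, good_arm, k, rng)
            # First-order approximation: gradient taken at theta'_i,
            # ignoring the dependence of theta'_i on theta.
            grad = loss_gradient(theta_i, d_test)
            # Averaging instead of summing over tasks only rescales beta.
            meta_grad = [m + g / batch_size for m, g in zip(meta_grad, grad)]
        theta = [t - beta * m for t, m in zip(theta, meta_grad)]
    return theta


def adapted_return(theta, k=20, alpha=1.0, trials=200, seed=1):
    """Average expected reward after one inner step, over both tasks and many trials."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        for good_arm in GOOD_ARMS:
            theta_i = adapt(theta, good_arm, k, alpha, rng)
            total += policy(theta_i)[good_arm]
    return total / (trials * len(GOOD_ARMS))


theta_meta = meta_train()
print("return after one step from zero init:", adapted_return([0.0] * N_ARMS))
print("return after one step from meta-learned init:", adapted_return(theta_meta))
```

Comparing the two printed values shows what the meta-learned initialization buys us: starting from it, a single inner gradient step on a new task should reach a higher expected reward than the same step taken from an uninformed initialization. Replacing the first-order gradient with the full one would require differentiating through the inner update, which is usually left to an automatic differentiation library.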