MAML in Reinforcement Learning
The algorithm for MAML in the reinforcement learning setting is given as follows:
- Say we have a model $f$ parameterized by a parameter $\theta$ and a distribution over tasks $p(T)$. First, we randomly initialize the model parameter $\theta$.
- Sample a batch of tasks $T_i$ from the distribution of tasks, that is, $T_i \sim p(T)$.
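- For each task $T_i$:
  - Sample $k$ trajectories using $f_\theta$ and prepare the training dataset: $D_i^{train} = \{(x_1, a_1, \ldots, x_H)\}$.
  - Train the model $f_\theta$ on the training dataset $D_i^{train}$ and compute the loss $\mathcal{L}_{T_i}(f_\theta)$. In reinforcement learning this loss is the negative expected return over the horizon $H$: $\mathcal{L}_{T_i}(f_\theta) = -\mathbb{E}_{x_t, a_t \sim f_\theta, T_i}\left[\sum_{t=1}^{H} R_i(x_t, a_t)\right]$, where $R_i$ is the reward function of task $T_i$.
  - Minimize the loss using gradient descent and get the optimal parameter $\theta_i'$ as $\theta_i' = \theta - \alpha \nabla_\theta \mathcal{L}_{T_i}(f_\theta)$, where $\alpha$ is the inner-loop learning rate.
  - Sample $k$ trajectories using $f_{\theta_i'}$ and prepare the test dataset: $D_i^{test} = \{(x_1, a_1, \ldots, x_H)\}$.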
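- Now, we minimize the loss on the test dataset $D_i^{test}$. Parameterize the model $f$ with the optimal parameter $\theta_i'$ calculated in the previous step and compute the loss $\mathcal{L}_{T_i}(f_{\theta_i'})$. Calculate the gradients of the loss and update our randomly initialized parameter $\theta$ using our test (meta-training) dataset: $\theta = \theta - \beta \nabla_\theta \sum_{T_i \sim p(T)} \mathcal{L}_{T_i}(f_{\theta_i'})$, where $\beta$ is the meta learning rate. The gradient is taken with respect to $\theta$, not $\theta_i'$, so it flows back through the inner update that produced $\theta_i'$.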
- Repeat...
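The steps above can be sketched in plain Python. This is a minimal sketch under several assumptions that are not part of the text. Each task $T_i$ is a hypothetical 4-armed bandit (an MDP with a single dummy state) whose rewarding arm is drawn from {0, 1}. $f_\theta$ is a softmax policy. $\nabla_\theta \mathcal{L}_{T_i}$ is estimated with REINFORCE using reward-to-go. The meta-update uses plain gradient descent with the first-order approximation, which drops the second-order term that comes from differentiating through $\theta_i'$. All function names (`sample_task`, `adapt`, `maml_rl`, and so on) and all hyperparameter values are illustrative.

```python
import math
import random

# ASSUMPTION (not from the text): each task T_i is a hypothetical 4-armed bandit,
# i.e. an MDP with one dummy state, where only arm `task` pays reward 1.
# p(T) only ever rewards arm 0 or arm 1, so there is structure to meta-learn.
N_ARMS = 4


def sample_task(rng):
    """Draw T_i ~ p(T); a task is just the index of its rewarding arm."""
    return rng.choice([0, 1])


def policy(theta):
    """State-independent softmax policy f_theta: returns action probabilities."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def sample_trajectories(theta, task, k, horizon, rng):
    """Roll out k trajectories [(x_t, a_t, r_t)] of length `horizon` with f_theta."""
    probs = policy(theta)
    trajs = []
    for _ in range(k):
        traj = []
        for _ in range(horizon):
            a = rng.choices(range(N_ARMS), weights=probs)[0]
            traj.append((0, a, 1.0 if a == task else 0.0))  # state 0 is a dummy
        trajs.append(traj)
    return trajs


def loss_grad(theta, trajs):
    """REINFORCE estimate of grad_theta L, with L = -E[sum_t r_t].

    `trajs` must have been sampled with f_theta for the estimate to be valid.
    """
    probs = policy(theta)
    grad = [0.0] * len(theta)
    for traj in trajs:
        rewards = [r for _, _, r in traj]
        for t, (_, a, _) in enumerate(traj):
            reward_to_go = sum(rewards[t:])
            for j in range(len(theta)):
                # For a softmax policy, d log pi(a) / d theta_j = 1[j == a] - pi_j.
                score = (1.0 if j == a else 0.0) - probs[j]
                grad[j] -= score * reward_to_go  # minus sign: loss = -return
    return [g / len(trajs) for g in grad]


def adapt(theta, task, k, horizon, alpha, rng):
    """Inner update: theta'_i = theta - alpha * grad L_Ti(f_theta) on D_i^train."""
    d_train = sample_trajectories(theta, task, k, horizon, rng)
    g = loss_grad(theta, d_train)
    return [t - alpha * gi for t, gi in zip(theta, g)]


def maml_rl(iterations=200, tasks_per_batch=5, k=20, horizon=1,
            alpha=0.5, beta=0.1, seed=0):
    """First-order MAML loop over the hypothetical bandit tasks."""
    rng = random.Random(seed)
    theta = [rng.gauss(0.0, 0.01) for _ in range(N_ARMS)]  # random init
    for _ in range(iterations):
        meta_grad = [0.0] * N_ARMS
        for _ in range(tasks_per_batch):
            task = sample_task(rng)  # T_i ~ p(T)
            theta_i = adapt(theta, task, k, horizon, alpha, rng)
            d_test = sample_trajectories(theta_i, task, k, horizon, rng)
            # First-order approximation (a simplification): use the gradient
            # w.r.t. theta'_i instead of differentiating through the inner update.
            g = loss_grad(theta_i, d_test)
            meta_grad = [m + gi for m, gi in zip(meta_grad, g)]
        # Meta update: theta = theta - beta * sum_i grad L_Ti(f_theta'_i)
        theta = [t - beta * m for t, m in zip(theta, meta_grad)]
    return theta


if __name__ == "__main__":
    meta_theta = maml_rl()
    # Probability mass should concentrate on arms 0 and 1, the ones p(T) rewards.
    print([round(p, 3) for p in policy(meta_theta)])
```

The defaults use one-step trajectories ($H = 1$); passing a larger `horizon` produces longer trajectories without changing the rest of the loop. After meta-training, the initial policy should put most of its probability on arms 0 and 1, so a single inner step on a new task only has to choose between them.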