- Unlike MAML, in Meta-SGD, along with finding the optimal parameter value $\theta$, we also find the optimal learning rate $\alpha$ and the update direction.
- The learning rate is implicitly implemented in the adaptation term: $\theta_i' = \theta - \alpha \odot \nabla_{\theta}\mathcal{L}_{\mathcal{T}_i}(f_{\theta})$. So, in Meta-SGD, we don't initialize the learning rate with a small scalar value. Instead, we initialize $\alpha$ with random values of the same shape as $\theta$ and learn it along with $\theta$.
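A minimal sketch of this initialization, assuming PyTorch and a hypothetical linear model as the base learner; `alpha` gets one learnable tensor per parameter tensor in $\theta$:

```python
import torch

# Hypothetical base learner; its parameters play the role of theta
model = torch.nn.Linear(4, 1)

# One learnable learning-rate tensor per parameter tensor, randomly
# initialized with the same shape as theta (not a small scalar)
alpha = [torch.randn_like(p, requires_grad=True) for p in model.parameters()]
```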
- The update equation of the learning rate can be expressed as $(\theta, \alpha) = (\theta, \alpha) - \beta \nabla_{(\theta, \alpha)} \sum_{\mathcal{T}_i \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_i}(f_{\theta_i'})$, where $\beta$ is the meta learning rate; that is, $\alpha$ is updated jointly with $\theta$ by gradient descent on the meta loss.
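Below is a minimal sketch of one such meta-update, assuming PyTorch; `tasks`, `support_loss`, and `query_loss` are hypothetical names for a task batch and for callables mapping a list of parameter tensors to a scalar loss on a task's support and query sets:

```python
import torch

def meta_sgd_step(theta, alpha, tasks, beta=1e-3):
    """One Meta-SGD meta-update; theta and alpha are lists of tensors
    with requires_grad=True, beta is the meta learning rate."""
    meta_loss = 0.0
    for support_loss, query_loss in tasks:
        # Inner adaptation: theta_i' = theta - alpha * grad (elementwise,
        # since alpha has the same shape as theta); keep the graph so the
        # outer gradient can flow through the inner step
        grads = torch.autograd.grad(support_loss(theta), theta,
                                    create_graph=True)
        theta_prime = [t - a * g for t, a, g in zip(theta, alpha, grads)]
        meta_loss = meta_loss + query_loss(theta_prime)
    # Outer update: (theta, alpha) <- (theta, alpha) - beta * meta-gradient
    grads = torch.autograd.grad(meta_loss, theta + alpha)
    with torch.no_grad():
        for p, g in zip(theta + alpha, grads):
            p -= beta * g
```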
- Sample $n$ tasks, run SGD for a few iterations on each sampled task, and then update the model parameters in a direction that is common to all the tasks (see the sketch after the update equation below).
- The Reptile update equation can be expressed as $\theta = \theta + \epsilon \frac{1}{n} \sum_{i=1}^{n} (\theta_i' - \theta)$, where $\theta_i'$ is the parameter value obtained after running SGD on task $\mathcal{T}_i$ and $\epsilon$ is the step size.
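Here is a minimal sketch of one Reptile meta-iteration, assuming PyTorch; `sample_task_loss` is a hypothetical helper that samples a task and returns a function computing that task's loss for a given model:

```python
import copy
import torch

def reptile_step(model, sample_task_loss, n=5, k=10,
                 inner_lr=0.01, epsilon=0.1):
    """One Reptile meta-iteration over n sampled tasks."""
    theta = [p.detach().clone() for p in model.parameters()]
    direction = [torch.zeros_like(p) for p in theta]
    for _ in range(n):
        task_loss = sample_task_loss()         # hypothetical task sampler
        clone = copy.deepcopy(model)           # start each task from theta
        opt = torch.optim.SGD(clone.parameters(), lr=inner_lr)
        for _ in range(k):                     # a few SGD steps on the task
            opt.zero_grad()
            task_loss(clone).backward()
            opt.step()
        # Accumulate the average direction (1/n) * sum_i (theta_i' - theta)
        for d, new, old in zip(direction, clone.parameters(), theta):
            d += (new.detach() - old) / n
    with torch.no_grad():                      # theta <- theta + eps * mean
        for p, d in zip(model.parameters(), direction):
            p += epsilon * d
```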