- Unlike MAML, in Meta-SGD, along with finding the optimal parameter value $\theta$, we also find the optimal learning rate $\alpha$ and the update direction.
- The learning rate is implicitly implemented in the adaptation term. So, in Meta-SGD, we don't initialize the learning rate with a small scalar value. Instead, we initialize it with random values of the same shape as $\theta$ and learn it along with $\theta$.
- The update equation of the learning rate can be expressed as $\alpha = \alpha - \beta \nabla_{\alpha} \sum_{T_i \sim p(T)} L_{T_i}(f_{\theta'_i})$, where $\theta'_i = \theta - \alpha \circ \nabla_{\theta} L_{T_i}(f_{\theta})$ is the adapted parameter and $\beta$ is the meta learning rate.
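As a rough illustration of these updates, here is a minimal NumPy sketch of Meta-SGD on a toy linear-regression task. The task distribution, the linear model, and the first-order approximation of the meta-gradient for $\theta$ (dropping second-order terms) are all assumptions made for brevity, not the exact implementation described above:

```python
import numpy as np

def mse_grad(theta, X, y):
    # Gradient of the mean squared error of a linear model f(x) = x . theta.
    return 2.0 * X.T @ (X @ theta - y) / len(y)

rng = np.random.default_rng(0)
theta = rng.normal(size=5)           # model parameters
alpha = 0.01 * rng.random(size=5)    # learnable per-parameter learning rates,
                                     # initialized randomly with theta's shape
beta = 1e-3                          # meta learning rate (hyperparameter)

for meta_step in range(1000):
    # Sample a task: a random linear-regression problem (assumed task distribution).
    w_true = rng.normal(size=5)
    X_train = rng.normal(size=(10, 5)); y_train = X_train @ w_true
    X_test = rng.normal(size=(10, 5)); y_test = X_test @ w_true

    # Adaptation: theta'_i = theta - alpha o grad (elementwise product).
    g_train = mse_grad(theta, X_train, y_train)
    theta_prime = theta - alpha * g_train

    # Meta update on the test loss. d(test loss)/d(alpha) = -g_train o g_test
    # exactly, since g_train does not depend on alpha; for theta we use the
    # first-order approximation that ignores second-order (Hessian) terms.
    g_test = mse_grad(theta_prime, X_test, y_test)
    alpha -= beta * (-g_train * g_test)
    theta -= beta * g_test
```

Because $\alpha$ has the same shape as $\theta$, each parameter effectively gets its own learned step size and sign, which is how Meta-SGD learns the update direction as well as the rate.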
- In Reptile, we sample $n$ tasks, run SGD for a few iterations on each of the sampled tasks, and then update our model parameter in a direction that is common to all the tasks.
- The Reptile update equation can be expressed as $\theta = \theta + \epsilon \frac{1}{n} \sum_{i=1}^{n} (\theta'_i - \theta)$, where $\theta'_i$ is the parameter obtained after running SGD on the $i$-th sampled task and $\epsilon$ is the step size.
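A similarly minimal sketch of the Reptile update, again on an assumed toy linear-regression task distribution (the model, step counts, and hyperparameter values are illustrative choices, not prescribed ones):

```python
import numpy as np

def sgd_on_task(theta, X, y, lr=0.01, steps=5):
    # Run a few SGD steps on one task's MSE loss, starting from theta.
    theta = theta.copy()
    for _ in range(steps):
        theta -= lr * (2.0 * X.T @ (X @ theta - y) / len(y))
    return theta

rng = np.random.default_rng(0)
theta = rng.normal(size=5)   # meta parameters
epsilon = 0.1                # Reptile step size (hyperparameter)
n = 4                        # tasks sampled per meta iteration

for meta_step in range(1000):
    adapted = []
    for _ in range(n):
        # Sample a task: a random linear-regression problem (assumption).
        w_true = rng.normal(size=5)
        X = rng.normal(size=(20, 5))
        y = X @ w_true
        adapted.append(sgd_on_task(theta, X, y))
    # Reptile update: move theta toward the average adapted parameters,
    # i.e. theta = theta + epsilon * mean(theta'_i - theta).
    theta = theta + epsilon * np.mean([t - theta for t in adapted], axis=0)
```

Averaging the per-task displacements $\theta'_i - \theta$ is what moves $\theta$ in a direction common to all the sampled tasks, without ever computing second-order meta-gradients.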