Pieter Abbeel has been a professor at UC Berkeley since 2008, and was also a Research Scientist at OpenAI (2016-2017). His current research focuses on robotics and machine learning, with a particular emphasis on meta-learning and deep reinforcement learning. Another author of this paper, Ilya Sutskever, is a co-founder and the Research Director of OpenAI. He was previously a Research Scientist on the Google Brain team for three years.
Meta-learning, also called learning to learn, typically uses metadata about prior learning experience to make automatic learning flexible across learning problems, i.e. to learn the learning algorithm itself. Continuous adaptation in real-world environments is essential for any learning agent, and a meta-learning approach is a natural fit for this task. This article discusses one of the top-rated accepted papers in the field of meta-learning at the 6th annual ICLR conference, held April 30 to May 3, 2018.
Reinforcement learning algorithms, despite achieving impressive results ranging from game playing to dialogue systems to robotics, are largely limited to solving tasks in stationary environments. The real world, by contrast, is often nonstationary, whether due to complexity, changes in the environment's dynamics over the lifetime of a system, or the presence of multiple learning actors. Nonstationarity breaks the standard assumptions and requires agents to adapt continuously, at both training and execution time, in order to succeed. Classical approaches to dealing with nonstationarity are usually based on context detection and tracking, i.e., reacting to changes that have already occurred in the environment by continuously fine-tuning the policy.
However, nonstationarity allows for only limited interaction before the properties of the environment change. This immediately puts learning into the few-shot regime and often renders simple fine-tuning methods impractical.
In order to continuously learn and adapt from limited experience in nonstationary environments, the authors of this paper propose the learning-to-learn (or meta-learning) approach.
This paper proposes a gradient-based meta-learning algorithm suitable for continuous adaptation of RL agents in nonstationary environments. The agents meta-learn to anticipate changes in the environment and update their policies accordingly. The method builds on previous work on gradient-based model-agnostic meta-learning (MAML), which has proven successful in few-shot settings. The authors re-derive MAML for multi-task reinforcement learning from a probabilistic perspective, and then extend it to dynamically changing tasks.
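To make the inner/outer gradient structure concrete, here is a minimal sketch of a MAML-style update on a toy one-parameter regression family rather than an RL policy. Everything here is illustrative: the quadratic per-task loss, the step sizes, and the function names are assumptions, not the paper's actual formulation.

```python
import random

def grad_loss(theta, target):
    # Gradient of the toy per-task loss (theta - target)^2
    return 2.0 * (theta - target)

def maml_step(theta, tasks, alpha=0.1, beta=0.05):
    """One meta-update: adapt to each task with an inner gradient step,
    then update the shared initialization using the post-adaptation loss."""
    meta_grad = 0.0
    for target in tasks:
        phi = theta - alpha * grad_loss(theta, target)  # inner adaptation
        # Chain rule through the inner step: d(phi)/d(theta) = 1 - 2*alpha
        meta_grad += grad_loss(phi, target) * (1.0 - 2.0 * alpha)
    return theta - beta * meta_grad / len(tasks)        # outer update

random.seed(0)
theta = 0.0
for _ in range(200):
    # A fresh batch of tasks each step, mimicking a changing task distribution
    tasks = [random.uniform(1.0, 3.0) for _ in range(4)]
    theta = maml_step(theta, tasks)
```

After training, `theta` settles near the center of the task distribution, i.e. an initialization from which a single inner gradient step adapts well to any sampled task, which is the core idea the paper extends to sequences of dynamically changing tasks.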
The paper also considers the problem of continuous adaptation to a learning opponent in a competitive multi-agent setting, and designs RoboSumo, a 3D environment with simulated physics in which pairs of agents compete against each other.
The paper answers the following questions:
Additionally, it answers the following questions specific to the competitive multi-agent setting:
Overall Score: 24/30
Average Score: 8
The paper was termed a great contribution to ICLR. According to the reviewers, it addresses a very important problem for general AI and is well written. They also appreciated the careful experimental design and thorough comparisons, which made the results convincing. They felt that editorial rigor and image quality could be better, but suggested no content-related improvements. The paper was praised as a dense, rich contribution on rapid meta-learning.