Pieter Abbeel has been a professor at UC Berkeley since 2008, and was also a Research Scientist at OpenAI (2016-2017). His current research focuses on robotics and machine learning, with a particular emphasis on meta-learning and deep reinforcement learning. Another author of this paper, Ilya Sutskever, is a co-founder and the Research Director of OpenAI. He was previously a Research Scientist on the Google Brain team for three years.
Meta-learning, also called learning to learn, typically uses metadata about prior learning experience to make automatic learning flexible across learning problems, i.e. to learn the learning algorithm itself. Continuous adaptation in real-world environments is essential for any learning agent, and a meta-learning approach is a natural fit for this task. This article discusses one of the top-rated accepted papers in the field of meta-learning at the 6th annual ICLR conference, held April 30 to May 3, 2018.
Reinforcement learning algorithms, despite achieving impressive results ranging from game playing to dialogue systems to robotics, are largely limited to solving tasks in stationary environments. The real world, by contrast, is often nonstationary, whether due to complexity, changes in the environment's dynamics over the lifetime of a system, or the presence of multiple learning actors. Nonstationarity breaks the standard assumptions and requires agents to adapt continuously, at both training and execution time, in order to succeed. Classical approaches to dealing with nonstationarity are usually based on context detection and tracking, i.e., reacting to changes that have already occurred in the environment by continuously fine-tuning the policy.
However, nonstationarity allows for only limited interaction before the properties of the environment change. This immediately puts learning into the few-shot regime and often renders simple fine-tuning methods impractical.
In order to continuously learn and adapt from limited experience in nonstationary environments, the authors of this paper propose the learning-to-learn (or meta-learning) approach.
This paper proposes a gradient-based meta-learning algorithm suitable for continuous adaptation of RL agents in nonstationary environments. The agents meta-learn to anticipate changes in the environment and update their policies accordingly. The method builds on previous work on gradient-based model-agnostic meta-learning (MAML), which has proven successful in few-shot settings. The authors re-derive MAML for multi-task reinforcement learning from a probabilistic perspective, and then extend it to dynamically changing tasks.
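To make the inner/outer gradient structure concrete, here is a minimal sketch of a MAML-style update on a toy one-parameter regression family rather than an RL policy. Everything here is illustrative: the quadratic per-task loss, the step sizes, and the function names are assumptions, not the paper's actual formulation.

```python
import random

def grad_loss(theta, target):
    # Gradient of the toy per-task loss (theta - target)^2
    return 2.0 * (theta - target)

def maml_step(theta, tasks, alpha=0.1, beta=0.05):
    """One meta-update: adapt to each task with an inner gradient step,
    then update the shared initialization using the post-adaptation loss."""
    meta_grad = 0.0
    for target in tasks:
        phi = theta - alpha * grad_loss(theta, target)  # inner adaptation
        # Chain rule through the inner step: d(phi)/d(theta) = 1 - 2*alpha
        meta_grad += grad_loss(phi, target) * (1.0 - 2.0 * alpha)
    return theta - beta * meta_grad / len(tasks)        # outer update

random.seed(0)
theta = 0.0
for _ in range(200):
    # A fresh batch of tasks each step, mimicking a changing task distribution
    tasks = [random.uniform(1.0, 3.0) for _ in range(4)]
    theta = maml_step(theta, tasks)
```

After training, `theta` settles near the center of the task distribution, i.e. an initialization from which a single inner gradient step adapts well to any sampled task, which is the core idea the paper extends to sequences of dynamically changing tasks.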
The paper also considers the problem of continuous adaptation to a learning opponent in a competitive multi-agent setting, and designs RoboSumo, a 3D environment with simulated physics in which pairs of agents compete against each other.
The paper answers the following questions:
Additionally, it answers the following questions specific to the competitive multi-agent setting:
Overall Score: 24/30
Average Score: 8
The paper was termed a great contribution to ICLR. According to the reviewers, it addresses a very important problem for general AI and is well written. They also appreciated the careful experimental design and thorough comparisons, which made the results convincing. They felt that editorial rigor and image quality could be better, but suggested no content-related improvements. The paper was praised as a dense, rich contribution on rapid meta-learning.