Reinforcement learning (RL) agents explore their environments to learn optimal policies by trial and error. In such settings, the safety of all the agents involved is a critical concern. Although reinforcement learning agents are currently run mostly in simulation, as they move into increasingly complex, real-world settings, these safety concerns are likely to become paramount.
To make safe exploration a central focus of reinforcement learning research, a group of OpenAI researchers has proposed standardized constrained reinforcement learning as a way to incorporate safety specifications into RL algorithms and achieve safe exploration.
The major challenge of reinforcement learning is handling the trade-offs between competing objectives, such as task performance and satisfying safety requirements. However, in constrained reinforcement learning, “we don’t have to pick trade-offs—instead, we pick outcomes, and let algorithms figure out the trade-offs that get us the outcomes we want,” states OpenAI. Consequently, the researchers believe “constrained reinforcement learning may turn out to be more useful than normal reinforcement learning for ensuring that agents satisfy safety requirements.”
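Concretely, the constrained formulation can be sketched as follows (notation roughly follows the paper's constrained-MDP setup):

maximize over policies π:  J_r(π)   subject to   J_ci(π) ≤ d_i,  for i = 1, …, k

where J_r(π) is the expected return of policy π, each J_ci(π) is an expected cumulative safety cost, and the thresholds d_i are the human-chosen "outcomes"; the algorithm is then free to find whatever trade-off satisfies them.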
Read More: OpenAI’s AI robot hand learns to solve a Rubik’s Cube using Reinforcement learning and Automatic Domain Randomization (ADR)
The field of reinforcement learning has progressed greatly in recent years; however, different implementations use different environments and evaluation procedures, and the researchers note that there is no standard set of environments for making progress on safe exploration specifically. To this end, the researchers present Safety Gym, a suite of tools for accelerating safe exploration research. Safety Gym is a benchmark suite of 18 high-dimensional continuous control environments for safe exploration, 9 additional environments for debugging task performance separately from safety requirements, and tools for building additional environments.
https://twitter.com/OpenAI/status/1197559989704937473
Safety Gym consists of two components. The first is an environment-builder that allows a user to create a new environment by mixing and matching from a wide range of physics elements, goals, and safety requirements. The second is a suite of pre-configured benchmark environments that standardize how progress on the safe exploration problem is measured.
It is implemented as a standalone module that uses the OpenAI Gym interface for instantiating and interacting with reinforcement learning environments. It also uses the MuJoCo physics simulator to construct and forward-simulate each environment.
In line with the proposal of standardizing constrained reinforcement learning, each Safety Gym environment provides a separate objective for task performance and safety. These objectives are conveyed via a reward function and a set of auxiliary cost functions respectively.
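As a rough sketch of how this looks in practice (assuming the package's published Gym-style environment IDs such as 'Safexp-PointGoal1-v0'), the reward comes back through the usual step return value, while the safety cost is reported separately in the info dictionary:

```python
# Minimal sketch: interacting with a pre-configured Safety Gym environment.
# Assumes safety_gym is installed and registers IDs like 'Safexp-PointGoal1-v0'.
import gym
import safety_gym  # noqa: F401  (importing registers the Safety Gym environments)

env = gym.make('Safexp-PointGoal1-v0')
obs = env.reset()

total_reward, total_cost = 0.0, 0.0
for _ in range(1000):
    action = env.action_space.sample()       # random policy, for illustration only
    obs, reward, done, info = env.step(action)
    total_reward += reward                    # task-performance signal
    total_cost += info.get('cost', 0.0)       # auxiliary safety-cost signal
    if done:
        obs = env.reset()

print(f"return: {total_reward:.2f}, cumulative cost: {total_cost:.2f}")
```

Keeping the reward and cost as separate signals is what lets a constrained algorithm optimize return while holding cumulative cost below a chosen threshold, rather than folding safety into a single blended reward.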
In all Safety Gym environments, an agent perceives its surroundings through a robot’s sensors and interacts with the world through its actuators. Safety Gym ships with three pre-made robots: Point, Car, and Doggo.
Image source: Research paper
The environment-builder currently supports three main tasks: Goal, Button, and Push. Tasks are mutually exclusive, so an environment can feature only one task at a time. Safety Gym also supports five main kinds of elements relevant to safety requirements: Hazards (dangerous areas to avoid), Vases (objects to avoid), Pillars (immobile obstacles), Buttons (incorrect goals), and Gremlins (moving objects). Each type of constraint element poses a different challenge for the agent to avoid; an illustrative configuration is sketched below.
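The environment-builder exposes these choices through a configuration dictionary. The sketch below follows the pattern shown in the Safety Gym repository, but the exact keys and defaults should be treated as indicative rather than authoritative: it requests the Push task, the Car robot, and adds hazards and vases as constraint elements.

```python
# Sketch of building a custom environment with Safety Gym's environment-builder.
# Key names follow the examples in the Safety Gym repository; treat them as indicative.
from safety_gym.envs.engine import Engine

config = {
    'robot_base': 'xmls/car.xml',   # which pre-made robot to use
    'task': 'push',                 # one of 'goal', 'button', 'push'
    'observe_goal_lidar': True,     # lidar observation of the goal
    'observe_hazards': True,        # lidar observation of hazard locations
    'constrain_hazards': True,      # entering a hazard incurs cost
    'hazards_num': 4,               # number of hazard areas
    'vases_num': 2,                 # number of fragile vases to avoid
}

env = Engine(config)
obs = env.reset()
```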
After running unconstrained and constrained reinforcement learning algorithms on the constrained Safety Gym environments, the researchers found that the unconstrained algorithms are able to score high returns by taking unsafe actions, as measured by the cost function. The constrained algorithms, on the other hand, attain lower returns while maintaining the desired levels of cost.
They also found that standard reinforcement learning can control the Doggo robot and acquire complex locomotion behavior, as indicated by high returns in those environments when trained without constraints. However, despite the success of constrained reinforcement learning when locomotion requirements are absent, and the success of standard reinforcement learning when locomotion is needed, the constrained reinforcement learning algorithms struggled to learn safe locomotion policies. The researchers state that additional research is needed to develop constrained reinforcement learning algorithms that can solve such more challenging tasks.
In summary, the OpenAI researchers propose standardized constrained reinforcement learning as the main formalism for safe exploration. They also introduce Safety Gym, the first benchmark of high-dimensional continuous control environments for evaluating the performance of constrained reinforcement learning algorithms. The researchers have also evaluated baseline unconstrained and constrained reinforcement learning algorithms on the Safety Gym environments to clarify the current state of the art in safe exploration.
Many have appreciated that Safety Gym puts ‘safety’ first in AI.
https://twitter.com/gicorit/status/1197594242715131904
https://twitter.com/tupjarsakiv/status/1197597397918126085
Interested readers can refer to the research paper for more information on Safety Gym.