Open AI researchers have built a simple hide and seek game environment for multi-agent competition where they observed that AI agents can learn complex strategies and skills on their own as the game progresses. In fact, these AI agents built six distinct strategies and counterstrategies, some of which were not even supported by the training environment. The researchers concluded that such multi-agent co-adaptation may one day produce extremely complex and intelligent behavior.
AI agents play a team-based hide-and-seek game in a physics-based environment. Hiders (blue) avoid the line of sight from the seekers, and the seekers keep the vision of the hiders. The environment has various objects (walls, ramps, blocks) that agents can grab and also lock in place. There are also randomly generated immovable rooms and walls that the agents must learn to navigate. Before the game, hiders are given a preparation time to run away or change their environment and the seekers are immobilized.
Agents are given a team-based reward; hiders are given a reward of +1 if all hiders are hidden and -1 if any hider is seen by a seeker. Seekers are given the opposite reward, -1 if all hiders are hidden and +1 otherwise. There are no explicit incentives for agents to interact with objects in the environment; they are penalized if they go too far outside the play area.
Agents are trained using self-play and agent policies are composed of two separate networks with different parameters. This includes a policy network that produces an action distribution and a critic network that predicts the discounted future returns. Policies are optimized using Proximal Policy Optimization (PPO) and Generalized Advantage Estimation (GAE); training is performed using OpenAI’s rapid, it’s general-purpose RL training system.
The researchers noticed that as agents train against each other in hide-and-seek, six distinct strategies emerge.
https://youtu.be/kopoLzvh5jY
They also found some surprising behavior by these AI agents.
The researchers concluded that complex human-relevant strategies and skills can emerge from multi-agent competition and standard reinforcement learning algorithms at scale. They state, “our results with hide-and-seek should be viewed as a proof of concept showing that multi-agent auto-curricula can lead to physically grounded and human-relevant behavior.”
This research was well appreciated by readers. Many people took to Hacker News to congratulate the researchers. Here are a few comments.
“Amazing. Very cool to see this sort of multi-agent emergent behavior. Along with the videos, I can't help but get a very 'Portal' vibe from it all. "Thank you for helping us help you help us all."
“This is incredible. The various emergent behaviors are fascinating. It seems that OpenAI has a great little game simulated for their agents to play in. The next step to make this even cooler would be to use physical, robotic agents learning to overcome challenges in real meatspace!”
“I'm completely amazed by that. The hint of a simulated world seems so matrix-like as well, imagine some intelligent thing evolving out of that. Wow.”
Read the research paper for a deeper analysis. The code is available on GitHub.
Google researchers present Weight Agnostic Neural Networks (WANNs) that perform tasks without learning weight parameters
DeepMind introduces OpenSpiel, a reinforcement learning-based framework for video games
Google open sources an on-device, real-time hand gesture recognition algorithm built with MediaPipe