Earlier this month, students from the Robotics Institute at Carnegie Mellon University published a paper proposing a learning approach called competitive reinforcement learning using visual transfer. Built on the asynchronous advantage actor-critic (A3C) architecture, the method generalizes to a target Atari game using an agent trained on a source game.
The A3C architecture is an asynchronous variant of the actor-critic model, in which the actor takes in the current environment state and determines the best action to take from it, while the critic estimates how valuable that state is. The network consists of four convolutional layers, an LSTM layer, and two fully connected layers that predict the action probabilities and the value function of the state.
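As a rough sketch of that shape (not the authors' code; the filter counts, 84x84 input size, and 256-unit LSTM are common Atari defaults assumed here, not values from the paper), the network could look like this in PyTorch:

```python
import torch
import torch.nn as nn

class A3CNet(nn.Module):
    """Four conv layers -> LSTM -> separate policy and value heads.
    Layer sizes are assumed Atari defaults, not reported values."""

    def __init__(self, num_actions=6):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.lstm = nn.LSTMCell(32 * 6 * 6, 256)   # 84x84 frame -> 6x6 maps
        self.policy = nn.Linear(256, num_actions)  # actor head (action logits)
        self.value = nn.Linear(256, 1)             # critic head V(s)

    def forward(self, frame, hidden):
        x = self.convs(frame).flatten(1)
        h, c = self.lstm(x, hidden)
        return self.policy(h), self.value(h), (h, c)
```

Feeding in a single 84x84 grayscale frame yields one set of action logits from the policy head and one scalar state value from the value head.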
In this architecture, multiple worker agents are trained in parallel, each with its own copy of the model and environment. "Advantage" refers to a metric that measures how much better an action turned out than the value the critic expected from that state.
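As a minimal illustration (a sketch, not the paper's code), the advantage over an n-step rollout is the discounted return minus the critic's value estimate for each state:

```python
def advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Advantage A_t = R_t - V(s_t), where R_t is the discounted
    n-step return, bootstrapped from the critic's final estimate."""
    advs = []
    ret = bootstrap_value
    for r, v in zip(reversed(rewards), reversed(values)):
        ret = r + gamma * ret   # discounted return R_t
        advs.append(ret - v)    # how much better than expected
    return list(reversed(advs))
```

A positive advantage means the action sequence paid off more than the critic predicted, so the actor's probability for those actions is pushed up; a negative advantage pushes it down.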
The learning approach introduced in this paper aims to use a reinforcement agent to generalize between two related but vastly different Atari games like Pong-v0 and Breakout-v0. This is done by learning visual mappers: given a frame from the source game, we should be able to generate the analogous frame in the target game.
In both these games, a paddle is controlled to hit a ball toward a certain objective. Using this method, the six actions of Pong-v0 {No Operation, Fire, Right, Left, Right Fire, Left Fire} are mapped to the four actions of Breakout-v0 as {Fire, Fire, Right, Left, Right, Left} respectively. The rewards are mapped directly from the source game to the target game without any scaling. Both the source and target environments come from OpenAI Gym.
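Expressed in code (a sketch assuming Gym's standard ALE action indices for both games, which the article itself does not list):

```python
# Pong-v0 action indices: 0 NOOP, 1 FIRE, 2 RIGHT, 3 LEFT,
# 4 RIGHTFIRE, 5 LEFTFIRE (standard Gym/ALE ordering).
# Breakout-v0 action indices: 0 NOOP, 1 FIRE, 2 RIGHT, 3 LEFT.
PONG_TO_BREAKOUT = {
    0: 1,  # No Operation -> Fire
    1: 1,  # Fire         -> Fire
    2: 2,  # Right        -> Right
    3: 3,  # Left         -> Left
    4: 2,  # Right Fire   -> Right
    5: 3,  # Left Fire    -> Left
}

def map_action(pong_action):
    """Translate a Pong-v0 action into its Breakout-v0 counterpart."""
    return PONG_TO_BREAKOUT[pong_action]
```

Note that Breakout's NOOP action is never produced by this mapping; every source action is funneled into Fire, Right, or Left.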
To represent the knowledge common to both games, they captured the underlying similarities between the source and target using Unsupervised Image-to-Image Translation (UNIT) generative adversarial networks (GANs). The target game then competes with the visual representation of itself produced by the UNIT GAN, which serves as the visual mapper between the source and target games.
The following diagram depicts how knowledge is transferred from the source game to the target game by competitively and simultaneously fine-tuning the model on two different visual representations of the target game:
To read more about this learning approach and its efficiency, check out the research paper by Akshita Mittel, Purna Sowmya Munukutla, and Himanshi Yadav: Visual Transfer between Atari Games using Competitive Reinforcement Learning.
“Deep meta reinforcement learning will be the future of AI where we will be so close to achieving artificial general intelligence (AGI)” – Sudharsan Ravichandiran