To carry out imitation, the expert should be able to simply demonstrate tasks capably, without extensive effort, instrumentation, or engineering: collecting many demonstrations is time-consuming, exact state-action knowledge is often impractical to obtain, and reward design requires more than task expertise. The agent, in turn, should be able to achieve the demonstrated goals without spending time learning each and every task from scratch. To address these issues, the authors recast learning from demonstration as doing from demonstration by
(1) providing demonstrations only at inference time, and
(2) restricting demonstrations to visual observations alone rather than full state-action trajectories.
In other words, instead of doing imitation learning, the agent must learn to imitate; this is the goal the authors set out to achieve.
The paper explains how existing approaches to imitation learning distill both what to do (the goal) and how to do it (the skills) from expert demonstrations. This supervision is effective but expensive: it is not always practical to collect many detailed demonstrations. The authors argue that if an agent has access to its environment alongside the expert, it can learn skills from its own experience and rely on the expert only for the goals. They therefore propose a 'zero-shot' method that uses no expert actions or demonstrations during learning: the zero-shot imitator has no prior knowledge of the environment and makes no use of the expert during training, but learns from its own experience to follow experts at test time. The authors demonstrate this in experiments such as navigating an office with a TurtleBot and manipulating rope with a Baxter robot.
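To make 'doing from demonstration' concrete, the following is a minimal Python sketch of the inference-time loop this setup implies: the expert supplies only a sequence of images, and the agent uses a goal-conditioned skill policy learned from its own experience to reach each demonstration frame in turn. The names here (env, goal_policy, goal_reached, demo_images) are hypothetical placeholders, not the authors' code.

```python
# Hypothetical inference loop: follow a visual demonstration given only images.
def follow_demonstration(env, goal_policy, goal_reached, demo_images, max_steps=100):
    """Treat each expert frame as a subgoal and reach the frames in sequence."""
    obs = env.reset()
    for goal_img in demo_images:              # the expert provides observations only, no actions
        for _ in range(max_steps):
            if goal_reached(obs, goal_img):   # a learned goal recognizer (or a distance check)
                break
            action = goal_policy(obs, goal_img)   # skill learned from the agent's own exploration
            obs = env.step(action)
    return obs
```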
Overall Score: 25/30
Average Score: 8
As per one of the reviewers, the proposed approach is well founded and the experimental evaluations are promising; the paper is well written and easy to follow. The skill function uses an RNN as the function approximator and minimizes the sum of two losses: the state-mismatch loss over the trajectory (computed with an explicitly learned forward model) and the action-mismatch loss (computed with a model-free action prediction module). This is hard to optimize in practice because the forward model and the action predictor must be learned jointly, so the two are first learned separately and then fine-tuned together.
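To make the two-term objective concrete, below is a minimal PyTorch sketch under simplifying assumptions: discrete actions, pre-extracted state features, and a one-step forward model. The module names (SkillPolicy, ForwardModel), the dimensions, and the soft-action trick used to keep the state-mismatch term differentiable are illustrative choices, not the authors' exact implementation. In line with the reviewer's remark, the forward model could first be pretrained on the agent's own exploration data before the joint fine-tuning step shown here.

```python
# Minimal sketch of the two-term skill objective: action mismatch + state mismatch
# through a learned forward model. All names and sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACT_DIM, HID = 128, 4, 256

class SkillPolicy(nn.Module):
    """RNN that predicts the action taking the agent from the current state toward the goal."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(2 * STATE_DIM, HID, batch_first=True)
        self.head = nn.Linear(HID, ACT_DIM)

    def forward(self, states, goals):
        h, _ = self.rnn(torch.cat([states, goals], dim=-1))  # (B, T, HID)
        return self.head(h)                                   # action logits, (B, T, ACT_DIM)

class ForwardModel(nn.Module):
    """Predicts the next state features from the current state and an action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACT_DIM, HID), nn.ReLU(), nn.Linear(HID, STATE_DIM))

    def forward(self, states, action_probs):
        return self.net(torch.cat([states, action_probs], dim=-1))

def skill_losses(policy, fwd, states, goals, actions, next_states):
    """Action-mismatch loss plus state-mismatch loss through the forward model."""
    logits = policy(states, goals)
    action_loss = F.cross_entropy(logits.reshape(-1, ACT_DIM), actions.reshape(-1))
    # Feed the policy's (soft) predicted action through the forward model and
    # compare the predicted next state against the one actually observed.
    pred_next = fwd(states, F.softmax(logits, dim=-1))
    state_loss = F.mse_loss(pred_next, next_states)
    return action_loss, state_loss

# Toy joint fine-tuning step on random stand-in data (batch B, horizon T).
policy, fwd = SkillPolicy(), ForwardModel()
B, T = 8, 10
states = torch.randn(B, T, STATE_DIM)
goals = states[:, -1:, :].expand(-1, T, -1)   # use the final state as the goal at every step
actions = torch.randint(0, ACT_DIM, (B, T))
next_states = torch.randn(B, T, STATE_DIM)

opt = torch.optim.Adam(list(policy.parameters()) + list(fwd.parameters()), lr=1e-4)
action_loss, state_loss = skill_losses(policy, fwd, states, goals, actions, next_states)
(action_loss + state_loss).backward()
opt.step()
```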