Imagination-augmented agent
The overall idea of the new architecture, called the imagination-augmented agent (I2A), is to allow the agent to imagine future trajectories starting from the current observation and to incorporate these imagined paths into its decision process. The high-level architecture is shown in the following diagram:
The agent consists of two different paths used to transform the input observation: model-free and imagination. The model-free path is a standard set of convolution layers that transforms the input image into high-level features. The other path, imagination, consists of a set of trajectories "imagined" from the current observation. These trajectories are called rollouts and are produced for every available action in the environment. Every rollout consists of a fixed number of steps into the future, and on every step a special model, called the Environment Model (EM, not to be confused with the expectation-maximization method), produces the...
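The rollout loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the architecture's actual implementation: the names `imagine_rollouts`, `toy_env_model`, and `toy_policy` are hypothetical, the environment model is assumed to map an observation and an action to a predicted next observation and reward, and the actions after the first step are assumed to come from a separate rollout policy.

```python
from typing import Callable, List, Tuple

Obs = Tuple[float, ...]
# Assumed EM signature: (observation, action) -> (predicted observation, predicted reward)
EnvModel = Callable[[Obs, int], Tuple[Obs, float]]
# Assumed rollout policy: observation -> action (used after the first step)
Policy = Callable[[Obs], int]


def imagine_rollouts(
    obs: Obs,
    n_actions: int,
    n_steps: int,
    env_model: EnvModel,
    policy: Policy,
) -> List[List[Tuple[Obs, float]]]:
    """Produce one imagined trajectory per available action."""
    rollouts = []
    for first_action in range(n_actions):
        cur_obs, action = obs, first_action
        steps = []
        for _ in range(n_steps):
            # The EM predicts what happens if `action` is taken in `cur_obs`.
            cur_obs, reward = env_model(cur_obs, action)
            steps.append((cur_obs, reward))
            # Later actions are chosen by the (assumed) rollout policy.
            action = policy(cur_obs)
        rollouts.append(steps)
    return rollouts


def toy_env_model(obs: Obs, action: int) -> Tuple[Obs, float]:
    # Hypothetical stand-in for a learned model: shift every feature by the action.
    next_obs = tuple(x + action for x in obs)
    return next_obs, float(sum(next_obs))


def toy_policy(obs: Obs) -> int:
    # Hypothetical rollout policy that always picks action 0.
    return 0


rollouts = imagine_rollouts(
    (0.0, 1.0), n_actions=3, n_steps=2,
    env_model=toy_env_model, policy=toy_policy,
)
```

With three actions and two steps, `rollouts` holds three trajectories of two imagined steps each. In a real agent the toy functions would be replaced by learned networks, and the observations would be images rather than small tuples.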