The imagination-augmented agent
The overall idea of the new architecture, called the imagination-augmented agent (I2A), is to allow the agent to imagine future trajectories from the current observation and incorporate these imagined paths into its decision process. The high-level architecture is shown in the following diagram:
Figure 22.1: The I2A architecture
The agent consists of two different paths that transform the input observation: model-free and imagination. The model-free path is a standard set of convolution layers that transforms the input image into high-level features. The other path, imagination, consists of a set of trajectories imagined from the current observation. These trajectories are called rollouts, and they are produced for every available action in the environment. Every rollout consists of a fixed number of steps into the future, and on every step, a special model, called the environment model (EM) (but not to be confused with the expectation maximization method...
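To make the rollout structure concrete, here is a minimal sketch in plain Python of how one trajectory per action could be generated for a fixed number of steps. Everything in it is an assumption made for illustration: the names `toy_env_model`, `toy_rollout_policy`, and `imagine_rollouts` are hypothetical, the EM is assumed to return the next observation and a reward, and the rule for choosing actions after the first step is a placeholder. A real agent would use trained neural networks in place of these toy functions.

```python
from typing import Callable, List, Tuple

Obs = Tuple[int, ...]      # stand-in for an image observation
Step = Tuple[Obs, float]   # (imagined observation, imagined reward)


def toy_env_model(obs: Obs, action: int) -> Step:
    """Hypothetical EM: adds the action index to every value of the observation.

    Assumption: the EM returns the next observation and a reward.
    """
    next_obs = tuple(v + action for v in obs)
    reward = float(sum(next_obs))
    return next_obs, reward


def toy_rollout_policy(obs: Obs) -> int:
    """Hypothetical rule for choosing actions after the first imagined step."""
    return 0


def imagine_rollouts(
    obs: Obs,
    n_actions: int,
    n_steps: int,
    env_model: Callable[[Obs, int], Step],
    rollout_policy: Callable[[Obs], int],
) -> List[List[Step]]:
    """Produce one imagined trajectory of n_steps for every available action."""
    rollouts = []
    for first_action in range(n_actions):
        cur_obs, action = obs, first_action
        trajectory = []
        for _ in range(n_steps):
            # The EM is queried on every step of the rollout.
            cur_obs, reward = env_model(cur_obs, action)
            trajectory.append((cur_obs, reward))
            # Later actions come from the (assumed) rollout policy.
            action = rollout_policy(cur_obs)
        rollouts.append(trajectory)
    return rollouts


rollouts = imagine_rollouts(
    (0, 0),
    n_actions=3,
    n_steps=2,
    env_model=toy_env_model,
    rollout_policy=toy_rollout_policy,
)
```

With three actions and two steps, `rollouts` holds three trajectories of two imagined steps each, which is the shape of data the imagination path would then have to summarize for the decision process.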