Offline reinforcement learning
Offline reinforcement learning trains agents on data recorded during prior interactions between some agent and the environment, rather than through direct interaction with the environment. The agent that generated the data is often not an RL agent at all; it could be a human, for example. Offline RL is also called batch reinforcement learning. In this section, we look into some of its key components. Let's get started with an overview of how it works.
An overview of how offline reinforcement learning works
In offline RL, the agent does not interact with the environment directly to explore and learn a policy. Instead, it learns only from previously recorded experience. Figure 13.12 contrasts this with the on-policy and off-policy settings.
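To make the "no direct interaction" idea concrete, here is a minimal sketch of tabular Q-learning that learns only from a fixed set of logged transitions. The tiny three-state dataset, the function name `offline_q_learning`, and all hyperparameters are assumptions made up for illustration; they do not come from a particular library, and this is not meant as a complete offline RL algorithm.

```python
import random

# Hypothetical logged transitions: (state, action, reward, next_state, done).
# In practice, these would have been recorded while some other agent
# (for example, a human operator) was acting in the environment.
DATASET = [
    (0, 1, 0.0, 1, False),
    (1, 1, 1.0, 2, True),
    (1, 0, 0.0, 0, False),
    (0, 0, 0.0, 0, False),
]


def offline_q_learning(dataset, n_states, n_actions,
                       gamma=0.9, lr=0.5, epochs=200, seed=0):
    """Run Q-learning updates over a fixed dataset, never calling an environment."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(epochs):
        batch = list(dataset)
        rng.shuffle(batch)  # revisit the same recorded data in a new order
        for s, a, r, s_next, done in batch:
            # Bootstrapped target computed from the recorded transition only.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += lr * (target - q[s][a])
    return q


q_table = offline_q_learning(DATASET, n_states=3, n_actions=2)
# Greedy policy extracted from the learned Q-table.
greedy_policy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(3)]
print(q_table)
print(greedy_policy)
```

Notice that no environment object appears anywhere in the code: there is no `reset` or `step` call, so the agent cannot try an action to see what happens. Everything it learns comes from the transitions that were already recorded.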
Let's unpack what this figure illustrates:
- In on-policy RL, the agent collects a batch of experiences with each policy. Then, it uses this batch to update the policy. This cycle repeats until...