Implementing the RL agent’s runtime components
We have looked at several agent algorithm implementations in the previous chapters. You may have noticed in those recipes (especially in Chapter 3, Implementing Advanced Deep RL Algorithms, where we implemented RL agent training code) that some parts of the agent code were executed only conditionally. For example, the experience replay routine ran only once a certain condition was met, such as the replay memory containing enough samples. This raises the question: which components of an agent are actually essential, especially when we do not aim to train it further and only want to execute a learned policy?
This recipe will help you distill the implementation of the Soft Actor-Critic (SAC) agent down to the minimal set of components – those that are strictly necessary to run your agent, as illustrated by the sketch that follows.
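To make the idea concrete, here is a minimal, hedged sketch of what such a runtime-only agent can look like: just the trained actor (policy) network and an act() method, with no replay buffer, critic networks, or optimizers. It assumes the actor was saved with Keras's model-saving API, that the actor outputs the mean action for a continuous, bounded action space, and that actions are squashed with tanh as in SAC; the class name SACRuntimeAgent and the model path are hypothetical, not the exact code from Chapter 3.

```python
import numpy as np
import tensorflow as tf


class SACRuntimeAgent:
    """Runtime-only SAC agent: load a trained actor and map observations to actions."""

    def __init__(self, actor_model_path, action_low, action_high):
        # Load the saved actor (policy) network; the path and save format are assumptions.
        self.actor = tf.keras.models.load_model(actor_model_path)
        self.action_low = np.asarray(action_low, dtype=np.float32)
        self.action_high = np.asarray(action_high, dtype=np.float32)

    def act(self, observation):
        # Add a batch dimension and query the policy network.
        obs = np.asarray(observation, dtype=np.float32)[np.newaxis, ...]
        # Assumed: the actor returns the mean action, which we use for
        # deterministic evaluation (no sampling at runtime).
        mean_action = self.actor(obs, training=False).numpy()[0]
        # Squash with tanh and rescale to the environment's action bounds.
        squashed = np.tanh(mean_action)
        return self.action_low + 0.5 * (squashed + 1.0) * (
            self.action_high - self.action_low
        )
```

At runtime, such an agent only needs to be constructed once and then called in the environment loop, for example `action = agent.act(obs)` after each `env.step(...)`; everything related to learning stays out of the deployment code path.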
Let’s get started!
Getting ready
To complete this recipe, you will first need...