Planning through a model
In this section, we first define what it means to plan through a model in the sense of optimal control. Then, we cover several planning methods, including the cross-entropy method (CEM) and the covariance matrix adaptation evolution strategy (CMA-ES). You will also see how these methods can be parallelized using the Ray library. Now, let's get started with the problem definition.
Defining the optimal control problem
In RL, and in control problems in general, we care about the actions an agent takes because there is a task that we want it to accomplish. We express this task as a mathematical objective so that we can use mathematical tools to find the actions that achieve it. In RL, this objective is the expected sum of discounted rewards. You of course know all this, as this is what we have been doing all along, but this is a good time to reiterate it: we are essentially solving an optimization problem here.
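To make the objective concrete, here is a minimal sketch of how the discounted return of one episode can be computed and how its expectation can be approximated by averaging over sampled episodes. The function names, the discount factor value, and the `sample_rewards` callable are illustrative assumptions, not part of any particular library.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of discounted rewards for one episode: r_0 + gamma*r_1 + gamma^2*r_2 + ..."""
    total = 0.0
    # Accumulate backward so each step needs only one multiply and one add.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def estimate_objective(sample_rewards, n_episodes=1000, gamma=0.99):
    """Monte Carlo estimate of the expected discounted return.

    `sample_rewards` is a hypothetical callable that runs one episode
    (in an environment or a model of it) and returns its list of rewards.
    """
    returns = [discounted_return(sample_rewards(), gamma) for _ in range(n_episodes)]
    return sum(returns) / len(returns)
```

An optimizer then searches for the actions, or the policy producing them, that maximize this estimate.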
Now, let's assume that we are...