The taxonomy of RL methods
The cross-entropy method falls into the model-free, policy-based, and on-policy categories of methods. These notions are new, so let’s spend some time exploring them.
All the methods in RL can be classified into various groups:
- Model-free or model-based
- Value-based or policy-based
- On-policy or off-policy
There are other ways that you can taxonomize RL methods, but, for now, we are interested in the above three. Let’s define them, as the specifics of your problem can influence your choice of a particular method.
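The three axes are independent, so one way to keep them straight is to record a method's position on each axis separately. The following is a minimal sketch, not code from any RL library: `MethodTraits`, `ModelUse`, `Approach`, and `DataSource` are hypothetical names introduced only for illustration. The entry for the cross-entropy method uses exactly the classification stated at the start of this section.

```python
from dataclasses import dataclass
from enum import Enum


# Hypothetical enums: one per taxonomy axis discussed in the text.
class ModelUse(Enum):
    MODEL_FREE = "model-free"
    MODEL_BASED = "model-based"


class Approach(Enum):
    VALUE_BASED = "value-based"
    POLICY_BASED = "policy-based"


class DataSource(Enum):
    ON_POLICY = "on-policy"
    OFF_POLICY = "off-policy"


@dataclass(frozen=True)
class MethodTraits:
    """Hypothetical record placing an RL method on the three axes."""
    name: str
    model: ModelUse
    approach: Approach
    data: DataSource

    def describe(self) -> str:
        # Join the three axis labels into one human-readable line.
        return f"{self.name}: {self.model.value}, {self.approach.value}, {self.data.value}"


# The cross-entropy method, classified as stated in the text.
CROSS_ENTROPY = MethodTraits(
    name="cross-entropy",
    model=ModelUse.MODEL_FREE,
    approach=Approach.POLICY_BASED,
    data=DataSource.ON_POLICY,
)

print(CROSS_ENTROPY.describe())
# prints: cross-entropy: model-free, policy-based, on-policy
```

Modeling each axis as its own enum, rather than as a single flat list of categories, reflects the point that a method takes one value on every axis at the same time.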
The term model-free means that the method doesn’t build a model of the environment or reward; it just directly connects observations to actions (or values that are related to actions). In other words, the agent takes current observations and does some computations on them, and the result is the action that it...