Actor-critic methods
Approaches to reinforcement learning can be divided into three broad categories:
Value-based learning: This tries to learn the expected reward/value of being in a state. The desirability of reaching different states can then be evaluated based on their relative values. Q-learning is an example of value-based learning.
Policy-based learning: In this approach, no attempt is made to evaluate states; instead, different control policies are tried out and evaluated based on the actual reward received from the environment. Policy gradients are an example of policy-based learning.
Model-based learning: In this approach, which will be discussed in more detail later in the chapter, the agent attempts to model the behavior of the environment, and chooses an action by using that model to simulate the outcomes of the actions it might take.
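To make the value-based category concrete, here is a minimal sketch of tabular Q-learning on a hypothetical five-state corridor environment (the environment, state count, and hyperparameters are illustrative assumptions, not from the text). The agent starts at the left end, receives a reward of 1 on reaching the rightmost goal state, and updates its state-action values from experienced transitions:

```python
import numpy as np

# Hypothetical 1D corridor: states 0..4, goal at state 4.
# Actions: 0 = move left, 1 = move right.
rng = np.random.default_rng(1)
n_states, n_actions = 5, 2
goal = n_states - 1
alpha, gamma, eps = 0.1, 0.9, 0.1  # illustrative hyperparameters

Q = np.zeros((n_states, n_actions))  # learned state-action values

for episode in range(500):
    s = 0
    while s != goal:
        # epsilon-greedy action selection
        if rng.random() < eps:
            a = int(rng.integers(n_actions))
        else:
            a = int(Q[s].argmax())
        s_next = min(s + 1, goal) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == goal else 0.0
        # Q-learning update: bootstrap from the best next-state value
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
```

After training, moving right has a higher estimated value than moving left in every non-goal state, which is exactly the "relative value" comparison described above.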
Actor-critic methods all revolve around the idea of using two neural networks for training. The first, the critic, uses value-based learning to learn a value function...
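The interplay between the two components can be sketched with a minimal one-step actor-critic on a toy two-state environment. For brevity this sketch uses simple tables in place of the two neural networks, and the environment, learning rates, and episode counts are illustrative assumptions:

```python
import numpy as np

# Toy 2-state environment (hypothetical): taking action 1 in state 0
# yields reward 1 and moves to state 1; everything else yields 0 and
# returns to state 0.
rng = np.random.default_rng(0)
n_states, n_actions = 2, 2
gamma, alpha_v, alpha_pi = 0.9, 0.1, 0.1  # illustrative hyperparameters

V = np.zeros(n_states)                   # critic: state-value estimates
theta = np.zeros((n_states, n_actions))  # actor: policy preferences

def step(s, a):
    if s == 0 and a == 1:
        return 1, 1.0
    return 0, 0.0

def policy(s):
    # softmax over the actor's preferences for state s
    prefs = theta[s] - theta[s].max()
    p = np.exp(prefs)
    return p / p.sum()

for episode in range(500):
    s = 0
    for t in range(10):
        p = policy(s)
        a = int(rng.choice(n_actions, p=p))
        s_next, r = step(s, a)
        # critic: temporal-difference error and value update
        td_error = r + gamma * V[s_next] - V[s]
        V[s] += alpha_v * td_error
        # actor: policy-gradient step weighted by the critic's TD error
        grad_log = -p
        grad_log[a] += 1.0               # gradient of log-softmax
        theta[s] += alpha_pi * td_error * grad_log
        s = s_next
```

The critic's TD error plays the role of the feedback signal: when an action turns out better than the critic expected, the actor shifts probability toward it, so the actor quickly learns to prefer the rewarding action in state 0.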