In this section, we will cover what Actor-Critic algorithms are. You will also see what policy gradients are and how they are useful to Actor-Critic algorithms.
How do students learn at school? Students normally make a lot of mistakes as they learn. When they do well at learning a task, their teacher provides positive feedback. On the other hand, if students do poorly at a task, the teacher provides negative feedback. This feedback serves as the learning signal for the student to get better at their tasks. This is the crux of Actor-Critic algorithms.
The following is a summary of the steps involved:
- We will have two neural networks—one referred to as the actor, and the other as the critic
- The actor is like the student, as we described previously, and takes an action at a given state
- The critic is like the teacher, as we described...