Questions
Let's put our knowledge of actor-critic methods to the test. Try answering the following questions:
- What are the roles of the actor and critic networks in DDPG?
- How does the critic in DDPG work?
- What are the key features of TD3?
- Why do we need clipped double Q-learning?
- What is target policy smoothing?
- What is maximum entropy reinforcement learning?
- What is the role of the critic network in SAC?