Questions
Let's evaluate our understanding of the policy gradient method by answering the following questions:
- What is a value-based method?
- Why do we need a policy-based method?
- How does the policy gradient method work?
- How do we compute the gradient in the policy gradient method?
- What is a reward-to-go?
- What is the policy gradient method with a baseline function?
- Define the baseline function.