In this section, we will cover the pros and cons of policy optimization methods compared with value-based methods. The advantages are as follows:
- They provide better convergence properties: updating the policy parameters changes the policy smoothly, whereas in a value-based method a small change in the estimated values can abruptly flip the greedy action.
- They are highly effective in high-dimensional or continuous action spaces. When the action space is very large, the max over actions in a value-based method becomes computationally expensive; a policy-based method instead improves the policy directly by adjusting its parameters, so no max has to be solved at each step (see the sketch following this list).
- They can learn stochastic policies, whereas a value-based method acting greedily with respect to its value function is deterministic.
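To make the contrast concrete, here is a minimal sketch of the first and third points. The `q_function`, `theta`, and the grid of candidate actions are all hypothetical stand-ins invented for illustration: a value-based agent with a continuous action space has to solve an inner maximization just to act, while a parameterized Gaussian policy is sampled directly and is stochastic by construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Value-based view: continuous actions require solving a max ---
# With a continuous action space, greedy action selection means
# solving max_a Q(s, a) -- an inner optimization at every step.
def q_function(state, action):
    # Stand-in for a learned Q(s, a); hypothetical, for illustration.
    return -(action - 0.3 * state) ** 2

state = 1.5
candidate_actions = np.linspace(-2.0, 2.0, 10_000)  # crude grid search over actions
greedy_action = candidate_actions[np.argmax(q_function(state, candidate_actions))]

# --- Policy-based view: sample directly from a parameterized policy ---
# A Gaussian policy pi(a | s) = N(mu(s), sigma) is sampled in O(1);
# learning adjusts the parameters (here theta) by gradient ascent,
# with no max over actions anywhere.
theta = np.array([0.25])               # hypothetical policy parameter
mu, sigma = theta[0] * state, 0.1      # policy mean depends on the state
sampled_action = rng.normal(mu, sigma) # stochastic by construction

print(f"greedy (grid-search) action: {greedy_action:.3f}")
print(f"sampled (policy) action:     {sampled_action:.3f}")
```

Note how the grid search over `candidate_actions` grows with the resolution and dimensionality of the action space, while sampling from the policy does not.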
The disadvantages associated with policy-based methods are as follows:
- They typically converge to a local rather than the global optimum.
- Evaluating a policy is inefficient and suffers from high variance (see the sketch after this list).
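As a rough illustration of the variance problem, consider the toy sketch below (the reward process is invented for this example): a single Monte Carlo rollout gives a very noisy estimate of the policy's return, so many episodes are needed before the evaluation, and hence the gradient signal built on it, becomes reliable.

```python
import numpy as np

rng = np.random.default_rng(1)

def rollout_return(horizon=50, gamma=0.99):
    # Stand-in for one Monte Carlo rollout under a fixed stochastic
    # policy: per-step rewards are noisy, and the noise compounds
    # over the episode. Hypothetical reward process, for illustration.
    rewards = rng.normal(loc=1.0, scale=2.0, size=horizon)
    return sum(gamma**t * r for t, r in enumerate(rewards))

# A single-episode return estimate fluctuates heavily...
samples = np.array([rollout_return() for _ in range(1000)])
print(f"per-episode return: mean={samples.mean():.2f}, std={samples.std():.2f}")

# ...so a reliable evaluation of the policy needs many rollouts,
# which is exactly the inefficiency noted above.
print(f"std of the mean over 1000 episodes: {samples.std() / np.sqrt(len(samples)):.2f}")
```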
We will discuss approaches to tackling these disadvantages later in this chapter. For now, let's focus on the need for stochastic policies.