Let's discuss now how to optimize a policy. In policy methods, our main objective is that a given policy  with parameter vectorÂ
 finds the best values of the parameter vector. In order to measure which is the best, we measure
 the quality of the policyÂ
 for different values of the parameter vectorÂ
.
Before discussing the optimization methods, let's first figure out the different ways to measure the quality of a policy :
- If it's an episodic environment,Â
 can be the value function of the start state
 that is if it starts from any stateÂ
, then the value function of it would be the expected sum of reward from that state onwards. Therefore,
![](https://static.packt-cdn.com/products/9781788835725/graphics/assets/01578517-3d8f-4227-a234-94a8e6fc9985.png)
- If it's a continuing environment,Â
 can be the average value function of the states. So, if the environment goes on and...