Summary
In this chapter, we saw an alternative way of solving RL problems: policy gradients (PG), which differ in many ways from the familiar DQN method. We explored the basic method, called REINFORCE, which is a generalization of the cross-entropy method, our first method in the RL domain. REINFORCE is simple, but when applied to the Pong environment, it didn't show good results.
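As a reminder of the core idea, here is a minimal sketch of the REINFORCE computation: we scale the log-probability of each taken action by the discounted return from that step, and minimize the negative of their sum. The function names and the plain-Python formulation are illustrative, not code from the chapter.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    # Compute the return G_t = r_t + gamma * G_{t+1} for every step
    # of one finished episode, iterating from the last reward backward.
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))


def reinforce_loss(log_probs, rewards, gamma=0.99):
    # REINFORCE objective: maximize sum of log pi(a_t|s_t) * G_t,
    # so the loss to minimize is the negative of that sum.
    returns = discounted_returns(rewards, gamma)
    return -sum(lp * g for lp, g in zip(log_probs, returns))


# Toy episode: three steps, reward 1 at each step, no discounting.
loss = reinforce_loss(log_probs=[-0.5, -0.5, -0.5],
                      rewards=[1.0, 1.0, 1.0],
                      gamma=1.0)
```

In a real agent, `log_probs` would come from the policy network's output distribution and the loss would be backpropagated; the sketch only shows the scalar computation that the gradient flows through.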
In the next chapter, we'll consider ways to improve the stability of PG by combining the value-based and policy-based families of methods.