Summary
In this chapter, you saw an alternative way of solving RL problems: policy gradient methods, which differ in many ways from the familiar DQN method. We explored a basic method called REINFORCE, which generalizes the cross-entropy method, the first RL method we covered. This policy gradient method is simple, but when applied to the Pong environment, it didn’t produce good results.
In the next chapter, we will consider ways to improve the stability of policy gradient methods by combining the families of value-based and policy-based methods.