Chapter 9. Policy Gradients – An Alternative
In this first chapter of part three of the book, we’ll consider an alternative way to handle Markov Decision Process (MDP) problems: a whole family of methods called Policy Gradients (PG). The chapter gives an overview of these methods, their motivation, and their strengths and weaknesses compared with the already familiar Q-learning. We will start with a simple PG method called REINFORCE and apply it to our CartPole environment, comparing the results with those of the Deep Q-Network (DQN) approach.