In this chapter, we expanded on the knowledge that we gained in Chapter 8, Reinforcement Learning, to learn about DDPG and HER, and how to combine these methods to create a reinforcement learning algorithm that can independently control a robotic arm.
The deep Q-network that we used to solve game challenges works in discrete action spaces; when building algorithms for more fluid motion tasks, such as robots or self-driving cars, we need a class of algorithms that can handle continuous action spaces. For this, we use policy gradient methods, which learn a policy directly from a set of actions. We can improve this learning with an experience replay buffer, which stores past experiences so that they can be sampled during training, allowing the algorithm to learn again from positive experiences it has already encountered.
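To make the idea concrete, here is a minimal sketch of such a buffer. The class and method names are illustrative assumptions, not the chapter's actual code; transitions are stored as tuples and sampled uniformly at random:

```python
import random
from collections import deque

class ReplayBuffer:
    """A fixed-size buffer that stores transitions and samples them uniformly."""

    def __init__(self, capacity=100_000):
        # Once capacity is reached, the oldest transitions are discarded.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        # Store one transition from an interaction with the environment.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        # Draw a random minibatch; sampling at random breaks the correlation
        # between consecutive experiences that can destabilize training.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```

During training, the agent calls `add` after every environment step and, once the buffer holds enough transitions, calls `sample` to obtain a minibatch for each gradient update.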
Sometimes, our algorithms can fail to learn because they rarely, if ever, stumble upon actions that yield a positive reward. HER addresses this sparse-reward problem by relabeling failed episodes as if the states the agent actually reached had been the goals all along, so that every experience, successful or not, provides a learning signal.
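The following sketch shows the relabeling step under the "final" HER strategy, where the last achieved state of an episode is substituted as the goal. The episode format and the `compute_reward` helper are assumptions for illustration, not part of the chapter's code:

```python
def relabel_with_hindsight(episode, compute_reward):
    """Generate extra transitions by pretending the state the agent actually
    reached at the end of the episode was the intended goal all along.

    `episode` is a list of (state, action, achieved_goal, next_state) tuples;
    `compute_reward(achieved_goal, goal)` returns the reward for reaching
    `achieved_goal` while aiming for `goal`. Both are hypothetical here.
    """
    hindsight_goal = episode[-1][2]  # the goal the agent actually achieved
    relabeled = []
    for state, action, achieved_goal, next_state in episode:
        # With the substituted goal, formerly failed steps can earn reward,
        # giving the agent a learning signal even when the true goal was missed.
        reward = compute_reward(achieved_goal, hindsight_goal)
        relabeled.append((state, action, reward, next_state, hindsight_goal))
    return relabeled
```

These relabeled transitions are added to the replay buffer alongside the originals, so the agent learns from every episode rather than only from the rare successful ones.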