Another complexity we introduced when looking at marathon RL, or control learning, was the use of continuous action spaces. A continuous action space represents an infinite set of possible actions an agent could take. Where our agent could previously choose between discrete actions, yes or no, it now has to select a point within an infinite range of values as the action for each joint. This mapping from an infinite action space to a single action is not easy to solve; however, we do have neural networks at our disposal, and these provide us with an excellent solution using an architecture not unlike the GANs we looked at in Chapter 3, GAN for Games.
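To make the difference concrete, the sketch below contrasts a discrete choice with continuous action selection using a simple Gaussian policy, a common way to handle continuous spaces. All names here (`state_dim`, `action_dim`, `select_action`, and the tiny linear "network") are illustrative assumptions, not part of any specific library: the policy outputs a mean per joint and the agent samples a real-valued action around that mean.

```python
import numpy as np

rng = np.random.default_rng(0)

state_dim = 8    # e.g. joint angles and velocities of the agent
action_dim = 4   # one continuous value (say, a torque) per joint

# A tiny linear stand-in for a policy network: state -> per-joint means.
# (Illustrative only; a real agent would use a trained deep network.)
W = rng.normal(scale=0.1, size=(action_dim, state_dim))
b = np.zeros(action_dim)
log_std = np.full(action_dim, -0.5)  # fixed exploration noise

def select_action(state):
    """Map a state to a continuous action by sampling a Gaussian per joint."""
    mean = np.tanh(W @ state + b)        # squash means into [-1, 1]
    std = np.exp(log_std)
    action = rng.normal(mean, std)       # one real-valued sample per joint
    return np.clip(action, -1.0, 1.0)    # respect actuator limits

state = rng.normal(size=state_dim)
action = select_action(state)
print(action.shape)  # (4,): one real number per joint, not a yes/no choice
```

Rather than picking one option from a finite menu, the policy describes a distribution over an infinite set of actions and samples from it, which is exactly the mapping problem the text describes.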
As we discovered in the chapter on GANs, we can propose a network architecture composed of two competing networks. Pitting these networks against each other forces each one to learn by competing...