Reinforcement learning: Actions, agents, spaces, policies, and rewards
Recall from Chapter 1, An Introduction to Generative AI: "Drawing" Data from Models, that most discriminative AI examples involve applying a continuous or discrete label to a piece of data. In the image examples discussed in this book, this could mean using a deep neural network to determine which digit an MNIST image represents, or whether a CIFAR-10 image contains a horse. In these cases, the model produces a single output: a prediction made with minimal error. In reinforcement learning, we also want to make such point predictions, but over many steps, and to optimize the total error accumulated across those steps rather than the error of any single prediction.
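To make the difference between a single prediction and a sequence of decisions more tangible, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption rather than code from this chapter: the one-dimensional environment, the names `step`, `greedy_policy`, `lazy_policy`, and `run_episode`, and the choice of distance-to-target as the per-step cost are all hypothetical. The point is only that the quantity being optimized is a sum over many steps.

```python
def step(position, action, target=5):
    """Hypothetical toy environment: move along a line and pay a cost
    equal to the remaining distance from the target."""
    new_position = position + action
    cost = abs(target - new_position)
    return new_position, cost


def greedy_policy(position, target=5):
    """Move one unit toward the target, or stay put once it is reached."""
    if position < target:
        return 1
    if position > target:
        return -1
    return 0


def lazy_policy(position, target=5):
    """Never move, regardless of where the target is."""
    return 0


def run_episode(policy, start=0, n_steps=8, target=5):
    """Apply the policy repeatedly and accumulate the per-step cost."""
    position = start
    total_cost = 0
    for _ in range(n_steps):
        action = policy(position, target)
        position, cost = step(position, action, target)
        total_cost += cost  # the objective is the sum, not any single step
    return position, total_cost


if __name__ == "__main__":
    print(run_episode(greedy_policy))  # final position and total cost
    print(run_episode(lazy_policy))
```

In this sketch, the two policies are compared by the cost they accumulate over the whole episode, which is the multi-step analogue of the single-prediction error used to judge a classifier.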
Figure 12.1: Atari video game examples¹
As a concrete example, consider a video game in which a player controls a spaceship to shoot down alien vessels. The spaceship navigated by the player in this example is the agent; the set of pixels on the screen at any point in...