Summary
In this chapter, we explored another application of generative models in reinforcement learning. First, we described how RL allows us to learn the behavior of an agent in an environment, and how deep neural networks allow Q-learning to scale to complex environments with extremely large observation and action spaces.
We then discussed inverse reinforcement learning and how it differs from RL by "inverting" the problem: rather than learning from a reward signal, the agent attempts to "learn by example." We discussed how comparing a proposed policy against an expert's can be scored using entropy, and how a particular regularized version of this entropy loss takes a form similar to the GAN objective we studied in Chapter 6; this formulation is called GAIL (Generative Adversarial Imitation Learning). We saw that GAIL is but one of many possible formulations of this general idea, each using a different loss function. Finally, we implemented GAIL using the bullet-gym physics simulator and OpenAI Gym.
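To make the GAN connection concrete, the core of the GAIL objective can be sketched in a few lines. This is an illustrative NumPy sketch, not the chapter's implementation: the function names are invented here, and it assumes the common GAIL convention in which the discriminator is trained to output values near 1 on policy samples and near 0 on expert samples, with the policy receiving -log D(s, a) as a surrogate reward.

```python
import numpy as np

def discriminator_loss(d_policy, d_expert):
    """GAN-style discriminator objective as used in GAIL (illustrative).

    d_policy: discriminator outputs D(s, a) in (0, 1) on policy rollouts.
    d_expert: discriminator outputs D(s, a) on expert demonstrations.
    The discriminator minimizes this loss, pushing D toward 1 on policy
    samples and toward 0 on expert samples.
    """
    d_policy = np.asarray(d_policy, dtype=float)
    d_expert = np.asarray(d_expert, dtype=float)
    return -(np.mean(np.log(d_policy)) + np.mean(np.log(1.0 - d_expert)))

def policy_reward(d_policy):
    """Surrogate reward handed to the RL step: -log D(s, a).

    The policy earns high reward when the discriminator cannot tell its
    state-action pairs apart from the expert's (D close to 0).
    """
    return -np.log(np.asarray(d_policy, dtype=float))
```

A discriminator that confidently separates the two distributions (for example, `discriminator_loss([0.9, 0.8], [0.1, 0.2])`) incurs a lower loss than one that outputs 0.5 everywhere, mirroring the adversarial dynamic of Chapter 6: the policy improves precisely by driving the discriminator back toward that uninformative 0.5 output.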
...