From Deep Reinforcement Learning with Python: Master classic RL, deep RL, distributional RL, inverse RL, and more with OpenAI Gym and TensorFlow

Product type Paperback

Published in Sep 2020

Publisher Packt

ISBN-13 9781839210686

Length 760 pages

Edition 2nd Edition


Author (1):

Sudharsan Ravichandiran



DQN with prioritized experience replay

We learned that in DQN, we randomly sample a minibatch of K transitions from the replay buffer and train the network on it. Instead of sampling uniformly, can we assign a priority to each transition in the replay buffer and preferentially sample the high-priority transitions for learning?

Yes, but first, why do we need to assign a priority to each transition, and how do we decide which transitions deserve more priority than others? Let's explore this in more detail.

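The TD error is the difference between the target value and the predicted value, as shown here:

$$\delta = r + \gamma \max_{a'} Q_{\theta'}(s', a') - Q_{\theta}(s, a)$$

Here, $r$ is the reward, $\gamma$ is the discount factor, $Q_{\theta'}$ is the target network evaluated at the next state $s'$, and $Q_{\theta}$ is the main network evaluated at the current state $s$ and action $a$.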

A transition with a high TD error is one where the network's prediction is far from the target, so we need to learn more from that transition to minimize the error. A transition with a low TD error is one that the network already predicts well. We learn more from our mistakes than from what we are already good at, right? Similarly, we can learn more...
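To make the idea concrete, here is a minimal sketch of computing TD errors for a toy replay buffer and turning them into sampling probabilities. The excerpt does not say how a TD error is mapped to a priority, so the proportional rule used here (priority equal to the absolute TD error plus a small constant `eps`) is an assumption. The function names and the toy values are hypothetical too.

```python
import random

def td_error(reward, done, q_pred, q_next_target, gamma=0.99):
    """TD error: target value minus predicted value for one transition."""
    # Target is r + gamma * max_a' Q_target(s', a'); terminal states do not bootstrap.
    bootstrap = 0.0 if done else gamma * max(q_next_target)
    return reward + bootstrap - q_pred

def sampling_probabilities(errors, eps=1e-6):
    """Assumed scheme: probability proportional to |TD error| + eps."""
    # eps keeps transitions with zero error from never being sampled.
    priorities = [abs(e) + eps for e in errors]
    total = sum(priorities)
    return [p / total for p in priorities]

def sample_indices(probs, batch_size, rng=random):
    """Draw a minibatch of buffer indices (with replacement) using probs."""
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)

# Toy buffer: (reward, done, Q(s, a) from main network, Q(s', .) from target network)
buffer = [
    (1.0, False, 0.5, [0.4, 0.6]),
    (0.0, False, 0.2, [0.1, 0.3]),
    (0.0, True, 0.1, [0.9, 0.8]),
    (1.0, True, 1.0, [0.0, 0.0]),
]

errors = [td_error(r, d, q, q_next, gamma=0.9) for r, d, q, q_next in buffer]
probs = sampling_probabilities(errors)
batch = sample_indices(probs, batch_size=2, rng=random.Random(0))
```

In this toy buffer, the first transition has by far the largest TD error, so it receives most of the sampling probability, while the last transition, which the network already predicts exactly, is almost never drawn.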