You're reading from Deep Reinforcement Learning Hands-On: Apply modern RL methods, with deep Q-networks, value iteration, policy gradients, TRPO, AlphaGo Zero and more

Product type: Paperback
Published in: Jun 2018
Publisher: Packt
ISBN-13: 9781788834247
Length: 546 pages
Edition: 1st Edition
Languages: Python
Tools: Deep Reinforcement Learning
Concepts: Deep Reinforcement Learning
Author: Maxim Lapan


Table of Contents (21 chapters)

Preface

1. What is Reinforcement Learning?

2. OpenAI Gym

3. Deep Learning with PyTorch

4. The Cross-Entropy Method

5. Tabular Learning and the Bellman Equation

6. Deep Q-Networks

7. DQN Extensions

8. Stocks Trading Using RL

9. Policy Gradients – An Alternative

10. The Actor-Critic Method

11. Asynchronous Advantage Actor-Critic

12. Chatbots Training with RL

13. Web Navigation

14. Continuous Action Space

15. Trust Regions – TRPO, PPO, and ACKTR

16. Black-Box Optimization in RL

17. Beyond Model-Free – Imagination

18. AlphaGo Zero

Other Books You May Enjoy

Leave a review - let other readers know what you think

Index

Variance reduction

In the previous chapter, we briefly mentioned that one way to improve the stability of policy gradient (PG) methods is to reduce the variance of the gradient. Now let's try to understand why this is important and what it means to reduce the variance. In statistics, variance is the expected squared deviation of a random variable from its expected value: $\mathrm{Var}[x] = \mathbb{E}\left[(x - \mathbb{E}[x])^2\right]$.
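To make the definition concrete, here is a minimal sketch that computes the variance of a small set of equally likely values directly as the mean of squared deviations from the mean. The helper name `variance` and the sample values are illustrative only. The result is cross-checked against Python's standard `statistics.pvariance`.

```python
import statistics

def variance(values):
    """Population variance: the average of (x - E[x])^2 over equally likely values."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

# Illustrative samples; each value is treated as an equally likely outcome.
samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(variance(samples))               # → 4.0
print(statistics.pvariance(samples))   # → 4.0
```

The mean here is 5.0. The squared deviations sum to 32, and dividing by the 8 samples gives 4.0.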

Variance shows how far values are dispersed from the mean. When the variance is high, the random variable can take values that deviate widely from the mean. The following plot shows a normal (Gaussian) distribution with the same mean $\mu$ but different values of the variance.

Figure 1: The effect of variance on the Gaussian distribution
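The effect shown in the figure can also be checked numerically. This sketch samples from normal distributions that share a mean but differ in standard deviation. It then measures how often samples land more than one unit away from that mean. The mean `MU = 0.0`, the threshold of 1.0, and the helper `spread_stats` are arbitrary assumptions for illustration. They are not values taken from the figure.

```python
import random
import statistics

MU = 0.0  # assumed shared mean (arbitrary choice for this sketch)

def spread_stats(sigma, n=10_000, seed=0):
    """Sample N(MU, sigma^2); return (sample mean, sample variance, fraction with |x - MU| > 1)."""
    rng = random.Random(seed)
    xs = [rng.gauss(MU, sigma) for _ in range(n)]
    frac_far = sum(abs(x - MU) > 1.0 for x in xs) / n
    return statistics.fmean(xs), statistics.pvariance(xs), frac_far

for sigma in (0.5, 1.0, 2.0):
    mean, var, far = spread_stats(sigma)
    print(f"sigma={sigma}: mean={mean:.3f} var={var:.3f} P(|x-mu|>1)={far:.3f}")
```

All three sample means stay close to `MU`. The sample variance tracks `sigma ** 2`, and the fraction of far-away samples grows with the variance.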

Now let's return to PG. As stated in the previous chapter, the idea of the method is to increase the probability of good actions and decrease the probability of bad ones. In math notation, our PG was written as $\nabla J \approx \mathbb{E}\left[Q(s, a) \nabla \log \pi(a \mid s)\right]$. The scaling factor Q(s, a) specifies...
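The role of the scaling factor in the gradient's variance can be illustrated with a toy one-state (bandit) setting. The sketch computes the PG term $Q(s, a) \nabla \log \pi(a \mid s)$ for a softmax policy, where the gradient of the log-probability with respect to the logits is `onehot(a) - probs`. It then takes the exact mean and total variance over actions drawn from the policy. The policy logits, the Q values, and the helper names are hypothetical, and this is not the book's code. It compares the original Q values, the same values shifted up by a constant, and the same values with their average subtracted.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pg_term(action, q_values, probs):
    # Q(s, a) * grad log pi(a|s); for softmax logits the score is onehot(a) - probs.
    return [q_values[action] * ((1.0 if i == action else 0.0) - p)
            for i, p in enumerate(probs)]

def gradient_mean_and_variance(logits, q_values):
    # Exact expectation over a ~ pi, so the result has no sampling noise.
    probs = softmax(logits)
    n = len(probs)
    terms = [pg_term(a, q_values, probs) for a in range(n)]
    mean = [sum(probs[a] * terms[a][i] for a in range(n)) for i in range(n)]
    second_moment = sum(probs[a] * sum(x * x for x in terms[a]) for a in range(n))
    # Total variance (trace of the covariance) = E[||g||^2] - ||E[g]||^2.
    return mean, second_moment - sum(m * m for m in mean)

logits = [0.0, 0.0, 0.0]                      # hypothetical uniform policy over 3 actions
q = [1.0, 2.0, 3.0]                           # hypothetical Q(s, a) values
q_shifted = [x + 100.0 for x in q]            # same values plus a constant
q_centered = [x - sum(q) / len(q) for x in q] # same values minus their average

for name, values in (("q", q), ("q + 100", q_shifted), ("q - mean", q_centered)):
    mean, var = gradient_mean_and_variance(logits, values)
    print(name, [round(m, 4) for m in mean], round(var, 4))
```

In all three cases the expected gradient is identical, because a constant offset contributes $c \sum_a \pi(a)(e_a - \pi) = 0$. The variance, however, is about 2.89 for `q`, about 6936 for `q + 100`, and about 0.22 for `q - mean`. The scale of Q(s, a) directly controls how noisy a single-sample gradient estimate is.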