Asynchronous n-step Q-learning
The architecture of asynchronous n-step Q-learning is, to an extent, similar to that of asynchronous one-step Q-learning. The difference is that the learning agent's actions are selected using the exploration policy for up to n steps, or until a terminal state is reached, in order to compute a single update of the policy network parameters. This process collects up to n rewards from the environment since the last update. Then, for each time step, the loss is calculated as the difference between the discounted future rewards from that time step (bootstrapped from the target network's Q-value of the last state, unless that state is terminal) and the estimated Q-value. The gradient of this loss with respect to the thread-specific network parameters is calculated and accumulated for each time step. There are multiple such learning agents running and accumulating gradients in parallel, and these accumulated gradients are used to perform asynchronous updates of the policy network parameters.
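Before the formal pseudo-code, the following minimal Python sketch may help make a single worker's inner loop concrete: act with the exploration policy for up to n steps, compute the n-step returns backwards from a bootstrap value, and accumulate the per-step gradients. The environment, the linear Q-function, and names such as `ToyChainEnv`, `n_step_update`, and the hyperparameter values are illustrative assumptions for this sketch, not the book's pseudo-code.

```python
import numpy as np


class ToyChainEnv:
    """Tiny deterministic chain environment, used only to exercise the sketch."""

    def __init__(self, length=6, state_dim=4, seed=0):
        self.length = length
        rng = np.random.default_rng(seed)
        self.features = rng.standard_normal((length, state_dim))  # fixed state features
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.features[self.pos]

    def step(self, action):
        # Action 1 moves right, action 0 stays; reward 1.0 on reaching the end.
        self.pos = min(self.pos + int(action == 1), self.length - 1)
        done = self.pos == self.length - 1
        return self.features[self.pos], (1.0 if done else 0.0), done


def epsilon_greedy(q_values, epsilon, rng):
    """Exploration policy: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


def n_step_update(theta, theta_target, env, state, n_steps=5,
                  gamma=0.99, epsilon=0.1, lr=0.01, rng=None):
    """One worker update: roll out up to n_steps, then accumulate n-step TD updates
    for a linear Q-function Q(s, a) = theta[a] . s."""
    rng = rng or np.random.default_rng()
    states, actions, rewards = [], [], []
    done = False

    # 1. Act with the exploration policy for up to n_steps or until terminal.
    for _ in range(n_steps):
        action = epsilon_greedy(theta @ state, epsilon, rng)
        next_state, reward, done = env.step(action)
        states.append(state)
        actions.append(action)
        rewards.append(reward)
        state = next_state
        if done:
            break

    # 2. Bootstrap the return from the target network unless the episode ended.
    R = 0.0 if done else float(np.max(theta_target @ state))

    # 3. Walk backwards through the rollout, accumulating per-step gradients.
    grad = np.zeros_like(theta)
    for s, a, r in zip(reversed(states), reversed(actions), reversed(rewards)):
        R = r + gamma * R                      # discounted n-step return at this step
        td_error = R - float(theta[a] @ s)     # target minus estimated Q-value
        grad[a] += td_error * s                # accumulate update direction for theta[a]

    # 4. In the asynchronous setting, this accumulated gradient would be applied
    #    to the shared (global) parameters; here we simply take a local step.
    theta += lr * grad
    return theta, (env.reset() if done else state)


if __name__ == "__main__":
    env = ToyChainEnv()
    theta = np.zeros((2, 4))                   # one weight row per action
    theta_target = theta.copy()
    state = env.reset()
    for step in range(200):
        theta, state = n_step_update(theta, theta_target, env, state)
        if step % 50 == 0:
            theta_target = theta.copy()        # periodic target-network sync
    print("Q-values at the start state:", theta @ env.features[0])
```

A linear Q-function is used here only to keep the sketch dependency-free; with a deep Q-network the same loop applies, except that the accumulated gradients are sent to a shared optimizer that updates the global parameters used by all worker threads.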
The pseudo-code for asynchronous n-step Q-learning is shown below. Here, the following are the global...