You're reading from Deep Reinforcement Learning Hands-On Apply modern RL methods to practical problems of chatbots, robotics, discrete optimization, web automation, and more

Product type Paperback

Published in Jan 2020

Publisher Packt

ISBN-13 9781838826994

Length 826 pages

Edition 2nd Edition

Languages

Python

Tools

Deep Reinforcement Learning

Concepts

Chatbots

Author (1):

Maxim Lapan

View More author details

Table of Contents (28) Chapters

Preface

1. What Is Reinforcement Learning?

2. OpenAI Gym FREE CHAPTER

3. Deep Learning with PyTorch

4. The Cross-Entropy Method

5. Tabular Learning and the Bellman Equation

6. Deep Q-Networks

7. Higher-Level RL Libraries

8. DQN Extensions

9. Ways to Speed up RL

10. Stocks Trading Using RL

11. Policy Gradients – an Alternative

12. The Actor-Critic Method

13. Asynchronous Advantage Actor-Critic

14. Training Chatbots with RL

15. The TextWorld Environment

16. Web Navigation

17. Continuous Action Space

18. RL in Robotics

19. Trust Regions – PPO, TRPO, ACKTR, and SAC

20. Black-Box Optimization in RL

21. Advanced Exploration

22. Beyond Model-Free – Imagination

23. AlphaGo Zero

24. RL in Discrete Optimization

25. Multi-agent RL

26. Other Books You May Enjoy

27. Index

The command generation model

In this part of the chapter, we will extend our baseline model with an extra submodule that will generate commands that our DQN network should evaluate. In the baseline model, commands were taken from the admissible commands list, which was taken from the extended information from the environment. But maybe we can generate commands from the observation using the same techniques that we covered in the previous chapter.

The architecture of our new model is shown in Figure 15.12.

Figure 15.12: The architecture of the DQN with command generation

In comparison with Figure 15.3 from earlier in the chapter, there are several changes here. First of all, our preprocessor pipeline no longer accepts a command sequence in the input. The second difference is that the preprocessor's output now not only gets passed to the DQN model, but it also forks to the "Commands generator" submodule.

The responsibility of this new submodule is to produce...