You're reading from Deep Reinforcement Learning Hands-On: Apply modern RL methods to practical problems of chatbots, robotics, discrete optimization, web automation, and more

Product type Paperback

Published in Jan 2020

Publisher Packt

ISBN-13 9781838826994

Length 826 pages

Edition 2nd Edition

Languages

Python

Tools

Deep Reinforcement Learning

Concepts

Chatbots

Author (1):

Maxim Lapan


A2C on Pong results

To start the training, run 02_pong_a2c.py with the --cuda option and the -n option (the latter provides a name for the run in TensorBoard):

      rl_book_samples/Chapter10$ ./02_pong_a2c.py --cuda -n t2
      AtariA2C (
         (conv): Sequential (
           (0): Conv2d(4, 32, kernel_size=(8, 8), stride=(4, 4))
           (1): ReLU ()
           (2): Conv2d(32, 64, kernel_size=(4, 4), stride=(2, 2))
           (3): ReLU ()
           (4): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
           (5): ReLU ()
         )
         (policy): Sequential (
           (0): Linear (3136 -> 512)
           (1): ReLU ()
           (2): Linear (512 -> 6)
         )
         (value): Sequential (
           (0): Linear (3136 -> 512)
           (1): ReLU ()
           (2): Linear (512 -> 1)
         )
      )
      37799: done 1 games, mean reward -21.000, speed 722.89 f/s
      39065: done 2 games, mean reward -21.000, speed 749.92 f/s
      39076: done 3 games, mean...
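
The 3136 input features of the policy and value heads follow from the convolution shapes printed above. As a rough check, assuming the network receives 84x84 frames (a common Atari preprocessing size that is not stated in this output) and that the convolutions use no padding, the spatial sizes can be traced in plain Python. The names conv_out and flattened_features are illustrative and are not part of the book's code:

```python
def conv_out(size, kernel, stride):
    """Output length along one spatial dimension for an unpadded convolution."""
    return (size - kernel) // stride + 1


# (out_channels, kernel, stride), copied from the printed Conv2d layers.
CONV_LAYERS = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]


def flattened_features(input_size=84):
    """Number of features left after the conv stack is flattened.

    input_size=84 is an assumption about the frame size, not a value
    taken from the training output.
    """
    size = input_size
    channels = 4  # four stacked frames, per Conv2d(4, 32, ...)
    for channels, kernel, stride in CONV_LAYERS:
        size = conv_out(size, kernel, stride)
    return channels * size * size


if __name__ == "__main__":
    print(flattened_features())
```

Under these assumptions the spatial size shrinks from 84 to 20, then 9, then 7, and 64 channels of 7x7 give the 3136 inputs shown for both Linear (3136 -> 512) layers.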