You're reading from Deep Reinforcement Learning Hands-On Apply modern RL methods to practical problems of chatbots, robotics, discrete optimization, web automation, and more

Product type Paperback

Published in Jan 2020

Publisher Packt

ISBN-13 9781838826994

Length 826 pages

Edition 2nd Edition

Languages

Python

Tools

Deep Reinforcement Learning

Concepts

Chatbots

Author (1):

Maxim Lapan

View More author details

Table of Contents (28) Chapters

Preface

1. What Is Reinforcement Learning?

2. OpenAI Gym FREE CHAPTER

3. Deep Learning with PyTorch

4. The Cross-Entropy Method

5. Tabular Learning and the Bellman Equation

6. Deep Q-Networks

7. Higher-Level RL Libraries

8. DQN Extensions

9. Ways to Speed up RL

10. Stocks Trading Using RL

11. Policy Gradients – an Alternative

12. The Actor-Critic Method

13. Asynchronous Advantage Actor-Critic

14. Training Chatbots with RL

15. The TextWorld Environment

16. Web Navigation

17. Continuous Action Space

18. RL in Robotics

19. Trust Regions – PPO, TRPO, ACKTR, and SAC

20. Black-Box Optimization in RL

21. Advanced Exploration

22. Beyond Model-Free – Imagination

23. AlphaGo Zero

24. RL in Discrete Optimization

25. Multi-agent RL

26. Other Books You May Enjoy

27. Index

The model application

Okay, imagine that we have trained the model using the process just described. How should we use it to solve the scrambled cube? From the network's structure, you might imagine the obvious, but not very successful, way:

Feed the model the current state of the cube that we want to solve
From the policy head, get the largest action to perform (or sample it from the resulting distribution)
Apply the action to the cube
Repeat the process until the solved state has been reached

On paper, this method should work, but in practice, it has one serious issue: it doesn't! The main reason for that is our model's quality. Due to the size of the state space and the nature of the NNs, it just isn't possible to train an NN to return the exact optimal action for any input state all of the time. Rather than telling us what to do to get the solved state, our model shows us promising directions to explore. Those directions could bring...