You're reading from Deep Reinforcement Learning Hands-On: Apply modern RL methods to practical problems of chatbots, robotics, discrete optimization, web automation, and more

Product type Paperback

Published in Jan 2020

Publisher Packt

ISBN-13 9781838826994

Length 826 pages

Edition 2nd Edition

Languages

Python

Tools

Deep Reinforcement Learning

Concepts

Chatbots

Author (1):

Maxim Lapan

Value iteration in practice

The complete example is in Chapter05/01_frozenlake_v_iteration.py. The central data structures in this example are as follows:

Reward table: A dictionary with the composite key "source state" + "action" + "target state". The value is the immediate reward obtained for that transition.
Transitions table: A dictionary that keeps counters of the experienced transitions. The key is the composite "state" + "action", and the value is another dictionary that maps each target state to the number of times we have seen it. For example, if we execute action 1 in state 0 ten times, and it leads us to state 4 three of those times and to state 5 the other seven times, then the entry with the key (0, 1) in this table will be a dict with the contents {4: 3, 5: 7}. We can use this table to estimate the probabilities of our transitions.
Value table: A dictionary that maps a state to the calculated value of this state.
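
To make the three tables concrete, here is a minimal sketch of how they could be laid out with Python's standard collections module. It is an illustration under stated assumptions rather than the code from the example file: the names rewards, transits, values, record_transition, and transition_probs are hypothetical, and the reward of 0.0 is a placeholder. It replays the (0, 1) example above and turns the counters into estimated probabilities.

```python
import collections

# Reward table: (source_state, action, target_state) -> immediate reward
rewards = collections.defaultdict(float)
# Transitions table: (state, action) -> Counter mapping target_state -> count
transits = collections.defaultdict(collections.Counter)
# Value table: state -> calculated value of the state
values = collections.defaultdict(float)


def record_transition(state, action, reward, new_state):
    """Hypothetical helper: store one experienced transition in both tables."""
    rewards[(state, action, new_state)] = reward
    transits[(state, action)][new_state] += 1


def transition_probs(state, action):
    """Estimate P(target | state, action) as count / total count."""
    # .get() avoids creating an empty entry for unseen (state, action) pairs
    counts = transits.get((state, action), {})
    total = sum(counts.values())
    if total == 0:
        return {}
    return {target: n / total for target, n in counts.items()}


# Replay the example from the text: action 1 in state 0, executed ten times
for _ in range(3):
    record_transition(0, 1, 0.0, 4)  # placeholder reward of 0.0
for _ in range(7):
    record_transition(0, 1, 0.0, 5)

print(dict(transits[(0, 1)]))   # {4: 3, 5: 7}
print(transition_probs(0, 1))   # {4: 0.3, 5: 0.7}
```

Dividing each counter by the total number of visits to the (state, action) pair gives the estimate: here 3/10 for state 4 and 7/10 for state 5.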

The overall...