Packt+ | Advance your knowledge in tech

You're reading from Deep Reinforcement Learning Hands-On: Apply modern RL methods, with deep Q-networks, value iteration, policy gradients, TRPO, AlphaGo Zero and more

Product type: Paperback

Published in: Jun 2018

Publisher: Packt

ISBN-13: 9781788834247

Length: 546 pages

Edition: 1st Edition

Languages: Python

Tools: Deep Reinforcement Learning

Concepts: Deep Reinforcement Learning

Author: Maxim Lapan


Table of Contents (21 Chapters)

Preface

1. What is Reinforcement Learning?

2. OpenAI Gym

3. Deep Learning with PyTorch

4. The Cross-Entropy Method

5. Tabular Learning and the Bellman Equation

6. Deep Q-Networks

7. DQN Extensions

8. Stocks Trading Using RL

9. Policy Gradients – An Alternative

10. The Actor-Critic Method

11. Asynchronous Advantage Actor-Critic

12. Chatbots Training with RL

13. Web Navigation

14. Continuous Action Space

15. Trust Regions – TRPO, PPO, and ACKTR

16. Black-Box Optimization in RL

17. Beyond Model-Free – Imagination

18. AlphaGo Zero

Other Books You May Enjoy

Leave a review - let other readers know what you think

Index

Value of action

To make our life slightly easier, we can define another quantity in addition to the value of state $V(s)$: the value of action $Q(s,a)$. Basically, it equals the total reward we can get by executing action $a$ in state $s$, and it can be defined via $V(s)$. Although it is a much less fundamental entity than $V(s)$, this quantity gave its name to the whole family of methods called "Q-learning", because it is slightly more convenient in practice. In these methods, our primary objective is to get values of Q for every pair of state and action.

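For a given state $s$ and action $a$, Q equals the expected immediate reward plus the discounted long-term reward of the destination state:

$$Q(s,a) = \mathbb{E}_{s' \sim p(\cdot \mid s,a)}\left[r(s,a) + \gamma V(s')\right] = \sum_{s' \in S} p(s' \mid s,a)\,\bigl(r(s,a) + \gamma V(s')\bigr)$$

Here $p(s' \mid s,a)$ is the probability of ending up in state $s'$ after executing action $a$ in state $s$, and $\gamma$ is the discount factor. We can also define $V(s)$ via $Q(s,a)$:

$$V(s) = \max_{a \in A} Q(s,a)$$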
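
As a minimal sketch of these two relations, the following Python code assumes a small, fully known MDP whose transition probabilities and immediate rewards are stored in plain dictionaries; the helper names `q_from_v` and `v_from_q`, the dictionary layout, and the toy states `s0`, `s1`, `s2` are all hypothetical and chosen only for illustration. Given known values $V(s')$ of the destination states, it computes $Q(s,a)$ as the probability-weighted sum of the immediate reward plus the discounted next-state value, then takes the maximum over actions to recover $V(s)$.

```python
from typing import Dict, List, Tuple

State = str
Action = str
# Hypothetical toy format: (state, action) -> list of (next_state, probability)
Transitions = Dict[Tuple[State, Action], List[Tuple[State, float]]]
Rewards = Dict[Tuple[State, Action], float]


def q_from_v(s: State, a: Action, v: Dict[State, float],
             transitions: Transitions, rewards: Rewards, gamma: float) -> float:
    """Q(s, a) = sum over s' of p(s' | s, a) * (r(s, a) + gamma * V(s'))."""
    return sum(
        prob * (rewards[(s, a)] + gamma * v[s_next])
        for s_next, prob in transitions[(s, a)]
    )


def v_from_q(s: State, actions: List[Action],
             q: Dict[Tuple[State, Action], float]) -> float:
    """V(s) = max over a of Q(s, a)."""
    return max(q[(s, a)] for a in actions)


# Toy example with made-up numbers: from s0, action a0 always leads to s1,
# while a1 leads to s1 or s2 with equal probability.
transitions: Transitions = {
    ("s0", "a0"): [("s1", 1.0)],
    ("s0", "a1"): [("s1", 0.5), ("s2", 0.5)],
}
rewards: Rewards = {("s0", "a0"): 1.0, ("s0", "a1"): 0.0}
v = {"s1": 2.0, "s2": 10.0}  # assumed, already-known values of the next states
gamma = 0.9

q = {(s, a): q_from_v(s, a, v, transitions, rewards, gamma) for (s, a) in transitions}
print(q)                                # Q(s0, a0) is about 2.8, Q(s0, a1) about 5.4
print(v_from_q("s0", ["a0", "a1"], q))  # about 5.4: a1 is the better action
```

Here action `a1` is worth more than `a0` even though it yields no immediate reward, because it has a chance to reach the high-value state `s2`; $V(s_0)$ therefore equals $Q(s_0, a_1)$.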

This means that the value of a state equals the value of the best action we can execute from this state. The value of action may look very close to the value of state, but there is still a difference, which is important to understand. Finally, we can express $Q(s,a)$ via itself, which will be used in the next chapter's topic of Q...
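
Combining the two relations in this section, Q as the expected immediate reward plus the discounted value of the destination state and V as the maximum Q over actions, gives the standard recursive form $Q(s,a) = \sum_{s'} p(s' \mid s,a)\,\bigl(r(s,a) + \gamma \max_{a'} Q(s',a')\bigr)$. The Python sketch below is an illustration under stated assumptions, not the book's own code: it repeatedly applies that update to every state-action pair of a hypothetical two-state MDP held in plain dictionaries. The function name `q_value_iteration`, the dictionary layout, and the states `s` and `end` are made up for the example, and states listed without actions are assumed to be terminal with value 0.

```python
from typing import Dict, List, Tuple

State = str
Action = str
# Hypothetical toy format: (state, action) -> list of (next_state, probability)
Transitions = Dict[Tuple[State, Action], List[Tuple[State, float]]]
Rewards = Dict[Tuple[State, Action], float]


def q_value_iteration(actions: Dict[State, List[Action]], transitions: Transitions,
                      rewards: Rewards, gamma: float = 0.9, tol: float = 1e-9,
                      max_iters: int = 10_000) -> Dict[Tuple[State, Action], float]:
    """Apply Q(s,a) <- sum_{s'} p(s'|s,a) * (r(s,a) + gamma * max_{a'} Q(s',a'))
    to every pair until the largest change drops below tol.

    States listed with no actions are treated as terminal, with value 0.
    """
    q = {(s, a): 0.0 for s, acts in actions.items() for a in acts}

    def state_value(s: State) -> float:
        # V(s) = max_a Q(s, a), read from the current table
        acts = actions.get(s, [])
        return max(q[(s, a)] for a in acts) if acts else 0.0

    for _ in range(max_iters):
        # Synchronous update: every new entry is computed from the old table
        new_q = {
            (s, a): sum(prob * (rewards[(s, a)] + gamma * state_value(s_next))
                        for s_next, prob in transitions[(s, a)])
            for (s, a) in q
        }
        delta = max((abs(new_q[k] - q[k]) for k in q), default=0.0)
        q = new_q
        if delta < tol:
            break
    return q


# Hypothetical MDP: in state "s", "stay" pays 1 and loops back to "s";
# "leave" pays 5 once and ends the episode in the terminal state "end".
actions = {"s": ["stay", "leave"], "end": []}
transitions: Transitions = {("s", "stay"): [("s", 1.0)], ("s", "leave"): [("end", 1.0)]}
rewards: Rewards = {("s", "stay"): 1.0, ("s", "leave"): 5.0}

q = q_value_iteration(actions, transitions, rewards, gamma=0.9)
print(q)
print(max(q[("s", a)] for a in actions["s"]))  # V(s)
```

With $\gamma = 0.9$, $Q(s, \text{stay})$ converges to about $1/(1-0.9) = 10$ (a reward of 1 on every step forever), while $Q(s, \text{leave})$ stays at 5, so $V(s) = \max_a Q(s,a)$ is about 10. The example also shows why the two quantities differ: Q scores a specific action, even a suboptimal one, whereas V assumes the best action is taken.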
