Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Deep Reinforcement Learning Hands-On
Deep Reinforcement Learning Hands-On

Deep Reinforcement Learning Hands-On: Apply modern RL methods, with deep Q-networks, value iteration, policy gradients, TRPO, AlphaGo Zero and more

eBook
€20.98 €29.99
Paperback
€36.99
Subscription
Free Trial
Renews at €18.99p/m

What do you get with eBook?

Product feature icon Instant access to your Digital eBook purchase
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Table of content icon View table of contents Preview book icon Preview Book

Deep Reinforcement Learning Hands-On

Chapter 1. What is Reinforcement Learning?

Reinforcement Learning is a subfield of machine learning which addresses the problem of automatic learning of optimal decisions over time. This is a general and common problem studied in many scientific and engineering fields.

In our changing world, even problems which look like static input-output problems become dynamic in a larger perspective. For example, consider that you're solving the simple supervised learning problem of pet image classification with two target classes—dog and cat. You've gathered the training dataset and implemented the classifier using your favorite deep learning toolkit, and after a while, the model that has converged demonstrates excellent performance. Good? Definitely! You've deployed it and left it running for a while. Then, after a vacation on some seaside resort, you discover that dog haircut fashions have changed, and a significant portion of your queries are now misclassified, so you need to update your training images and repeat the process again. Good? Definitely not!

The preceding example is intended to show that even simple Machine Learning (ML) problems have a hidden time dimension, which is frequently overlooked, but it might become an issue in a production system.

Reinforcement Learning (RL) is an approach that natively incorporates this extra dimension (which is usually time, but not necessarily) into learning equations, which puts it much close to the human perception of artificial intelligence. In this chapter, we will become familiar with the following:

  • How RL is related to and differs from other ML disciplines: supervised and unsupervised learning
  • What the main RL formalisms are and how they are related to each other
  • Theoretical foundations of RL: the Markov decision processes

Learning – supervised, unsupervised, and reinforcement

You may be familiar with the notion of supervised learning, which is the most studied and well-known machine learning problem. Its basic question is: how do you automatically build a function that maps some input into some output, when given a set of example pairs? It sounds simple in those terms, but the problem includes many tricky questions that computers have only recently started to deal with some success. There are lots of examples of supervised learning problems, including the following:

  • Text classification: Is this email message spam or not?
  • Image classification and object location: Does this image contain a picture of a cat, dog, or something else?
  • Regression problems: Given the information from weather sensors, what will be the weather tomorrow?
  • Sentiment analysis: What's the customer satisfaction level of this review?

These questions can look different, but they share the same idea: we have many examples of the input and desired output, and we want to learn how to generate the output for some future, currently unseen inputs. The name, supervised comes from the fact that we learn from the known answers, which were obtained from some supervisor who has provided us with those labeled examples.

At the other extreme, we have the so-called unsupervised learning, which assumes no supervision that has no known labels assigned to our data. The main objective is to learn some hidden structure of the dataset at hand. One common example of such an approach to learning is the clustering of data. This happens when our algorithm tries to combine data items into a set of clusters, which can reveal relationships in data.

Another unsupervised learning method that is becoming more and more popular is, Generative Adversarial Networks (GANs). When we have two competing neural networks, the first of them is trying to generate fake data to fool the second network, while the other is trying to discriminate artificially generated data from data sampled from our dataset. Over time, both of them are becoming more and more skillful in their tasks by capturing subtle specific patterns of your dataset.

RL is the third camp and lays somewhere in between full supervision and a complete lack of predefined labels. On the one hand, it uses many well-established methods of supervised learning such as deep neural networks for function approximation, stochastic gradient descent, and backpropagation, to learn data representation. On the other hand, it usually applies them in a different way.

In the next two sections of the chapter, we'll have the chance to explore specific details of the RL approach including its assumptions and abstractions in its strict mathematical form. For now, to compare RL to supervised and unsupervised learning, we'll take a less formal, but more intuitive description.

Imagine you have an agent that needs to take actions in some environment. A robot mouse in a maze is a good example, but we can also imagine an automatic helicopter trying to make a roll, or a chess program learning how to beat a grandmaster. Let's go with the robot mouse for simplicity.

Learning – supervised, unsupervised, and reinforcement

Figure 1: Robot mouse maze world

Its environment is a maze with food at some points and electricity at others. The robot mouse can take actions such as turn left/right and move forward. Finally, at every moment it can observe the full state of the maze to make a decision about the actions it may take. It is trying to find as much food as possible, while avoiding an electric shock whenever possible. These food and electricity signals stand as a reward given to the agent by the environment as additional feedback about the agent's actions. The reward is a very important concept in RL, and we'll talk about it later in the chapter. For now, it will be enough to understand that the final goal of the agent is to get as much total reward as possible. In our particular example, the mouse could suffer a bit of an electric shock to get to the place with plenty of food—this will be a better result for the mouse than just standing still and gaining nothing.

We don't want to hard-code knowledge about the environment and the best actions to take in every specific situation into the robot—it will take too much effort and may become useless even with a slight maze change. What we want to do is to have some magic set of methods that will allow our robot to learn on its own how to avoid electricity and gather as much food as possible.

Reinforcement Learning is exactly this magic toolbox, which plays differently from supervised and unsupervised learning methods. It doesn't work with predefined labels as supervised learning does. Nobody labels all the images the robot sees as good or bad or gives it the best direction to turn in.

However, we're not completely blind as in an unsupervised learning setup—we have a reward system. Rewards can be positive from gathering the food, negative from electric shocks, or neutral when nothing special happens. By observing such a reward and relating it to the actions we've taken, our agent learns how to perform an action better, gather more food, and get fewer electric shocks.

Of course, RL generality and flexibility comes with a price. RL is considered to be a much more challenging area than supervised and unsupervised learning. Let's quickly discuss what makes Reinforcement Learning tricky.

The first thing to note is that observation in RL depends on an agent's behavior and to some extent, it is the result of their behavior. If your agent decides to do inefficient things, then the observations will tell you nothing about what they have done wrong and what should be done to improve the outcome (the agent will just get negative feedback all the time). If the agent is stubborn and keeps making mistakes, then the observations can make the false impression that there is no way to get a larger reward—life is suffering—which could be totally wrong. In machine learning terms, it can be rephrased as having non-i.i.d data. The abbreviation i.i.d stands for independent and identically distributed, a requirement for most supervised learning methods.

The second thing that complicates our agent's life is that they need to not only exploit the policy they have learned, but to actively explore the environment, because, who knows, maybe by doing things differently we can significantly improve the outcome we get. The problem is that too much exploration may also seriously decrease the reward (not to mention that the agent can actually forget what they have learned before), so, we need to find a balance between these two activities somehow. This exploration/exploitation dilemma is one of the open fundamental questions in RL.

People face this choice all the time: should I go to an already known place for dinner or try this new fancy restaurant? How frequently should you change jobs? Should you study a new field or keep working in your area? There are no universal answers to these questions.

The third complication factor lays in the fact that reward can be seriously delayed from actions. In cases of chess, it can be one single strong move in the middle of the game that has shifted the balance. During learning, we need to discover such casualties, which can be tricky to do over the flow of time and our actions.

However, despite all these obstacles and complications, RL has made huge improvements over recent years and is becoming more and more active as a field of research and practical application.

Interested? Let's get to the details and look at RL formalisms and play rules.

Left arrow icon Right arrow icon
Download code icon Download Code

Key benefits

  • Explore deep reinforcement learning (RL), from the first principles to the latest algorithms
  • Evaluate high-profile RL methods, including value iteration, deep Q-networks, policy gradients, TRPO, PPO, DDPG, D4PG, evolution strategies and genetic algorithms
  • Keep up with the very latest industry developments, including AI-driven chatbots

Description

Deep Reinforcement Learning Hands-On is a comprehensive guide to the very latest DL tools and their limitations. You will evaluate methods including Cross-entropy and policy gradients, before applying them to real-world environments. Take on both the Atari set of virtual games and family favorites such as Connect4. The book provides an introduction to the basics of RL, giving you the know-how to code intelligent learning agents to take on a formidable array of practical tasks. Discover how to implement Q-learning on 'grid world' environments, teach your agent to buy and trade stocks, and find out how natural language models are driving the boom in chatbots.

Who is this book for?

Some fluency in Python is assumed. Basic deep learning (DL) approaches should be familiar to readers and some practical experience in DL will be helpful. This book is an introduction to deep reinforcement learning (RL) and requires no background in RL.

What you will learn

  • Understand the DL context of RL and implement complex DL models
  • Learn the foundation of RL: Markov decision processes
  • Evaluate RL methods including Cross-entropy, DQN, Actor-Critic, TRPO, PPO, DDPG, D4PG and others
  • Discover how to deal with discrete and continuous action spaces in various environments
  • Defeat Atari arcade games using the value iteration method
  • Create your own OpenAI Gym environment to train a stock trading agent
  • Teach your agent to play Connect4 using AlphaGo Zero
  • Explore the very latest deep RL research on topics including AI-driven chatbots

Product Details

Country selected
Publication date, Length, Edition, Language, ISBN-13
Publication date : Jun 21, 2018
Length: 546 pages
Edition : 1st
Language : English
ISBN-13 : 9781788839303
Category :
Languages :
Tools :

What do you get with eBook?

Product feature icon Instant access to your Digital eBook purchase
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want

Product Details

Publication date : Jun 21, 2018
Length: 546 pages
Edition : 1st
Language : English
ISBN-13 : 9781788839303
Category :
Languages :
Tools :

Packt Subscriptions

See our plans and pricing
Modal Close icon
€18.99 billed monthly
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Simple pricing, no contract
€189.99 billed annually
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
Feature tick icon Exclusive print discounts
€264.99 billed in 18 months
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
Feature tick icon Exclusive print discounts

Frequently bought together


Stars icon
Total 103.97
Mastering Machine Learning Algorithms
€36.99
Hands-On Reinforcement Learning with Python
€29.99
Deep Reinforcement Learning Hands-On
€36.99
Total 103.97 Stars icon

Table of Contents

20 Chapters
1. What is Reinforcement Learning? Chevron down icon Chevron up icon
2. OpenAI Gym Chevron down icon Chevron up icon
3. Deep Learning with PyTorch Chevron down icon Chevron up icon
4. The Cross-Entropy Method Chevron down icon Chevron up icon
5. Tabular Learning and the Bellman Equation Chevron down icon Chevron up icon
6. Deep Q-Networks Chevron down icon Chevron up icon
7. DQN Extensions Chevron down icon Chevron up icon
8. Stocks Trading Using RL Chevron down icon Chevron up icon
9. Policy Gradients – An Alternative Chevron down icon Chevron up icon
10. The Actor-Critic Method Chevron down icon Chevron up icon
11. Asynchronous Advantage Actor-Critic Chevron down icon Chevron up icon
12. Chatbots Training with RL Chevron down icon Chevron up icon
13. Web Navigation Chevron down icon Chevron up icon
14. Continuous Action Space Chevron down icon Chevron up icon
15. Trust Regions – TRPO, PPO, and ACKTR Chevron down icon Chevron up icon
16. Black-Box Optimization in RL Chevron down icon Chevron up icon
17. Beyond Model-Free – Imagination Chevron down icon Chevron up icon
18. AlphaGo Zero Chevron down icon Chevron up icon
Other Books You May Enjoy Chevron down icon Chevron up icon
Index Chevron down icon Chevron up icon

Customer reviews

Most Recent
Rating distribution
Full star icon Full star icon Full star icon Full star icon Half star icon 4.3
(34 Ratings)
5 star 67.6%
4 star 14.7%
3 star 5.9%
2 star 5.9%
1 star 5.9%
Filter icon Filter
Most Recent

Filter reviews by




Ansatz Mar 01, 2024
Full star icon Full star icon Full star icon Full star icon Full star icon 5
Because it's written by a practitioner, this book is not overly theoretical. It is surprisingly easy to read compared to most textbooks. Every chapter walks you through code examples that can be cloned from a GitHub repository. I had to fiddle with the python version and library dependencies s bit to get them to run, and Gym only runs on Linux, but otherwise it's awesome to be able to run the same code.It's useful to look at the code on the computer in combination with the detailed explanation in the book. I modified the code in Chapter 6 (Deep Q networks) to work for Connnect4, which was a great exercise.I would recommend this book to anyone interested in the subject.
Amazon Verified review Amazon
Anonymous1 May 24, 2020
Full star icon Full star icon Empty star icon Empty star icon Empty star icon 2
Bit if of a rip off for the publishers to have only grayscale images in the book, makes understanding the images and graphs very difficult. Surely for £28.99 they could afford to have done better.
Amazon Verified review Amazon
Romain G. Feb 03, 2020
Full star icon Full star icon Full star icon Empty star icon Empty star icon 3
Livre arrivé très corné, sans un emballage souple qui ne le protégeait pas...
Amazon Verified review Amazon
Mike Nov 22, 2019
Full star icon Empty star icon Empty star icon Empty star icon Empty star icon 1
Eigentlich bin ich kein Fan, ein sogenanntes "Cook Book" zu kaufen. Das wurde wider bestätigt bei dieser Bestellung.Das Layout sowie die Druckqualität sind wirklich kein Highlight. Abbildungen haben schlechte Auflösung. Der Inhalt beinhaltet wenige wissenschaftliche Perspektiven.Also, es lohnt sich nicht!
Amazon Verified review Amazon
Eoerl Oct 17, 2019
Full star icon Full star icon Full star icon Full star icon Full star icon 5
very good mix of theory and hands-on, good variety of subjects, pretty up to date. I even bought it paperback & Kindle, read end to end in both cases, I liked it that much
Amazon Verified review Amazon
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

How do I buy and download an eBook? Chevron down icon Chevron up icon

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe reader installed, then clicking on the link will download and open the PDF file directly. If you don't, then save the PDF file on your machine and download the Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it we have tried to balance the need for the ebook to be usable for you the reader with our needs to protect the rights of us as Publishers and of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else
How can I make a purchase on your website? Chevron down icon Chevron up icon

If you want to purchase a video course, eBook or Bundle (Print+eBook) please follow below steps:

  1. Register on our website using your email address and the password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title. 
  5. Proceed with the checkout process (payment to be made using Credit Card, Debit Cart, or PayPal)
Where can I access support around an eBook? Chevron down icon Chevron up icon
  • If you experience a problem with using or installing Adobe Reader, the contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us
What eBook formats do Packt support? Chevron down icon Chevron up icon

Our eBooks are currently available in a variety of formats such as PDF and ePubs. In the future, this may well change with trends and development in technology, but please note that our PDFs are not Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks? Chevron down icon Chevron up icon
  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are lower price than print
  • They save resources and space
What is an eBook? Chevron down icon Chevron up icon

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply login to your account and click on the link in Your Download Area. We recommend you saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.