Mastering Reinforcement Learning with Python

The three paradigms of ML

RL is a separate paradigm in Machine Learning (ML) alongside supervised learning (SL) and unsupervised learning (UL). It goes beyond what the other two paradigms involve – perception, classification, regression, and clustering – and makes decisions. Importantly, however, RL utilizes supervised and unsupervised ML methods in doing so. Therefore, RL is a distinct yet closely related field to SL and UL, and it's important to have a grasp of both.

Supervised learning

SL is about learning a mathematical function that maps a set of inputs to the corresponding outputs/labels as accurately as possible. The idea is that we don't know the dynamics of the process that generates the outputs, but we try to figure them out from the data it produces. Consider the following examples:

  • An image recognition model that classifies the objects on the camera of a self-driving car as a pedestrian, a stop sign, a truck, and so on.
  • A forecasting model that predicts the customer demand of a product for a particular holiday season using past sales data.

It is extremely difficult to come up with precise rules to visually differentiate objects, or to identify the factors that lead customers to demand a product. Therefore, SL models infer these rules and factors from labeled data. Here are some key points about how SL works (a minimal code sketch follows the list):

  • During training, models learn from ground truth labels/outputs provided by a supervisor (which could be a human expert or a process).
  • During inference, models make predictions about what the output might be given the input.
  • Models use function approximators to represent the dynamics of the processes that generate the outputs.
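
To make this concrete, here is a minimal sketch of the SL workflow. It is not taken from the book: it uses scikit-learn's LinearRegression on synthetic (input, label) pairs for a toy demand-forecasting task, and all numbers are illustrative assumptions.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Inputs: [week of year, last year's sales for the same week] (synthetic)
X = rng.uniform(low=[1, 100], high=[52, 500], size=(200, 2))
# Ground truth labels from a "supervisor" -- here, a noisy made-up rule
y = 0.8 * X[:, 1] + 2.0 * X[:, 0] + rng.normal(0, 10, size=200)

# Training: fit a function approximator that maps inputs to labels
model = LinearRegression().fit(X, y)

# Inference: predict demand for an unseen (week, last-year-sales) input
print(model.predict([[51, 420]]))

The supervisor here is the synthetic rule that produced y; in a real application, it would be the recorded sales figures.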

Unsupervised learning

UL algorithms identify patterns in data that were previously unknown. When using such models, we might have an idea of what to expect as a result, but we don't supply the models with labels. For example:

  • Identifying homogeneous segments in an image provided by the camera of a self-driving car. The model is likely to separate the sky, the road, and buildings based on the textures in the image.
  • Clustering weekly sales data into three groups based on sales volume. The output is likely to be weeks with low, medium, and high sales volumes.

As you can tell, this is quite different from how SL works, namely:

  • UL models don't know what the ground truth is, and there is no label to map the input to; they just identify patterns in the data. Even after doing so, the model would not be aware that, for example, it separated the sky from the road, or a holiday week from a regular week.
  • During inference, the model would cluster the input into one of the groups it has identified, again, without knowing what that group represents (see the sketch after this list).
  • Function approximators, such as neural networks, are used in some UL algorithms, but not always.
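
As an illustration, here is a minimal UL sketch, again not from the book: scikit-learn's KMeans clustering synthetic weekly sales volumes into three groups. The regime means and week counts are made-up assumptions.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# 52 weeks of synthetic sales volumes drawn from three regimes
sales = np.concatenate([
    rng.normal(100, 10, 30),   # low-volume weeks
    rng.normal(300, 20, 15),   # medium-volume weeks
    rng.normal(600, 30, 7),    # high-volume weeks
]).reshape(-1, 1)

# No labels are supplied; the model only discovers structure
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(sales)

# Each week is assigned to a cluster, but the model has no notion of
# "low", "medium", or "high" -- the labels are arbitrary indices
print(kmeans.labels_)
print(kmeans.cluster_centers_.ravel())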

With SL and UL reintroduced, we'll now compare them with RL.

Reinforcement learning

RL is a framework for learning how to make decisions under uncertainty to maximize a long-term benefit through trial and error. These decisions are made sequentially, and earlier decisions affect the situations and benefits that will be encountered later. This separates RL from both SL and UL, which don't involve any decision-making. Let's revisit the examples we provided earlier to see how an RL model would differ from SL and UL models in terms of what it tries to find out.

  • For a self-driving car, given the types and positions of all the objects on the camera, and the edges of the lanes on the road, the model might learn how to steer the wheel and what the speed of the car should be to pass the car ahead safely and as quickly as possible.
  • Given the historical sales numbers for a product and the time it takes to bring the inventory from the supplier to the store, the model might learn when and how many units to order from the supplier, so that seasonal customer demand is satisfied with high likelihood while the inventory and transportation costs are minimized.

As you will have noticed, the tasks that RL tries to accomplish are of a different nature and are more complex than those addressed by SL and UL alone. Let's elaborate on how RL is different:

  • The output of an RL model is a decision given the situation, not a prediction or clustering.
  • There are no ground truth decisions provided by a supervisor that tell the model what the ideal decisions are in different situations. Instead, the model learns the best decisions from the feedback it receives on its own experience and the decisions it made in the past. For example, through trial and error, an RL model would learn that speeding too much while passing a car may lead to accidents, and that ordering too much inventory before the holidays will cause excess inventory later (see the sketch after this list).
  • RL models often use outputs of SL models as inputs to make decisions. For example, the output of an image recognition model in a self-driving car could be used to make driving decisions. Similarly, the output of a forecasting model is often an input to an RL model that makes inventory replenishment decisions.
  • Even in the absence of such an input from an auxiliary model, RL models, either implicitly or explicitly, predict what situations their decisions will lead to in the future.
  • RL utilizes many methods developed for SL and UL, such as various types of neural networks as function approximators.
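
To make the trial-and-error idea concrete, here is a minimal sketch. It is not from the book: it runs tabular Q-learning on a hypothetical 5-state chain where only reaching the last state gives a reward, and all parameter values are illustrative assumptions.

import numpy as np

n_states, n_actions = 5, 2           # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))  # learned values of decisions
alpha, gamma, epsilon = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

for episode in range(500):
    state = 0
    while state != n_states - 1:
        # Trial and error: mostly exploit, sometimes explore
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            action = int(np.argmax(Q[state]))
        next_state = max(state - 1, 0) if action == 0 else state + 1
        reward = 1.0 if next_state == n_states - 1 else 0.0
        # Learn from the feedback of the agent's own decisions,
        # not from ground truth labels
        Q[state, action] += alpha * (
            reward + gamma * np.max(Q[next_state]) - Q[state, action]
        )
        state = next_state

print(Q)  # the "right" action (column 1) should dominate in every state

No supervisor tells the agent which action is ideal in each state; the values in Q emerge purely from the rewards that the agent's own sequential decisions produced.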

So, what differentiates RL from other ML methods is that it is a decision-making framework. What makes it exciting and powerful, though, is its similarity to how we, as humans, learn to make decisions from experience. Imagine a toddler learning how to build a tower from toy blocks. Usually, the taller the tower, the happier the toddler is. Every increment in height is a success. Every collapse is a failure. She quickly discovers that the closer the next block is to the center of the one beneath it, the more stable the tower is. This is reinforced when a block placed too close to the edge more readily topples. With practice, she manages to stack several blocks on top of each other. She realizes that how she stacks the earlier blocks creates a foundation that determines how tall a tower she can build. Thus, she learns.

Of course, the toddler did not learn these architectural principles from a blueprint. She learned from the commonalities in her failures and successes. The increasing height of the tower, or its collapse, provided a feedback signal upon which she refined her strategy. Learning from experience, rather than from a blueprint, is at the center of RL. Just as the toddler discovers which block placements lead to taller towers, an RL agent identifies the actions with the highest long-term rewards through trial and error. This is what makes RL such a profound form of AI: it's unmistakably human.

Over the past several years, there have been many amazing success stories proving the potential of RL. Moreover, there are many industries that it is about to transform. So, before diving into the technical aspects of RL, let's further motivate ourselves by looking at what RL can do in practice.
