The Reinforcement Learning Workshop


Product type: Book
Published: Aug 2020
Publisher: Packt
ISBN-13: 9781800200456
Pages: 822
Edition: 1st
Authors (9):
Alessandro Palmas
Emanuele Ghelfi
Dr. Alexandra Galina Petre
Mayur Kulkarni
Anand N.S.
Quan Nguyen
Aritra Sen
Anthony So
Saikat Basak

Table of Contents (14)

Preface
1. Introduction to Reinforcement Learning
2. Markov Decision Processes and Bellman Equations
3. Deep Learning in Practice with TensorFlow 2
4. Getting Started with OpenAI and TensorFlow for Reinforcement Learning
5. Dynamic Programming
6. Monte Carlo Methods
7. Temporal Difference Learning
8. The Multi-Armed Bandit Problem
9. What Is Deep Q-Learning?
10. Playing an Atari Game with Deep Recurrent Q-Networks
11. Policy-Based Methods for Reinforcement Learning
12. Evolutionary Strategies for RL
Appendix

Introduction

In the previous chapter, we studied the main elements of Reinforcement Learning (RL). We described an agent as an entity that perceives the state of an environment and acts on that environment in order to achieve a goal. An agent acts according to a policy, which represents its behavior by mapping environment states to actions. In the second half of the previous chapter, we introduced Gym and Baselines, two Python libraries that simplify environment representation and algorithm implementation, respectively.
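To make this loop concrete, here is a minimal sketch of the agent-environment interaction using Gym's classic API; the environment name and the random action selection are illustrative choices, not examples from the book:

```python
import gym

# Create an environment; CartPole-v1 is just an illustrative choice.
env = gym.make("CartPole-v1")

state = env.reset()
done = False
total_reward = 0.0

while not done:
    # A trivial stand-in for a policy: sample a random action.
    # A real policy would map the observed state to an action.
    action = env.action_space.sample()
    state, reward, done, info = env.step(action)
    total_reward += reward

print("Episode return:", total_reward)
env.close()
```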

We mentioned that RL frames problems as Markov Decision Processes (MDPs), but we did not go into the details or give a formal definition.

In this chapter, we will formally describe what an MDP is, its properties, and its characteristics. When facing a new problem in RL, we have to ensure that the problem can be formalized as an MDP; otherwise, applying RL techniques is impossible.

Before presenting a formal definition of MDPs, we need to understand Markov Chains (MCs) and Markov Reward Processes (MRPs). MCs and MRPs are simplified special cases of MDPs. An MC focuses only on state transitions, without modeling rewards or actions. Consider the game of snakes and ladders, where the next position depends only on the current position and the number rolled on the die. An MRP adds a reward component to each state transition. Studying MCs and MRPs first makes it easier to understand the characteristics of MDPs step by step. We will look at specific examples of MCs and MRPs later in the chapter.
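As a preview, the following sketch defines a small MC purely by its transition matrix and samples a trajectory with NumPy. The states and probabilities are made up for illustration; the chapter's own examples come later:

```python
import numpy as np

# A toy three-state Markov chain (states and probabilities are illustrative).
states = ["Sunny", "Cloudy", "Rainy"]

# Transition matrix P: P[i, j] is the probability of moving from state i
# to state j. Each row sums to 1.
P = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.4, 0.4],
])

rng = np.random.default_rng(0)

# Sample a trajectory: the next state depends only on the current state.
s = 0  # start in "Sunny"
trajectory = [states[s]]
for _ in range(10):
    s = rng.choice(len(states), p=P[s])
    trajectory.append(states[s])

print(" -> ".join(trajectory))
```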

Along with MDPs, this chapter also presents the concepts of the state-value function and the action-value function, which are used to evaluate how good a state is for an agent and how good an action taken in a given state is. State-value functions and action-value functions are the building blocks of the algorithms used to solve real-world problems. As we will learn later in this chapter, both functions depend closely on the agent's policy and on the environment dynamics.
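In the standard notation that the chapter defines precisely later, these two functions can be written as follows, where pi is the agent's policy, G_t is the discounted return from time step t, and gamma is the discount factor:

```latex
V^{\pi}(s)    = \mathbb{E}_{\pi}\!\left[ G_t \mid S_t = s \right]
              = \mathbb{E}_{\pi}\!\left[ \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \,\middle|\, S_t = s \right]

Q^{\pi}(s, a) = \mathbb{E}_{\pi}\!\left[ G_t \mid S_t = s,\; A_t = a \right]
```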

The final part of this chapter presents two Bellman equations, namely the Bellman expectation equation and the Bellman optimality equation. In RL, these equations are used to evaluate an agent's behavior and to find a policy that maximizes the agent's performance in an MDP.
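In their standard form for the state-value function, the two equations read as follows (the chapter derives them step by step):

```latex
% Bellman expectation equation: the value of s under policy pi
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)
             \left[ R(s, a, s') + \gamma V^{\pi}(s') \right]

% Bellman optimality equation: the value of s under an optimal policy
V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)
           \left[ R(s, a, s') + \gamma V^{*}(s') \right]
```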

In this chapter, we will work through some example MDPs, such as the student MDP and Gridworld. We will implement the solution methods and equations explained in this chapter using Python, SciPy, and NumPy.
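As a taste of those implementations, note that for an MRP the Bellman expectation equation is linear, so the state values can be computed in closed form as V = (I - gamma * P)^(-1) R. The following minimal sketch, with made-up transitions and rewards, solves that system with NumPy:

```python
import numpy as np

# A hypothetical three-state MRP: transition matrix and expected rewards.
P = np.array([
    [0.5, 0.5, 0.0],
    [0.1, 0.6, 0.3],
    [0.0, 0.2, 0.8],
])
R = np.array([1.0, 0.0, -1.0])
gamma = 0.9  # discount factor

# Bellman expectation equation in matrix form: V = R + gamma * P @ V,
# which rearranges to (I - gamma * P) V = R.
V = np.linalg.solve(np.eye(len(R)) - gamma * P, R)

print("State values:", V)
```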
