The Reinforcement Learning Workshop


Product type: Book
Published: Aug 2020
Publisher: Packt
ISBN-13: 9781800200456
Pages: 822
Edition: 1st
Authors (9):
Alessandro Palmas
Emanuele Ghelfi
Dr. Alexandra Galina Petre
Mayur Kulkarni
Anand N.S.
Quan Nguyen
Aritra Sen
Anthony So
Saikat Basak

Table of Contents (14)

Preface
1. Introduction to Reinforcement Learning
2. Markov Decision Processes and Bellman Equations
3. Deep Learning in Practice with TensorFlow 2
4. Getting Started with OpenAI and TensorFlow for Reinforcement Learning
5. Dynamic Programming
6. Monte Carlo Methods
7. Temporal Difference Learning
8. The Multi-Armed Bandit Problem
9. What Is Deep Q-Learning?
10. Playing an Atari Game with Deep Recurrent Q-Networks
11. Policy-Based Methods for Reinforcement Learning
12. Evolutionary Strategies for RL
Appendix

Introduction

In the previous chapter, we studied the main elements of Reinforcement Learning (RL). We described an agent as an entity that perceives the state of an environment and acts on that environment in order to achieve a goal. An agent acts according to a policy, which represents its behavior by mapping environment states to actions. In the second half of the previous chapter, we introduced Gym and Baselines, two Python libraries that simplify environment representation and algorithm implementation, respectively.
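To make this loop concrete, here is a minimal sketch of the agent-environment interaction using Gym's classic API; the environment name and the random action selection are illustrative choices, not examples from the book:

```python
import gym

# Create an environment; CartPole-v1 is just an illustrative choice.
env = gym.make("CartPole-v1")

state = env.reset()
done = False
total_reward = 0.0

while not done:
    # A trivial stand-in for a policy: sample a random action.
    # A real policy would map the observed state to an action.
    action = env.action_space.sample()
    state, reward, done, info = env.step(action)
    total_reward += reward

print("Episode return:", total_reward)
env.close()
```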

We mentioned that RL frames problems as Markov Decision Processes (MDPs), but we did not go into the details or give a formal definition.

In this chapter, we will formally describe what an MDP is, its properties, and its characteristics. When facing a new problem in RL, we have to ensure that the problem can be formalized as an MDP; otherwise, applying RL techniques is impossible.

Before presenting a formal definition of MDPs, we need to understand Markov Chains (MCs) and Markov Reward Processes (MRPs). MCs and MRPs are simplified special cases of MDPs. An MC focuses only on state transitions, without modeling rewards or actions. Consider the game of snakes and ladders, where the next position depends only on the current position and the number rolled on the die. An MRP adds a reward component to each state transition. Studying MCs and MRPs first makes it easier to understand the characteristics of MDPs step by step. We will look at specific examples of MCs and MRPs later in the chapter.
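As a preview, the following sketch defines a small MC purely by its transition matrix and samples a trajectory with NumPy. The states and probabilities are made up for illustration; the chapter's own examples come later:

```python
import numpy as np

# A toy three-state Markov chain (states and probabilities are illustrative).
states = ["Sunny", "Cloudy", "Rainy"]

# Transition matrix P: P[i, j] is the probability of moving from state i
# to state j. Each row sums to 1.
P = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.4, 0.4],
])

rng = np.random.default_rng(0)

# Sample a trajectory: the next state depends only on the current state.
s = 0  # start in "Sunny"
trajectory = [states[s]]
for _ in range(10):
    s = rng.choice(len(states), p=P[s])
    trajectory.append(states[s])

print(" -> ".join(trajectory))
```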

Along with MDPs, this chapter also presents the concepts of the state-value function and the action-value function, which are used to evaluate how good a state is for an agent and how good an action taken in a given state is. State-value functions and action-value functions are the building blocks of the algorithms used to solve real-world problems. As we will learn later in this chapter, both functions depend closely on the agent's policy and on the environment dynamics.
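In the standard notation that the chapter defines precisely later, these two functions can be written as follows, where pi is the agent's policy, G_t is the discounted return from time step t, and gamma is the discount factor:

```latex
V^{\pi}(s)    = \mathbb{E}_{\pi}\!\left[ G_t \mid S_t = s \right]
              = \mathbb{E}_{\pi}\!\left[ \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \,\middle|\, S_t = s \right]

Q^{\pi}(s, a) = \mathbb{E}_{\pi}\!\left[ G_t \mid S_t = s,\; A_t = a \right]
```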

The final part of this chapter presents two Bellman equations, namely the Bellman expectation equation and the Bellman optimality equation. In RL, these equations are used to evaluate an agent's behavior and to find a policy that maximizes the agent's performance in an MDP.
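In their standard form for the state-value function, the two equations read as follows (the chapter derives them step by step):

```latex
% Bellman expectation equation: the value of s under policy pi
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)
             \left[ R(s, a, s') + \gamma V^{\pi}(s') \right]

% Bellman optimality equation: the value of s under an optimal policy
V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)
           \left[ R(s, a, s') + \gamma V^{*}(s') \right]
```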

In this chapter, we will work through some example MDPs, such as the student MDP and Gridworld. We will implement the solution methods and equations explained in this chapter using Python, SciPy, and NumPy.
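As a taste of those implementations, note that for an MRP the Bellman expectation equation is linear, so the state values can be computed in closed form as V = (I - gamma * P)^(-1) R. The following minimal sketch, with made-up transitions and rewards, solves that system with NumPy:

```python
import numpy as np

# A hypothetical three-state MRP: transition matrix and expected rewards.
P = np.array([
    [0.5, 0.5, 0.0],
    [0.1, 0.6, 0.3],
    [0.0, 0.2, 0.8],
])
R = np.array([1.0, 0.0, -1.0])
gamma = 0.9  # discount factor

# Bellman expectation equation in matrix form: V = R + gamma * P @ V,
# which rearranges to (I - gamma * P) V = R.
V = np.linalg.solve(np.eye(len(R)) - gamma * P, R)

print("State values:", V)
```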
