PyTorch 1.x Reinforcement Learning Cookbook

You're reading from PyTorch 1.x Reinforcement Learning Cookbook: Over 60 recipes to design, develop, and deploy self-learning AI models using Python

Product type: Paperback
Published in: Oct 2019
Publisher: Packt
ISBN-13: 9781838551964
Length: 340 pages
Edition: 1st Edition

Author (1): Yuxi (Hayden) Liu
Table of Contents (11)

Preface
1. Getting Started with Reinforcement Learning and PyTorch
2. Markov Decision Processes and Dynamic Programming
3. Monte Carlo Methods for Making Numerical Estimations
4. Temporal Difference and Q-Learning
5. Solving Multi-armed Bandit Problems
6. Scaling Up Learning with Function Approximation
7. Deep Q-Networks in Action
8. Implementing Policy Gradients and Policy Optimization
9. Capstone Project – Playing Flappy Bird with DQN
10. Other Books You May Enjoy

What this book covers

Chapter 1, Getting Started with Reinforcement Learning and PyTorch, is the starting point of this book's step-by-step guide to reinforcement learning with PyTorch. We will set up the working environment and OpenAI Gym, and get familiar with reinforcement learning environments using the Atari and CartPole playgrounds. The chapter will also cover the implementation of several basic reinforcement learning algorithms, including random search, hill-climbing, and policy gradient. At the end, readers will also have a chance to review the essentials of PyTorch and get ready for the examples and projects that follow.
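As a taste of the recipe style used throughout the book, here is a minimal sketch of random search on CartPole. It assumes the classic OpenAI Gym API (env.reset() returns an observation and env.step() returns four values), which matches the Gym releases available when this book was written; the number of search episodes is illustrative.

    # Random search for a linear policy on CartPole (classic Gym API assumed)
    import gym
    import torch

    env = gym.make('CartPole-v0')

    def run_episode(env, weight):
        """Run one episode with the linear policy defined by `weight`."""
        state = env.reset()
        total_reward, done = 0, False
        while not done:
            state_t = torch.from_numpy(state).float()
            # Choose the action with the larger score state . weight
            action = torch.argmax(torch.matmul(state_t, weight)).item()
            state, reward, done, _ = env.step(action)
            total_reward += reward
        return total_reward

    best_reward, best_weight = 0, None
    for episode in range(1000):
        weight = torch.rand(4, 2)            # sample a random linear policy
        reward = run_episode(env, weight)
        if reward > best_reward:
            best_reward, best_weight = reward, weight
    print('Best episode reward found by random search:', best_reward)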

Chapter 2, Markov Decision Processes and Dynamic Programming, starts with the creation of a Markov chain and a Markov Decision Process (MDP), the latter being at the core of most reinforcement learning algorithms. It will then move on to two approaches to solving an MDP: value iteration and policy iteration. We will get more familiar with MDPs and the Bellman equation by practicing policy evaluation. We will also demonstrate how to solve the interesting coin-flipping gamble problem step by step. At the end, we will learn how to perform dynamic programming to scale up learning.
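To give a flavor of value iteration, the following sketch applies the Bellman optimality backup to a made-up three-state, two-action MDP; the transition probabilities, rewards, discount factor, and threshold are purely illustrative.

    # Value iteration on a small, made-up MDP
    import torch

    gamma, threshold = 0.99, 1e-4
    # trans_probs[action][state, next_state]
    trans_probs = torch.tensor([[[0.8, 0.1, 0.1],
                                 [0.1, 0.6, 0.3],
                                 [0.5, 0.3, 0.2]],
                                [[0.7, 0.2, 0.1],
                                 [0.1, 0.8, 0.1],
                                 [0.4, 0.4, 0.2]]])
    rewards = torch.tensor([1.0, 0.0, 10.0])   # reward received on entering each state

    V = torch.zeros(3)
    while True:
        # Bellman optimality backup: V(s) = max_a sum_s' P(s'|s,a) [R(s') + gamma V(s')]
        V_new = torch.max(torch.stack([trans_probs[a] @ (rewards + gamma * V)
                                       for a in range(2)]), dim=0).values
        delta = torch.max(torch.abs(V_new - V))
        V = V_new
        if delta < threshold:
            break
    print('Optimal state values:', V)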

Chapter 3, Monte Carlo Methods for Making Numerical Estimations, is focused on Monte Carlo methods. We will start by estimating the value of pi with Monte Carlo. Moving on, we will learn how to use the Monte Carlo method to predict state values and state-action values. We will demonstrate training an agent to win at Blackjack using Monte Carlo. Also, we will explore on-policy first-visit Monte Carlo control and off-policy Monte Carlo control by developing various algorithms. Monte Carlo control with an epsilon-greedy policy and with weighted importance sampling will also be covered.
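The pi-estimation recipe boils down to sampling random points and counting hits; a minimal sketch (the sample size is arbitrary):

    # Monte Carlo estimate of pi: sample points in the unit square and count
    # how many fall inside the quarter circle of radius 1
    import torch

    n_samples = 100000
    points = torch.rand(n_samples, 2)                       # uniform points in [0, 1)^2
    inside = (points.pow(2).sum(dim=1) <= 1).float().sum()  # hits inside the quarter circle
    pi_estimate = 4 * inside / n_samples
    print('Monte Carlo estimate of pi:', pi_estimate.item())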

Chapter 4, Temporal Difference and Q-Learning, starts by setting up the CliffWalking and Windy Gridworld environment playgrounds, which will be used for temporal difference learning and Q-learning. Through our step-by-step guide, readers will explore temporal difference learning for prediction, and will gain practical experience with Q-learning for off-policy control and SARSA for on-policy control. We will also work on an interesting project, the taxi problem, and demonstrate how to solve it using the Q-learning and SARSA algorithms. Finally, we will cover the Double Q-learning algorithm as a bonus section.
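At the heart of the off-policy control recipes is the tabular Q-learning update. Here is a minimal sketch on the taxi problem, assuming the Taxi-v3 environment under the classic Gym API; the hyperparameters are illustrative.

    # Tabular Q-learning on the taxi problem (classic Gym API assumed)
    import gym
    import random
    from collections import defaultdict

    env = gym.make('Taxi-v3')
    alpha, gamma, epsilon = 0.4, 1.0, 0.1
    Q = defaultdict(float)          # Q[(state, action)] -> estimated action value

    for episode in range(1000):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy behavior policy
            if random.random() < epsilon:
                action = env.action_space.sample()
            else:
                action = max(range(env.action_space.n), key=lambda a: Q[(state, a)])
            next_state, reward, done, _ = env.step(action)
            # off-policy target: bootstrap from the greedy action in the next state
            best_next = max(Q[(next_state, a)] for a in range(env.action_space.n))
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state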

Chapter 5, Solving Multi-armed Bandit Problems, covers multi-armed bandit algorithms, which are probably among the most popular algorithms in reinforcement learning. The chapter starts with the creation of a multi-armed bandit problem. We will see how to solve it using four strategies: the epsilon-greedy policy, softmax exploration, the upper confidence bound algorithm, and the Thompson sampling algorithm. We will also work on a billion-dollar problem, online advertising, and demonstrate how to solve it using multi-armed bandit algorithms. Finally, we will develop a more complex algorithm, the contextual bandit algorithm, and use it to optimize display advertising.
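To illustrate the simplest of the four strategies, here is a sketch of epsilon-greedy action selection on a three-armed Bernoulli bandit; the payout probabilities, epsilon, and step count are made up for illustration.

    # Epsilon-greedy on a three-armed Bernoulli bandit
    import torch

    true_probs = torch.tensor([0.3, 0.5, 0.7])   # unknown to the agent
    n_arms, epsilon, n_steps = 3, 0.1, 10000
    Q = torch.zeros(n_arms)          # running estimate of each arm's value
    counts = torch.zeros(n_arms)

    for step in range(n_steps):
        if torch.rand(1).item() < epsilon:
            arm = torch.randint(n_arms, (1,)).item()    # explore
        else:
            arm = torch.argmax(Q).item()                # exploit
        reward = float(torch.rand(1).item() < true_probs[arm])
        counts[arm] += 1
        Q[arm] += (reward - Q[arm]) / counts[arm]       # incremental mean update
    print('Estimated arm values:', Q)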

Chapter 6, Scaling Up Learning with Function Approximation, is focused on function approximation and will start with setting up the Mountain Car environment playground. Through our step-by-step guide, we will cover the motivation for function approximation over table lookup, and gain experience in incorporating function approximation into existing algorithms such as Q-learning and SARSA. We will also cover an advanced technique: batching using experience replay. Finally, we will cover how to solve the CartPole problem using what we have learned throughout the chapter.
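The core idea is to replace the Q-table with a parameterized estimator that is updated by gradient steps on the TD error. The following is a minimal sketch of semi-gradient Q-learning with a plain linear approximator in PyTorch on CartPole, simpler than the feature-based recipes in the chapter; the classic Gym API, learning rate, and exploration rate are assumptions for illustration.

    # Semi-gradient Q-learning with a linear Q-function approximator
    import gym
    import torch

    env = gym.make('CartPole-v0')
    n_state, n_action = env.observation_space.shape[0], env.action_space.n
    model = torch.nn.Linear(n_state, n_action)       # Q(s, .) as a linear map of the state
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    gamma, epsilon = 0.99, 0.1

    for episode in range(500):
        state, done = env.reset(), False
        while not done:
            q_values = model(torch.from_numpy(state).float())
            if torch.rand(1).item() < epsilon:
                action = env.action_space.sample()
            else:
                action = torch.argmax(q_values).item()
            next_state, reward, done, _ = env.step(action)
            with torch.no_grad():
                # bootstrapped target; terminal states get the immediate reward only
                target = reward + (0 if done else
                                   gamma * model(torch.from_numpy(next_state).float()).max())
            loss = (q_values[action] - target) ** 2   # squared TD error
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            state = next_state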

Chapter 7, Deep Q-Networks in Action, covers Deep Q-learning, or Deep Q-Networks (DQNs), which are considered among the most modern reinforcement learning techniques. We will develop a DQN model step by step and understand the importance of experience replay and a target network in making Deep Q-learning work in practice. To help readers solve Atari games, we will demonstrate how to incorporate convolutional neural networks into DQNs. We will also cover two DQN variants, Double DQNs and Dueling DQNs, and show how to fine-tune a Q-learning algorithm using Double DQNs as an example.
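The following compact sketch shows the two ingredients the chapter stresses, an experience replay buffer and a periodically synchronized target network. CartPole is used instead of Atari to keep the sketch short; the network sizes and hyperparameters are illustrative, and the classic Gym API is assumed.

    # A compact DQN sketch: replay buffer + target network
    import gym
    import random
    import torch
    from collections import deque

    env = gym.make('CartPole-v0')
    n_state, n_action = env.observation_space.shape[0], env.action_space.n

    def make_net():
        return torch.nn.Sequential(torch.nn.Linear(n_state, 64),
                                   torch.nn.ReLU(),
                                   torch.nn.Linear(64, n_action))

    policy_net, target_net = make_net(), make_net()
    target_net.load_state_dict(policy_net.state_dict())
    optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-3)
    buffer = deque(maxlen=10000)
    gamma, epsilon, batch_size, sync_every = 0.99, 0.1, 32, 200
    step_count = 0

    for episode in range(300):
        state, done = env.reset(), False
        while not done:
            if random.random() < epsilon:
                action = env.action_space.sample()
            else:
                action = policy_net(torch.from_numpy(state).float()).argmax().item()
            next_state, reward, done, _ = env.step(action)
            buffer.append((state, action, reward, next_state, done))
            state = next_state
            step_count += 1

            if len(buffer) >= batch_size:
                # sample a random mini-batch of past transitions (experience replay)
                batch = random.sample(buffer, batch_size)
                states, actions, rewards, next_states, dones = zip(*batch)
                states = torch.tensor(states, dtype=torch.float)
                actions = torch.tensor(actions)
                rewards = torch.tensor(rewards, dtype=torch.float)
                next_states = torch.tensor(next_states, dtype=torch.float)
                dones = torch.tensor(dones, dtype=torch.float)
                q = policy_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
                with torch.no_grad():
                    # bootstrap from the frozen target network for stability
                    target = rewards + gamma * target_net(next_states).max(1).values * (1 - dones)
                loss = torch.nn.functional.mse_loss(q, target)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

            if step_count % sync_every == 0:
                target_net.load_state_dict(policy_net.state_dict())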

Chapter 8, Implementing Policy Gradients and Policy Optimization, focuses on policy gradients and policy optimization, and starts by implementing the REINFORCE algorithm. We will then develop the REINFORCE algorithm with a baseline for CliffWalking. We will also implement the actor-critic algorithm and apply it to solve the CliffWalking problem. To scale up the deterministic policy gradient algorithm, we will apply tricks from DQN and develop the Deep Deterministic Policy Gradients (DDPG) algorithm. As a bit of fun, we will train an agent based on the cross-entropy method to play the CartPole game. Finally, we will talk about how to scale up policy gradient methods using the asynchronous actor-critic method and neural networks.
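As a reference point for the first recipe, here is a minimal REINFORCE sketch in PyTorch. CartPole is used as the environment, the classic Gym API is assumed, and the network size, learning rate, and return normalization (a crude stand-in for a baseline) are illustrative choices rather than the book's exact settings.

    # Minimal REINFORCE: Monte Carlo policy gradient
    import gym
    import torch

    env = gym.make('CartPole-v0')
    n_state, n_action = env.observation_space.shape[0], env.action_space.n
    policy = torch.nn.Sequential(torch.nn.Linear(n_state, 64),
                                 torch.nn.ReLU(),
                                 torch.nn.Linear(64, n_action),
                                 torch.nn.Softmax(dim=-1))
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)
    gamma = 0.99

    for episode in range(500):
        state, log_probs, rewards, done = env.reset(), [], [], False
        while not done:
            probs = policy(torch.from_numpy(state).float())
            dist = torch.distributions.Categorical(probs)
            action = dist.sample()
            log_probs.append(dist.log_prob(action))
            state, reward, done, _ = env.step(action.item())
            rewards.append(reward)
        # compute the discounted return G_t for every step of the episode
        returns, G = [], 0.0
        for r in reversed(rewards):
            G = r + gamma * G
            returns.insert(0, G)
        returns = torch.tensor(returns)
        returns = (returns - returns.mean()) / (returns.std() + 1e-8)
        loss = -(torch.stack(log_probs) * returns).sum()   # policy gradient objective
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()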

Chapter 9, Capstone Project – Playing Flappy Bird with DQN, takes us through a capstone project – playing Flappy Bird using reinforcement learning. We will apply what we have learned throughout this book to build an intelligent bot. We will focus on building a DQN, fine-tuning model parameters, and deploying the model. Let's see how long the bird can fly in the air.
