
TensorFlow 2 Reinforcement Learning Cookbook: Over 50 recipes to help you build, train, and deploy learning agents for real-world applications

By Palanisamy P
Book | Jan 2021 | 472 pages | 1st Edition
eBook: S$12.99 (regular price S$52.99)
Print: S$45.99 (regular price S$66.99)
Subscription: Free Trial

What do you get with Print?

  • Instant access to your digital eBook copy while your print order is shipped
  • Black & white paperback book shipped to your address
  • Download this book in EPUB and PDF formats
  • Access this title in our online reader with advanced features
  • DRM-free: read whenever, wherever, and however you want

TensorFlow 2 Reinforcement Learning Cookbook

Chapter 2: Implementing Value-Based, Policy-Based, and Actor-Critic Deep RL Algorithms

This chapter provides a practical approach to building value-based, policy-based, and actor-critic reinforcement learning (RL) agents. It includes recipes for implementing value iteration-based learning agents and breaks down the implementation details of several foundational RL algorithms into simple steps. The policy gradient-based agent and the actor-critic agent use TensorFlow 2.x, the latest major version of TensorFlow, to define their neural network policies.

The following recipes will be covered in this chapter:

  • Building stochastic environments for training RL agents
  • Building value-based reinforcement learning agent algorithms
  • Implementing temporal difference learning
  • Building Monte Carlo prediction and control algorithms for RL
  • Implementing the SARSA algorithm and an RL agent
  • Building a Q-learning agent
  • Implementing policy gradients
  • Implementing actor-critic RL algorithms

Technical requirements

The code in this book has been tested extensively on Ubuntu 18.04 and Ubuntu 20.04, and should work with later versions of Ubuntu if Python 3.6+ is available. With Python 3.6 installed, along with the necessary Python packages listed at the beginning of each recipe, the code should run fine on Windows and Mac OS X too. It is advised that you create and use a Python virtual environment named tf2rl-cookbook to install the packages and run the code in this book. Installing Miniconda or Anaconda for Python virtual environment management is recommended.

The complete code for each recipe in each chapter is available here: https://github.com/PacktPublishing/Tensorflow-2-Reinforcement-Learning-Cookbook.

Building stochastic environments for training RL agents

To train RL agents for the real world, we need learning environments that are stochastic, since real-world problems are stochastic in nature. This recipe will walk you through the steps for building a Maze learning environment to train RL agents. The Maze is a simple, stochastic environment where the world is represented as a grid. Each location on the grid can be referred to as a cell. The goal of an agent in this environment is to find its way to the goal state. Consider the maze shown in the following diagram, where the black cells represent walls:

Figure 2.1 – The Maze environment

The agent's location is initialized to be at the top-left cell in the Maze. The agent needs to find its way around the grid to reach the goal located at the top-right cell in the Maze, collecting a maximum number of coins along the way while avoiding walls. The location of the goal, coins, walls, and the agent...
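Before diving into the recipe steps, the following minimal sketch illustrates what a Gym-style grid Maze environment skeleton could look like. The class name, grid encoding, slip probability, and reward values here are illustrative assumptions and not the book's actual Maze implementation:

import gym
import numpy as np
from gym import spaces

class SimpleMazeEnv(gym.Env):
    # Illustrative grid encoding (assumed): 0 = empty, 1 = wall, 2 = coin, 3 = goal
    def __init__(self, slip_probability=0.1):
        self.grid = np.array([[0, 1, 0, 0, 3],
                              [0, 1, 0, 1, 0],
                              [0, 0, 2, 1, 0]])
        self.slip_probability = slip_probability  # makes the environment stochastic
        self.start = (0, 0)                       # agent starts at the top-left cell
        self.agent_pos = self.start
        self.action_space = spaces.Discrete(4)    # 0: up, 1: down, 2: left, 3: right
        self.observation_space = spaces.Discrete(self.grid.size)

    def reset(self):
        self.agent_pos = self.start
        return self._observation()

    def step(self, action):
        # With some probability, a random action is executed instead (slippery maze)
        if np.random.rand() < self.slip_probability:
            action = self.action_space.sample()
        moves = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}
        row, col = self.agent_pos
        new_row, new_col = row + moves[action][0], col + moves[action][1]
        in_bounds = 0 <= new_row < self.grid.shape[0] and 0 <= new_col < self.grid.shape[1]
        if in_bounds and self.grid[new_row, new_col] != 1:  # walls block movement
            self.agent_pos = (new_row, new_col)
        cell = self.grid[self.agent_pos]
        reward = 1.0 if cell == 2 else 10.0 if cell == 3 else -0.1
        done = bool(cell == 3)
        return self._observation(), reward, done, {}

    def _observation(self):
        # Flatten the (row, column) position into a single discrete state index
        return self.agent_pos[0] * self.grid.shape[1] + self.agent_pos[1]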

Building value-based reinforcement learning agent algorithms

Value-based reinforcement learning works by learning the state-value function or the action-value function in a given environment. This recipe will show you how to create and update the value function for the Maze environment to obtain an optimal policy. Learning value functions can prove quite effective in model-free RL problems, where a model of the environment is not available, particularly for RL problems with a low-dimensional state space.

Upon completing this recipe, you will have an algorithm that can generate the following optimal action sequence based on value functions:

Figure 2.3 – Optimal action sequence generated by a value-based RL algorithm with state values represented through a jet color map

Let's get started.

Getting ready

To complete this recipe, you will need to activate the tf2rl-cookbook Python/conda virtual environment and run pip install numpy...
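As a reference point for this recipe, here is a generic value iteration sketch. It assumes a small MDP whose transition model is available as a nested dictionary, which is an illustrative assumption rather than the book's exact interface:

import numpy as np

# transitions[state][action] -> (next_state, reward, done); an assumed, illustrative MDP format
def value_iteration(transitions, num_states, num_actions, gamma=0.99, tol=1e-6):
    V = np.zeros(num_states)
    while True:
        max_delta = 0.0
        for s in range(num_states):
            q_values = []
            for a in range(num_actions):
                next_s, reward, done = transitions[s][a]
                q_values.append(reward + (0.0 if done else gamma * V[next_s]))
            best_value = max(q_values)
            max_delta = max(max_delta, abs(best_value - V[s]))
            V[s] = best_value
        if max_delta < tol:  # stop once the value function has converged
            break
    # Extract the greedy policy with respect to the converged value function
    policy = np.zeros(num_states, dtype=int)
    for s in range(num_states):
        q_values = [transitions[s][a][1] +
                    (0.0 if transitions[s][a][2] else gamma * V[transitions[s][a][0]])
                    for a in range(num_actions)]
        policy[s] = int(np.argmax(q_values))
    return V, policy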

Implementing temporal difference learning

This recipe will walk you through how to implement the temporal difference (TD) learning algorithm. TD algorithms allow us to incrementally learn from incomplete episodes of agent experiences, which means they can be used for problems that require online learning capabilities. TD algorithms are useful in model-free RL settings as they do not depend on a model of the MDP transitions or rewards. To visually understand the learning progression of the TD algorithm, this recipe will also show you how to implement the GridworldV2 learning environment, which looks as follows when rendered:

Figure 2.6 – The GridworldV2 learning environment 2D rendering with state values and grid cell coordinates

Getting ready

To complete this recipe, you will need to activate the tf2rl-cookbook Python/conda virtual environment and run pip install numpy gym. If the following import statements run without issues, you are ready to get...
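For orientation, the core TD(0) state-value update that this recipe builds towards can be sketched as follows; the variable names and state-space size are illustrative assumptions, not the book's exact code:

import numpy as np

def td0_update(V, state, reward, next_state, done, alpha=0.1, gamma=0.99):
    # Bootstrapped target: immediate reward plus the discounted value of the next state
    td_target = reward + (0.0 if done else gamma * V[next_state])
    td_error = td_target - V[state]
    V[state] += alpha * td_error  # incremental, online update after every step
    return td_error

# Example usage with a GridworldV2-like grid flattened to 12 states (assumed size)
V = np.zeros(12)
td0_update(V, state=0, reward=-0.1, next_state=1, done=False)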

Building Monte Carlo prediction and control algorithms for RL

This recipe provides the ingredients for building Monte Carlo prediction and control algorithms so that you can build your RL agents. Like the temporal difference learning algorithm, Monte Carlo learning methods can be used to learn both the state and the action value functions. Monte Carlo methods have zero bias, since they learn from complete episodes of real experience without bootstrapping from estimated values, which makes them suitable for applications that require good convergence properties. The following diagram illustrates the values learned by the Monte Carlo method for the GridworldV2 environment:

Figure 2.10 – Monte Carlo prediction of state values (left) and state-action values (right)

Getting ready

To complete this recipe, you will need to activate the tf2rl-cookbook Python/conda virtual environment and run pip install -r requirements.txt. If the following import...
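To make the idea concrete, the following first-visit Monte Carlo prediction sketch shows how state values can be updated from one complete episode. The episode format and function names are illustrative assumptions, not the book's exact implementation:

from collections import defaultdict

def first_visit_mc_update(V, visit_counts, episode, gamma=0.99):
    # episode is a list of (state, reward) pairs gathered by rolling out the policy to termination
    G = 0.0
    for t in reversed(range(len(episode))):
        state, reward = episode[t]
        G = reward + gamma * G  # discounted return from time step t onwards
        if all(s != state for s, _ in episode[:t]):  # only update on the first visit to the state
            visit_counts[state] += 1
            V[state] += (G - V[state]) / visit_counts[state]  # incremental mean of observed returns
    return V

# Example usage with a short, hypothetical episode
V, visit_counts = defaultdict(float), defaultdict(int)
first_visit_mc_update(V, visit_counts, episode=[(0, -0.1), (1, -0.1), (2, 1.0)])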

Implementing the SARSA algorithm and an RL agent

This recipe will show you how to implement the State-Action-Reward-State-Action (SARSA) algorithm, as well as how to develop and train an agent using the SARSA algorithm so that it can act in a reinforcement learning environment. The SARSA algorithm can be applied to model-free control problems and allows us to optimize the value function of an unknown MDP.

Upon completing this recipe, you will have a working RL agent that, when acting in the GridworldV2 environment, will generate the following state-action value function using the SARSA algorithm:

Figure 2.15 – Rendering of the GridworldV2 environment – each triangle represents the action value of taking that directional action in that grid state

Getting ready

To complete this recipe, you will need to activate the tf2rl-cookbook Python/conda virtual environment and run pip install -r requirements.txt. If the following import statements run...
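The heart of this recipe is the on-policy SARSA update, which can be sketched as follows; the Q-table shape and names are illustrative assumptions rather than the book's exact code:

import numpy as np

def sarsa_update(Q, state, action, reward, next_state, next_action, done,
                 alpha=0.1, gamma=0.99):
    # On-policy target: bootstraps from the action the current policy actually selects next
    target = reward + (0.0 if done else gamma * Q[next_state, next_action])
    Q[state, action] += alpha * (target - Q[state, action])

# Example: a Q-table for 12 states and 4 actions (assumed GridworldV2-like sizes)
Q = np.zeros((12, 4))
sarsa_update(Q, state=0, action=3, reward=-0.1, next_state=1, next_action=1, done=False)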

Building a Q-learning agent

This recipe will show you how to build a Q-learning agent. Q-learning can be applied to model-free RL problems and supports off-policy learning, so it provides a practical solution to problems where the available experience was (or is) collected using a different policy or by another agent (even a human).

Upon completing this recipe, you will have a working RL agent that, when acting in the GridworldV2 environment, will generate the following state-action value function using the Q-learning algorithm:

Figure 2.18 – State-action values obtained using the Q-learning algorithm

Getting ready

To complete this recipe, you will need to activate the tf2rl-cookbook Python/conda virtual environment and run pip install -r requirements.txt. If the following import statements run without issues, you are ready to get started:

import numpy as np
import random

Now, let's begin.

How to do it…

Let's implement...
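As a minimal reference, the off-policy Q-learning update and an epsilon-greedy action-selection helper can be sketched as follows, reusing the numpy and random imports shown above. The function names and table shape are illustrative assumptions, not the book's exact code:

def q_learning_update(Q, state, action, reward, next_state, done, alpha=0.1, gamma=0.99):
    # Off-policy target: bootstraps from the greedy (maximum) action value in the next state
    target = reward + (0.0 if done else gamma * np.max(Q[next_state]))
    Q[state, action] += alpha * (target - Q[state, action])

def epsilon_greedy_action(Q, state, epsilon=0.1):
    # Explore with probability epsilon; otherwise act greedily with respect to Q
    if random.random() < epsilon:
        return random.randrange(Q.shape[1])
    return int(np.argmax(Q[state]))

# Example usage with an assumed 12-state, 4-action Q-table
Q = np.zeros((12, 4))
action = epsilon_greedy_action(Q, state=0)
q_learning_update(Q, state=0, action=action, reward=-0.1, next_state=1, done=False)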

Implementing policy gradients

Policy gradient algorithms are fundamental to reinforcement learning and serve as the basis for several advanced RL algorithms. These algorithms directly optimize for the best policy, which can lead to faster learning compared to value-based algorithms. Policy gradient algorithms are effective for problems/applications with high-dimensional or continuous action spaces. This recipe will show you how to implement policy gradient algorithms using TensorFlow 2.0. Upon completing this recipe, you will be able to train an RL agent in any compatible OpenAI Gym environment.

Getting ready

To complete this recipe, you will need to activate the tf2rl-cookbook Python/conda virtual environment and run pip install -r requirements.txt. If the following import statements run without issues, you are ready to get started:

import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
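To preview where the recipe is headed, here is a minimal REINFORCE-style policy gradient sketch built with the imports listed above. The network size, environment dimensions, and function names are illustrative assumptions, not the book's exact agent:

obs_dim, num_actions = 4, 2  # assumed observation size and discrete action count (CartPole-like)

policy_network = keras.Sequential([
    layers.Dense(32, activation="relu", input_shape=(obs_dim,)),
    layers.Dense(num_actions)  # unnormalized action logits
])
optimizer = keras.optimizers.Adam(learning_rate=1e-3)

def policy_gradient_step(observations, actions, returns):
    # One REINFORCE update on a batch of (observation, action, discounted-return) samples
    observations = tf.convert_to_tensor(observations, dtype=tf.float32)
    actions = tf.convert_to_tensor(actions, dtype=tf.int32)
    returns = tf.convert_to_tensor(returns, dtype=tf.float32)
    with tf.GradientTape() as tape:
        logits = policy_network(observations)
        action_dist = tfp.distributions.Categorical(logits=logits)
        log_probs = action_dist.log_prob(actions)
        loss = -tf.reduce_mean(log_probs * returns)  # maximize return-weighted log-likelihood
    gradients = tape.gradient(loss, policy_network.trainable_variables)
    optimizer.apply_gradients(zip(gradients, policy_network.trainable_variables))
    return float(loss)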

Implementing actor-critic RL algorithms

Actor-critic algorithms allow us to combine value-based and policy-based reinforcement learning – an all-in-one agent. While policy gradient methods directly search for and optimize the policy in the policy space, leading to smoother learning curves and improvement guarantees, they tend to get stuck in local maxima (for a long-term reward optimization objective). Value-based methods do not get stuck in local optima, but they lack convergence guarantees, and algorithms such as Q-learning tend to have high variance and are not very sample-efficient. Actor-critic methods combine the good qualities of both value-based and policy gradient-based algorithms, and they are also more sample-efficient. This recipe will make it easy for you to implement an actor-critic-based RL agent using TensorFlow 2.x. Upon completing this recipe, you will be able to train the actor-critic agent in any OpenAI Gym-compatible reinforcement learning environment.
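As a rough guide to how the actor and critic interact, here is a minimal advantage actor-critic update sketch. The network sizes, environment dimensions, and function names are illustrative assumptions and not the book's actual agent:

import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow import keras
from tensorflow.keras import layers

obs_dim, num_actions = 4, 2  # assumed environment dimensions

actor = keras.Sequential([layers.Dense(32, activation="relu", input_shape=(obs_dim,)),
                          layers.Dense(num_actions)])       # action logits
critic = keras.Sequential([layers.Dense(32, activation="relu", input_shape=(obs_dim,)),
                           layers.Dense(1)])                # state-value estimate
optimizer = keras.optimizers.Adam(learning_rate=1e-3)

def actor_critic_step(obs, action, reward, next_obs, done, gamma=0.99):
    obs = tf.convert_to_tensor([obs], dtype=tf.float32)
    next_obs = tf.convert_to_tensor([next_obs], dtype=tf.float32)
    with tf.GradientTape() as tape:
        value = critic(obs)[0, 0]
        next_value = 0.0 if done else tf.stop_gradient(critic(next_obs)[0, 0])
        td_target = reward + gamma * next_value
        advantage = td_target - value                       # critic's TD error
        dist = tfp.distributions.Categorical(logits=actor(obs))
        actor_loss = -dist.log_prob(action)[0] * tf.stop_gradient(advantage)
        critic_loss = tf.square(advantage)                  # value regression towards the TD target
        loss = actor_loss + critic_loss
    variables = actor.trainable_variables + critic.trainable_variables
    gradients = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(gradients, variables))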


Key benefits

  • Develop and deploy deep reinforcement learning-based solutions to production pipelines, products, and services
  • Explore popular reinforcement learning algorithms such as Q-learning, SARSA, and the actor-critic method
  • Customize and build RL-based applications for performing real-world tasks

Description

With deep reinforcement learning, you can build intelligent agents, products, and services that can go beyond computer vision or perception to perform actions. TensorFlow 2.x is the latest major release of the most popular deep learning framework used to develop and train deep neural networks (DNNs). This book contains easy-to-follow recipes for leveraging TensorFlow 2.x to develop artificial intelligence applications. Starting with an introduction to the fundamentals of deep reinforcement learning and TensorFlow 2.x, the book covers OpenAI Gym, model-based RL, model-free RL, and how to develop basic agents. You'll discover how to implement advanced deep reinforcement learning algorithms such as actor-critic, deep deterministic policy gradients, deep-Q networks, proximal policy optimization, and deep recurrent Q-networks for training your RL agents. As you advance, you’ll explore the applications of reinforcement learning by building cryptocurrency trading agents, stock/share trading agents, and intelligent agents for automating task completion. Finally, you'll find out how to deploy deep reinforcement learning agents to the cloud and build cross-platform apps using TensorFlow 2.x. By the end of this TensorFlow book, you'll have gained a solid understanding of deep reinforcement learning algorithms and their implementations from scratch.

What you will learn

  • Build deep reinforcement learning agents from scratch using the all-new TensorFlow 2.x and Keras API
  • Implement state-of-the-art deep reinforcement learning algorithms using minimal code
  • Build, train, and package deep RL agents for cryptocurrency and stock trading
  • Deploy RL agents to the cloud and edge to test them by creating desktop, web, and mobile apps and cloud services
  • Speed up agent development using distributed DNN model training
  • Explore distributed deep RL architectures and discover opportunities in AIaaS (AI as a Service)
Estimated delivery fee (deliver to Singapore):

Standard delivery (10 - 13 business days): S$11.95
Premium delivery (5 - 8 business days): S$54.95 (includes tracking information)

Product Details

Publication date: Jan 15, 2021
Length: 472 pages
Edition: 1st
Language: English
ISBN-13: 9781838982546
Vendor: Google


Table of Contents

11 Chapters
Preface
1. Chapter 1: Developing Building Blocks for Deep Reinforcement Learning Using TensorFlow 2.x
2. Chapter 2: Implementing Value-Based, Policy-Based, and Actor-Critic Deep RL Algorithms
3. Chapter 3: Implementing Advanced RL Algorithms
4. Chapter 4: Reinforcement Learning in the Real World – Building Cryptocurrency Trading Agents
5. Chapter 5: Reinforcement Learning in the Real World – Building Stock/Share Trading Agents
6. Chapter 6: Reinforcement Learning in the Real World – Building Intelligent Agents to Complete Your To-Dos
7. Chapter 7: Deploying Deep RL Agents to the Cloud
8. Chapter 8: Distributed Training for Accelerated Development of Deep RL Agents
9. Chapter 9: Deploying Deep RL Agents on Multiple Platforms
10. Other Books You May Enjoy


FAQs

What is the delivery time and cost of a print book?

Shipping Details

USA:


Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K. time will start printing from the next business day, so the estimated delivery times also start from the next day. Orders received after 5 PM U.K. time (in our internal systems) on a business day, or at any time on the weekend, will begin printing on the second business day after. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-Bissau
  9. Iran
  10. Lebanon
  11. Libyan Arab Jamahiriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela
What is a customs duty/charge?

Customs duties are charges levied on goods when they cross international borders; they are taxes imposed on imported goods. These duties are charged by special authorities and bodies created by local governments and are meant to protect local industries, economies, and businesses.

Do I have to pay customs charges for the print book order?

Orders shipped to countries listed under the EU27 will not incur customs charges; these are paid by Packt as part of the order.

List of EU27 countries: www.gov.uk/eu-eea

For shipments to destinations outside the EU27, customs duties or localized taxes may be applied by the recipient country. These must be paid by the customer and are not included in the shipping charges on the order.

How do I know my customs duty charges?

The amount of duty payable varies greatly depending on the imported goods, the country of origin, and several other factors, such as the total invoice amount, dimensions such as weight, and other criteria applicable in your country.

For example:

  • If you live in Mexico and the declared value of your ordered items is over $50, then to receive your package you will have to pay an additional import tax of 19% ($9.50) to the courier service.
  • If you live in Turkey and the declared value of your ordered items is over €22, then to receive your package you will have to pay an additional import tax of 18% (€3.96) to the courier service.
How can I cancel my order?

Cancellation Policy for Published Printed Books:

You can cancel any order within 1 hour of placing the order. Simply contact customercare@packt.com with your order details or payment transaction id. If your order has already started the shipment process, we will do our best to stop it. However, if it is already on the way to you then when you receive it, you can contact us at customercare@packt.com using the returns and refund process.

Please understand that Packt Publishing cannot provide refunds or cancel any order except in the cases described in our Return Policy (i.e., where Packt Publishing agrees to replace your printed book because it arrived damaged or with a material defect); otherwise, Packt Publishing will not accept returns.

What is your returns and refunds policy?

Return Policy:

We want you to be happy with your purchase from Packtpub.com. We will not hassle you with returning print books to us. If the print book you receive from us is incorrect, damaged, doesn't work, or is unacceptably late, please contact our Customer Relations Team at customercare@packt.com with the order number and issue details, as explained below:

  1. If you ordered an item (eBook, Video, or Print Book) incorrectly or accidentally, please contact our Customer Relations Team at customercare@packt.com within one hour of placing the order and we will replace/refund you the item cost.
  2. Sadly, if your eBook or Video file is faulty, or a fault occurs while the eBook or Video is being made available to you (i.e., during download), then you should contact our Customer Relations Team at customercare@packt.com within 14 days of purchase, and they will be able to resolve this issue for you.
  3. You will have a choice of replacement or refund for the problem items (damaged, defective, or incorrect).
  4. Once the Customer Care Team confirms that you will be refunded, you should receive the refund within 10 to 12 working days.
  5. If you are only requesting a refund of one book from a multiple-item order, then we will refund you for the appropriate single item.
  6. Where the items were shipped under a free shipping offer, there will be no shipping costs to refund.

On the off chance your printed book arrives damaged or with a material defect, contact our Customer Relations Team at customercare@packt.com within 14 days of receipt of the book with appropriate evidence of the damage, and we will work with you to secure a replacement copy, if necessary. Please note that each printed book you order from us is individually made by Packt's professional book-printing partner on a print-on-demand basis.

What tax is charged?

Currently, no tax is charged on the purchase of any print book (subject to change based on the laws and regulations). A localized VAT fee is charged only to our European and UK customers on eBooks, Video and subscriptions that they buy. GST is charged to Indian customers for eBooks and video purchases.

What payment methods can I use?

You can pay with the following card types:

  1. Visa Debit
  2. Visa Credit
  3. MasterCard
  4. PayPal