Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Deep Reinforcement Learning with Python
Deep Reinforcement Learning with Python

Deep Reinforcement Learning with Python: Master classic RL, deep RL, distributional RL, inverse RL, and more with OpenAI Gym and TensorFlow , Second Edition

eBook
€20.98 €29.99
Paperback
€36.99
Subscription
Free Trial
Renews at €18.99p/m

What do you get with Print?

Product feature icon Instant access to your digital eBook copy whilst your Print order is Shipped
Product feature icon Paperback book shipped to your preferred address
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Product feature icon AI Assistant (beta) to help accelerate your learning
Table of content icon View table of contents Preview book icon Preview Book

Deep Reinforcement Learning with Python

A Guide to the Gym Toolkit

OpenAI is an artificial intelligence (AI) research organization that aims to build artificial general intelligence (AGI). OpenAI provides a famous toolkit called Gym for training a reinforcement learning agent.

Let's suppose we need to train our agent to drive a car. We need an environment to train the agent. Can we train our agent in the real-world environment to drive a car? No, because we have learned that reinforcement learning (RL) is a trial-and-error learning process, so while we train our agent, it will make a lot of mistakes during learning. For example, let's suppose our agent hits another vehicle, and it receives a negative reward. It will then learn that hitting other vehicles is not a good action and will try not to perform this action again. But we cannot train the RL agent in the real-world environment by hitting other vehicles, right? That is why we use simulators and train the RL agent in the simulated environments.

There are many toolkits that provide a simulated environment for training an RL agent. One such popular toolkit is Gym. Gym provides a variety of environments for training an RL agent ranging from classic control tasks to Atari game environments. We can train our RL agent to learn in these simulated environments using various RL algorithms. In this chapter, first, we will install Gym and then we will explore various Gym environments. We will also get hands-on with the concepts we have learned in the previous chapter by experimenting with the Gym environment.

Throughout the book, we will use the Gym toolkit for building and evaluating reinforcement learning algorithms, so in this chapter, we will make ourselves familiar with the Gym toolkit.

In this chapter, we will learn about the following topics:

  • Setting up our machine
  • Installing Anaconda and Gym
  • Understanding the Gym environment
  • Generating an episode in the Gym environment
  • Exploring more Gym environments
  • Cart-Pole balancing with the random agent
  • An agent playing the Tennis game

Setting up our machine

In this section, we will learn how to install several dependencies that are required for running the code used throughout the book. First, we will learn how to install Anaconda and then we will explore how to install Gym.

Installing Anaconda

Anaconda is an open-source distribution of Python. It is widely used for scientific computing and processing large volumes of data. It provides an excellent package management environment, and it supports Windows, Mac, and Linux operating systems. Anaconda comes with Python installed, along with popular packages used for scientific computing such as NumPy, SciPy, and so on.

To download Anaconda, visit https://www.anaconda.com/download/, where you will see an option for downloading Anaconda for different platforms. If you are using Windows or macOS, you can directly download the graphical installer according to your machine architecture and install Anaconda using the graphical installer.

If you are using Linux, follow these steps:

  1. Open the Terminal and type the following command to download Anaconda:
    wget https://repo.continuum.io/archive/Anaconda3-5.0.1-Linux-x86_64.sh
    
  2. After downloading, we can install Anaconda using the following command:
    bash Anaconda3-5.0.1-Linux-x86_64.sh
    

After the successful installation of Anaconda, we need to create a virtual environment. What is the need for a virtual environment? Say we are working on project A, which uses NumPy version 1.14, and project B, which uses NumPy version 1.13. So, to work on project B we either downgrade NumPy or reinstall NumPy. In each project, we use different libraries with different versions that are not applicable to the other projects. Instead of downgrading or upgrading versions or reinstalling libraries every time for a new project, we use a virtual environment.

The virtual environment is just an isolated environment for a particular project so that each project can have its own dependencies and will not affect other projects. We will create a virtual environment using the following command and name our environment universe:

conda create --name universe python=3.6 anaconda

Note that we use Python version 3.6. Once the virtual environment is created, we can activate it using the following command:

source activate universe

That's it! Now that we have learned how to install Anaconda and create a virtual environment, in the next section, we will learn how to install Gym.

Installing the Gym toolkit

In this section, we will learn how to install the Gym toolkit. Before going ahead, first, let's activate our virtual environment, universe:

source activate universe

Now, install the following dependencies:

sudo apt-get update
sudo apt-get install golang libcupti-dev libjpeg-turbo8-dev make tmux htop chromium-browser git cmake zlib1g-dev libjpeg-dev xvfb libav-tools xorg-dev python-opengl libboost-all-dev libsdl2-dev swig
conda install pip six libgcc swig
conda install opencv

We can install Gym directly using pip. Note that throughout the book, we will use Gym version 0.15.4. We can install Gym using the following command:

pip install gym==0.15.4

We can also install Gym by cloning the Gym repository as follows:

cd ~
git clone https://github.com/openai/gym.git
cd gym
pip install -e '.[all]'

Common error fixes

Just in case, if you get any of the following errors while installing Gym, the following commands will help:

  • Failed building wheel for pachi-py or failed building wheel for pachi-py atari-py:
    sudo apt-get update
    sudo apt-get install xvfb libav-tools xorg-dev libsdl2-dev swig cmake
    
  • Failed building wheel for mujoco-py:
    git clone https://github.com/openai/mujoco-py.git
    cd mujoco-py
    sudo apt-get update
    sudo apt-get install libgl1-mesa-dev libgl1-mesa-glx libosmesa6-dev python3-pip python3-numpy python3-scipy
    pip3 install -r requirements.txt
    sudo python3 setup.py install
    
  • error: command 'gcc' failed with exit status 1:
    sudo apt-get update
    sudo apt-get install python-dev 
    sudo apt-get install libevent-dev
    

Now that we have successfully installed Gym, in the next section, let's kickstart our hands-on reinforcement learning journey.

Creating our first Gym environment

We have learned that Gym provides a variety of environments for training a reinforcement learning agent. To clearly understand how the Gym environment is designed, we will start with the basic Gym environment. After that, we will understand other complex Gym environments.

Let's introduce one of the simplest environments called the Frozen Lake environment. Figure 2.1 shows the Frozen Lake environment. As we can observe, in the Frozen Lake environment, the goal of the agent is to start from the initial state S and reach the goal state G:

Figure 2.1: The Frozen Lake environment

In the preceding environment, the following apply:

  • S denotes the starting state
  • F denotes the frozen state
  • H denotes the hole state
  • G denotes the goal state

So, the agent has to start from state S and reach the goal state G. But one issue is that if the agent visits state H, which is the hole state, then the agent will fall into the hole and die as shown in Figure 2.2:

Figure 2.2: The agent falls down a hole

So, we need to make sure that the agent starts from S and reaches G without falling into the hole state H as shown in Figure 2.3:

Figure 2.3: The agent reaches the goal state

Each grid box in the preceding environment is called a state, thus we have 16 states (S to G) and we have 4 possible actions, which are up, down, left, and right. We learned that our goal is to reach the state G from S without visiting H. So, we assign +1 reward for the goal state G and 0 for all other states.

Thus, we have learned how the Frozen Lake environment works. Now, to train our agent in the Frozen Lake environment, first, we need to create the environment by coding it from scratch in Python. But luckily we don't have to do that! Since Gym provides various environments, we can directly import the Gym toolkit and create a Frozen Lake environment.

Now, we will learn how to create our Frozen Lake environment using Gym. Before running any code, make sure that you have activated our virtual environment universe. First, let's import the Gym library:

import gym

Next, we can create a Gym environment using the make function. The make function requires the environment id as a parameter. In Gym, the id of the Frozen Lake environment is FrozenLake-v0. So, we can create our Frozen Lake environment as follows:

env = gym.make("FrozenLake-v0")

After creating the environment, we can see how our environment looks like using the render function:

env.render()

The preceding code renders the following environment:

Figure 2.4: Gym's Frozen Lake environment

As we can observe, the Frozen Lake environment consists of 16 states (S to G) as we learned. The state S is highlighted indicating that it is our current state, that is, the agent is in the state S. So whenever we create an environment, an agent will always begin from the initial state, which in our case is state S.

That's it! Creating the environment using Gym is that simple. In the next section, we will understand more about the Gym environment by relating all the concepts we have learned in the previous chapter.

Exploring the environment

In the previous chapter, we learned that the reinforcement learning environment can be modeled as a Markov decision process (MDP) and an MDP consists of the following:

  • States: A set of states present in the environment.
  • Actions: A set of actions that the agent can perform in each state.
  • Transition probability: The transition probability is denoted by . It implies the probability of moving from a state s to the state while performing an action a.
  • Reward function: The reward function is denoted by . It implies the reward the agent obtains moving from a state s to the state while performing an action a.

Let's now understand how to obtain all the above information from the Frozen Lake environment we just created using Gym.

States

A state space consists of all of our states. We can obtain the number of states in our environment by just typing env.observation_space as follows:

print(env.observation_space)

The preceding code will print:

Discrete(16)

It implies that we have 16 discrete states in our state space starting from state S to G. Note that, in Gym, the states will be encoded as a number, so the state S will be encoded as 0, state F will be encoded as 1, and so on as Figure 2.5 shows:

Figure 2.5: Sixteen discrete states

Actions

We learned that the action space consists of all the possible actions in the environment. We can obtain the action space by using env.action_space:

print(env.action_space)

The preceding code will print:

Discrete(4)

It shows that we have 4 discrete actions in our action space, which are left, down, right, and up. Note that, similar to states, actions also will be encoded into numbers as shown in Table 2.1:

Table 2.1: Four discrete actions

Transition probability and reward function

Now, let's look at how to obtain the transition probability and the reward function. We learned that in the stochastic environment, we cannot say that by performing some action a, the agent will always reach the next state exactly because there will be some randomness associated with the stochastic environment, and by performing an action a in the state s, the agent reaches the next state with some probability.

Let's suppose we are in state 2 (F). Now, if we perform action 1 (down) in state 2, we can reach state 6 as shown in Figure 2.6:

Figure 2.6: The agent performing a down action from state 2

Our Frozen Lake environment is a stochastic environment. When our environment is stochastic, we won't always reach state 6 by performing action 1 (down) in state 2; we also reach other states with some probability. So when we perform an action 1 (down) in state 2, we reach state 1 with probability 0.33333, we reach state 6 with probability 0.33333, and we reach state 3 with probability 0.33333 as shown in Figure 2.7:

Figure 2.7: Transition probability of the agent in state 2

As we can see, in a stochastic environment we reach the next states with some probability. Now, let's learn how to obtain this transition probability using the Gym environment.

We can obtain the transition probability and the reward function by just typing env.P[state][action]. So, to obtain the transition probability of moving from state S to the other states by performing the action right, we can type env.P[S][right]. But we cannot just type state S and action right directly since they are encoded as numbers. We learned that state S is encoded as 0 and the action right is encoded as 2, so, to obtain the transition probability of state S by performing the action right, we type env.P[0][2] as the following shows:

print(env.P[0][2])

The above code will print:

[(0.33333, 4, 0.0, False),
 (0.33333, 1, 0.0, False),
 (0.33333, 0, 0.0, False)]

What does this imply? Our output is in the form of [(transition probability, next state, reward, Is terminal state?)]. It implies that if we perform an action 2 (right) in state 0 (S) then:

  • We reach state 4 (F) with probability 0.33333 and receive 0 reward.
  • We reach state 1 (F) with probability 0.33333 and receive 0 reward.
  • We reach the same state 0 (S) with probability 0.33333 and receive 0 reward.

Figure 2.8 shows the transition probability:

Figure 2.8: Transition probability of the agent in state 0

Thus, when we type env.P[state][action], we get the result in the form of [(transition probability, next state, reward, Is terminal state?)]. The last value is Boolean and tells us whether the next state is a terminal state. Since 4, 1, and 0 are not terminal states, it is given as false.

The output of env.P[0][2] is shown in Table 2.2 for more clarity:

Table 2.2: Output of env.P[0][2]

Let's understand this with one more example. Let's suppose we are in state 3 (F) as Figure 2.9 shows:

Figure 2.9: The agent in state 3

Say we perform action 1 (down) in state 3 (F). Then the transition probability of state 3 (F) by performing action 1 (down) can be obtained as the following shows:

print(env.P[3][1])

The preceding code will print:

[(0.33333, 2, 0.0, False),
 (0.33333, 7, 0.0, True),
 (0.33333, 3, 0.0, False)]

As we learned, our output is in the form of [(transition probability, next state, reward, Is terminal state?)]. It implies that if we perform action 1 (down) in state 3 (F) then:

  • We reach state 2 (F) with probability 0.33333 and receive 0 reward.
  • We reach state 7 (H) with probability 0.33333 and receive 0 reward.
  • We reach the same state 3 (F) with probability 0.33333 and receive 0 reward.

Figure 2.10 shows the transition probability:

Figure 2.10: Transition probabilities of the agent in state 3

The output of env.P[3][1] is shown in Table 2.3 for more clarity:

Table 2.3: Output of env.P[3][1]

As we can observe, in the second row of our output, we have (0.33333, 7, 0.0, True), and the last value here is marked as True. It implies that state 7 is a terminal state. That is, if we perform action 1 (down) in state 3 (F) then we reach state 7 (H) with 0.33333 probability, and since 7 (H) is a hole, the agent dies if it reaches state 7 (H). Thus 7(H) is a terminal state and so it is marked as True.

Thus, we have learned how to obtain the state space, action space, transition probability, and the reward function using the Gym environment. In the next section, we will learn how to generate an episode.

Generating an episode in the Gym environment

We learned that the agent-environment interaction starting from an initial state until the terminal state is called an episode. In this section, we will learn how to generate an episode in the Gym environment.

Before we begin, we initialize the state by resetting our environment; resetting puts our agent back to the initial state. We can reset our environment using the reset() function as shown as follows:

state = env.reset()

Action selection

In order for the agent to interact with the environment, it has to perform some action in the environment. So, first, let's learn how to perform an action in the Gym environment. Let's suppose we are in state 3 (F) as Figure 2.11 shows:

Figure 2.11: The agent is in state 3 in the Frozen Lake environment

Say we need to perform action 1 (down) and move to the new state 7 (H). How can we do that? We can perform an action using the step function. We just need to input our action as a parameter to the step function. So, we can perform action 1 (down) in state 3 (F) using the step function as follows:

env.step(1)

Now, let's render our environment using the render function:

env.render()

As shown in Figure 2.12, the agent performs action 1 (down) in state 3 (F) and reaches the next state 7 (H):

Figure 2.12: The agent in state 7 in the Frozen Lake environment

Note that whenever we make an action using env.step(), it outputs a tuple containing 4 values. So, when we take action 1 (down) in state 3 (F) using env.step(1), it gives the output as:

(7, 0.0, True, {'prob': 0.33333})

As you might have guessed, it implies that when we perform action 1 (down) in state 3 (F):

  • We reach the next state 7 (H).
  • The agent receives the reward 0.0.
  • Since the next state 7 (H) is a terminal state, it is marked as True.
  • We reach the next state 7 (H) with a probability of 0.33333.

So, we can just store this information as:

(next_state, reward, done, info) = env.step(1)

Thus:

  • next_state represents the next state.
  • reward represents the obtained reward.
  • done implies whether our episode has ended. That is, if the next state is a terminal state, then our episode will end, so done will be marked as True else it will be marked as False.
  • info—Apart from the transition probability, in some cases, we also obtain other information saved as info, which is used for debugging purposes.

We can also sample action from our action space and perform a random action to explore our environment. We can sample an action using the sample function:

random_action = env.action_space.sample()

After we have sampled an action from our action space, then we perform our sampled action using our step function:

next_state, reward, done, info = env.step(random_action)

Now that we have learned how to select actions in the environment, let's see how to generate an episode.

Generating an episode

Now let's learn how to generate an episode. The episode is the agent environment interaction starting from the initial state to the terminal state. The agent interacts with the environment by performing some action in each state. An episode ends if the agent reaches the terminal state. So, in the Frozen Lake environment, the episode will end if the agent reaches the terminal state, which is either the hole state (H) or goal state (G).

Let's understand how to generate an episode with the random policy. We learned that the random policy selects a random action in each state. So, we will generate an episode by taking random actions in each state. So for each time step in the episode, we take a random action in each state and our episode will end if the agent reaches the terminal state.

First, let's set the number of time steps:

num_timesteps = 20

For each time step:

for t in range(num_timesteps):

Randomly select an action by sampling from the action space:

    random_action = env.action_space.sample()

Perform the selected action:

    next_state, reward, done, info = env.step(random_action)

If the next state is the terminal state, then break. This implies that our episode ends:

    if done:
        break

The preceding complete snippet is provided for clarity. The following code denotes that on every time step, we select an action by randomly sampling from the action space, and our episode will end if the agent reaches the terminal state:

import gym
env = gym.make("FrozenLake-v0")
state = env.reset()
print('Time Step 0 :')
env.render()
num_timesteps = 20
for t in range(num_timesteps):
  random_action = env.action_space.sample()
  new_state, reward, done, info = env.step(random_action)
  print ('Time Step {} :'.format(t+1))
  env.render()
  if done:
    break

The preceding code will print something similar to Figure 2.13. Note that you might get a different result each time you run the preceding code since the agent is taking a random action in each time step.

As we can observe from the following output, on each time step, the agent takes a random action in each state and our episode ends once the agent reaches the terminal state. As Figure 2.13 shows, in time step 4, the agent reaches the terminal state H, and so the episode ends:

Figure 2.13: Actions taken by the agent in each time step

Instead of generating one episode, we can also generate a series of episodes by taking some random action in each state:

import gym
env = gym.make("FrozenLake-v0")
num_episodes = 10
num_timesteps = 20 
for i in range(num_episodes):
    
    state = env.reset()
    print('Time Step 0 :')
    env.render()
    for t in range(num_timesteps):
        random_action = env.action_space.sample()
        
        new_state, reward, done, info = env.step(random_action)
        print ('Time Step {} :'.format(t+1))
        env.render()
        if done:
            break

Thus, we can generate an episode by selecting a random action in each state by sampling from the action space. But wait! What is the use of this? Why do we even need to generate an episode?

In the previous chapter, we learned that an agent can find the optimal policy (that is, the correct action in each state) by generating several episodes. But in the preceding example, we just took random actions in each state over all the episodes. How can the agent find the optimal policy? So, in the case of the Frozen Lake environment, how can the agent find the optimal policy that tells the agent to reach state G from state S without visiting the hole states H?

This is where we need a reinforcement learning algorithm. Reinforcement learning is all about finding the optimal policy, that is, the policy that tells us what action to perform in each state. We will learn how to find the optimal policy by generating a series of episodes using various reinforcement learning algorithms in the upcoming chapters. In this chapter, we will focus on getting acquainted with the Gym environment and various Gym functionalities as we will be using the Gym environment throughout the course of the book.

So far we have understood how the Gym environment works using the basic Frozen Lake environment, but Gym has so many other functionalities and also several interesting environments. In the next section, we will learn about the other Gym environments along with exploring the functionalities of Gym.

More Gym environments

In this section, we will explore several interesting Gym environments, along with exploring different functionalities of Gym.

Classic control environments

Gym provides environments for several classic control tasks such as Cart-Pole balancing, swinging up an inverted pendulum, mountain car climbing, and so on. Let's understand how to create a Gym environment for a Cart-Pole balancing task. The Cart-Pole environment is shown below:

Figure 2.14: Cart-Pole example

Cart-Pole balancing is one of the classical control problems. As shown in Figure 2.14, the pole is attached to the cart and the goal of our agent is to balance the pole on the cart, that is, the goal of our agent is to keep the pole standing straight up on the cart as shown in Figure 2.15:

Figure 2.15: The goal is to keep the pole straight up

So the agent tries to push the cart left and right to keep the pole standing straight on the cart. Thus our agent performs two actions, which are pushing the cart to the left and pushing the cart to the right, to keep the pole standing straight on the cart. You can also check out this very interesting video, https://youtu.be/qMlcsc43-lg, which shows how the RL agent balances the pole on the cart by moving the cart left and right.

Now, let's learn how to create the Cart-Pole environment using Gym. The environment id of the Cart-Pole environment in Gym is CartPole-v0, so we can just use our make function to create the Cart-Pole environment as shown below:

env = gym.make("CartPole-v0")

After creating the environment, we can view our environment using the render function:

env.render()

We can also close the rendered environment using the close function:

env.close()

State space

Now, let's look at the state space of our Cart-Pole environment. Wait! What are the states here? In the Frozen Lake environment, we had 16 discrete states from S to G. But how can we describe the states here? Can we describe the state by cart position? Yes! Note that the cart position is a continuous value. So, in this case, our state space will be continuous values, unlike the Frozen Lake environment where our state space had discrete values (S to G).

But with just the cart position alone we cannot describe the state of the environment completely. So we include cart velocity, pole angle, and pole velocity at the tip. So we can describe our state space by an array of values as shown as follows:

array([cart position, cart velocity, pole angle, pole velocity at the tip])

Note that all of these values are continuous, that is:

  1. The value of the cart position ranges from -4.8 to 4.8.
  2. The value of the cart velocity ranges from -Inf to Inf ( to ).
  3. The value of the pole angle ranges from -0.418 radians to 0.418 radians.
  4. The value of the pole velocity at the tip ranges from -Inf to Inf.

Thus, our state space contains an array of continuous values. Let's learn how we can obtain this from Gym. In order to get the state space, we can just type env.observation_space as shown as follows:

print(env.observation_space)

The preceding code will print:

Box(4,)

Box implies that our state space consists of continuous values and not discrete values. That is, in the Frozen Lake environment, we obtained the state space as Discrete(16), which shows that we have 16 discrete states (S to G). But now we have our state space denoted as Box(4,), which implies that our state space is continuous and consists of an array of 4 values.

For example, let's reset our environment and see how our initial state space will look like. We can reset the environment using the reset function:

print(env.reset())

The preceding code will print:

array([ 0.02002635, -0.0228838 ,  0.01248453,  0.04931007])

Note that here the state space is randomly initialized and so we will get different values every time we run the preceding code.

The result of the preceding code implies that our initial state space consists of an array of 4 values that denote the cart position, cart velocity, pole angle, and pole velocity at the tip, respectively. That is:

Figure 2.16: Initial state space

Okay, how can we obtain the maximum and minimum values of our state space? We can obtain the maximum values of our state space using env.observation_space.high and the minimum values of our state space using env.observation_space.low.

For example, let's look at the maximum value of our state space:

print(env.observation_space.high)

The preceding code will print:

[4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38]

It implies that:

  1. The maximum value of the cart position is 4.8.
  2. We learned that the maximum value of the cart velocity is +Inf, and we know that infinity is not really a number, so it is represented using the largest positive real value 3.4028235e+38.
  3. The maximum value of the pole angle is 0.418 radians.
  4. The maximum value of the pole velocity at the tip is +Inf, so it is represented using the largest positive real value 3.4028235e+38.

Similarly, we can obtain the minimum value of our state space as:

print(env.observation_space.low)

The preceding code will print:

[-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38]

It states that:

  1. The minimum value of the cart position is -4.8.
  2. We learned that the minimum value of the cart velocity is -Inf, and we know that infinity is not really a number, so it is represented using the largest negative real value -3.4028235e+38.
  3. The minimum value of the pole angle is -0.418 radians.
  4. The minimum value of the pole velocity at the tip is -Inf, so it is represented using the largest negative real value -3.4028235e+38.

Action space

Now, let's look at the action space. We already learned that in the Cart-Pole environment we perform two actions, which are pushing the cart to the left and pushing the cart to the right, and thus the action space is discrete since we have only two discrete actions.

In order to get the action space, we can just type env.action_space as the following shows:

print(env.action_space)

The preceding code will print:

Discrete(2)

As we can observe, Discrete(2) implies that our action space is discrete, and we have two actions in our action space. Note that the actions will be encoded into numbers as shown in Table 2.4:

Table 2.4: Two possible actions

Cart-Pole balancing with random policy

Let's create an agent with the random policy, that is, we create the agent that selects a random action in the environment and tries to balance the pole. The agent receives a +1 reward every time the pole stands straight up on the cart. We will generate over 100 episodes, and we will see the return (sum of rewards) obtained over each episode. Let's learn this step by step.

First, let's create our Cart-Pole environment:

import gym
env = gym.make('CartPole-v0')

Set the number of episodes and number of time steps in the episode:

num_episodes = 100
num_timesteps = 50

For each episode:

for i in range(num_episodes):

Set the return to 0:

    Return = 0

Initialize the state by resetting the environment:

    state = env.reset()

For each step in the episode:

    for t in range(num_timesteps):

Render the environment:

        env.render()

Randomly select an action by sampling from the environment:

        random_action = env.action_space.sample()

Perform the randomly selected action:

        next_state, reward, done, info = env.step(random_action)

Update the return:

        Return = Return + reward

If the next state is a terminal state then end the episode:

        if done:
            break

For every 10 episodes, print the return (sum of rewards):

    if i%10==0:
        print('Episode: {}, Return: {}'.format(i, Return))
        

Close the environment:

env.close()

The preceding code will output the sum of rewards obtained over every 10 episodes:

Episode: 0, Return: 14.0
Episode: 10, Return: 31.0
Episode: 20, Return: 16.0
Episode: 30, Return: 9.0
Episode: 40, Return: 18.0
Episode: 50, Return: 13.0
Episode: 60, Return: 25.0
Episode: 70, Return: 21.0
Episode: 80, Return: 17.0
Episode: 90, Return: 14.0

Thus, we have learned about one of the interesting and classic control problems called Cart-Pole balancing and how to create the Cart-Pole balancing environment using Gym. Gym provides several other classic control environments as shown in Figure 2.17:

Figure 2.17: Classic control environments

You can also do some experimentation by creating any of the above environments using Gym. We can check all the classic control environments offered by Gym here: https://gym.openai.com/envs/#classic_control.

Atari game environments

Are you a fan of Atari games? If yes, then this section will interest you. Atari 2600 is a video game console from a game company called Atari. The Atari game console provides several popular games, which include Pong, Space Invaders, Ms. Pac-Man, Break Out, Centipede, and many more. Training our reinforcement learning agent to play Atari games is an interesting as well as challenging task. Often, most of the RL algorithms will be tested out on Atari game environments to evaluate the accuracy of the algorithm.

In this section, we will learn how to create the Atari game environment using Gym. Gym provides about 59 Atari game environments including Pong, Space Invaders, Air Raid, Asteroids, Centipede, Ms. Pac-Man, and so on. Some of the Atari game environments provided by Gym are shown in Figure 2.18 to keep you excited:

Figure 2.18: Atari game environments

In Gym, every Atari game environment has 12 different variants. Let's understand this with the Pong game environment. The Pong game environment will have 12 different variants as explained in the following sections.

General environment

  • Pong-v0 and Pong-v4: We can create a Pong environment with the environment id as Pong-v0 or Pong-v4. Okay, what about the state of our environment? Since we are dealing with the game environment, we can just take the image of our game screen as our state. But we can't deal with the raw image directly so we will take the pixel values of our game screen as the state. We will learn more about this in the upcoming section.
  • Pong-ram-v0 and Pong-ram-v4: This is similar to Pong-v0 and Pong-v4, respectively. However, here, the state of the environment is the RAM of the Atari machine, which is just the 128 bytes instead of the game screen's pixel values.

Deterministic environment

  • PongDeterministic-v0 and PongDeterministic-v4: In this type, as the name suggests, the initial position of the game will be the same every time we initialize the environment, and the state of the environment is the pixel values of the game screen.
  • Pong-ramDeterministic-v0 and Pong-ramDeterministic-v4: This is similar to PongDeterministic-v0 and PongDeterministic-v4, respectively, but here the state is the RAM of the Atari machine.

No frame skipping

  • PongNoFrameskip-v0 and PongNoFrameskip-v4: In this type, no game frame is skipped; all game screens are visible to the agent and the state is the pixel value of the game screen.
  • Pong-ramNoFrameskip-v0 and Pong-ramNoFrameskip-v4: This is similar to PongNoFrameskip-v0 and PongNoFrameskip-v4, but here the state is the RAM of the Atari machine.

Thus in the Atari environment, the state of our environment will be either the game screen or the RAM of the Atari machine. Note that similar to the Pong game, all other Atari games have the id in the same fashion in the Gym environment. For example, suppose we want to create a deterministic Space Invaders environment; then we can just create it with the id SpaceInvadersDeterministic-v0. Say we want to create a Space Invaders environment with no frame skipping; then we can create it with the id SpaceInvadersNoFrameskip-v0.

We can check out all the Atari game environments offered by Gym here: https://gym.openai.com/envs/#atari.

State and action space

Now, let's explore the state space and action space of the Atari game environments in detail.

State space

In this section, let's understand the state space of the Atari games in the Gym environment. Let's learn this with the Pong game. We learned that in the Atari environment, the state of the environment will be either the game screen's pixel values or the RAM of the Atari machine. First, let's understand the state space where the state of the environment is the game screen's pixel values.

Let's create a Pong environment with the make function:

env = gym.make("Pong-v0")

Here, the game screen is the state of our environment. So, we will just take the image of the game screen as the state. However, we can't deal with the raw images directly, so we will take the pixel values of the image (game screen) as our state. The dimension of the image pixel will be 3 containing the image height, image width, and the number of the channel.

Thus, the state of our environment will be an array containing the pixel values of the game screen:

 [Image height, image width, number of the channel]

Note that the pixel values range from 0 to 255. In order to get the state space, we can just type env.observation_space as the following shows:

print(env.observation_space)

The preceding code will print:

Box(210, 160, 3)

This indicates that our state space is a 3D array with a shape of [210,160,3]. As we've learned, 210 denotes the height of the image, 160 denotes the width of the image, and 3 represents the number of channels.

For example, we can reset our environment and see how the initial state space looks like. We can reset the environment using the reset function:

print(env.reset())

The preceding code will print an array representing the initial game screen's pixel value.

Now, let's create a Pong environment where the state of our environment is the RAM of the Atari machine instead of the game screen's pixel value:

env = gym.make("Pong-ram-v0")

Let's look at the state space:

print(env.observation_space)

The preceding code will print:

Box(128,)

This implies that our state space is a 1D array containing 128 values. We can reset our environment and see how the initial state space looks like:

print(env.reset())

Note that this applies to all Atari games in the Gym environment, for example, if we create a space invaders environment with the state of our environment as the game screen's pixel value, then our state space will be a 3D array with a shape of Box(210, 160, 3). However, if we create the Space Invaders environment with the state of our environment as the RAM of Atari machine, then our state space will be an array with a shape of Box(128,).

Action space

Let's now explore the action space. In general, the Atari game environment has 18 actions in the action space, and the actions are encoded from 0 to 17 as shown in Table 2.5:

Table 2.5: Atari game environment actions

Note that all the preceding 18 actions are not applicable to all the Atari game environments and the action space varies from game to game. For instance, some games use only the first six of the preceding actions as their action space, and some games use only the first nine of the preceding actions as their action space, while others use all of the preceding 18 actions. Let's understand this with an example using the Pong game:

env = gym.make("Pong-v0")
print(env.action_space)

The preceding code will print:

Discrete(6)

The code shows that we have 6 actions in the Pong action space, and the actions are encoded from 0 to 5. So the possible actions in the Pong game are noop (no action), fire, up, right, left, and down.

Let's now look at the action space of the Road Runner game. Just in case you have not come across this game before, the game screen looks like this:

Figure 2.19: The Road Runner environment

Let's see the action space of the Road Runner game:

env = gym.make("RoadRunner-v0")
print(env.action_space)

The preceding code will print:

Discrete(18)

This shows us that the action space in the Road Runner game includes all 18 actions.

An agent playing the Tennis game

In this section, let's explore how to create an agent to play the Tennis game. Let's create an agent with a random policy, meaning that the agent will select an action randomly from the action space and perform the randomly selected action.

First, let's create our Tennis environment:

import gym
env = gym.make('Tennis-v0')

Let's view the Tennis environment:

env.render()

The preceding code will display the following:

Figure 2.20: The Tennis game environment

Set the number of episodes and the number of time steps in the episode:

num_episodes = 100
num_timesteps = 50

For each episode:

for i in range(num_episodes):

Set the return to 0:

    Return = 0

Initialize the state by resetting the environment:

    state = env.reset()

For each step in the episode:

    for t in range(num_timesteps):

Render the environment:

        env.render()

Randomly select an action by sampling from the environment:

        random_action = env.action_space.sample()

Perform the randomly selected action:

        next_state, reward, done, info = env.step(random_action)

Update the return:

        Return = Return + reward

If the next state is a terminal state, then end the episode:

        if done:
            break

For every 10 episodes, print the return (sum of rewards):

    if i%10==0:
        print('Episode: {}, Return: {}'.format(i, Return))
        

Close the environment:

env.close()

The preceding code will output the return (sum of rewards) obtained over every 10 episodes:

Episode: 0, Return: -1.0
Episode: 10, Return: -1.0
Episode: 20, Return: 0.0
Episode: 30, Return: -1.0
Episode: 40, Return: -1.0
Episode: 50, Return: -1.0
Episode: 60, Return: 0.0
Episode: 70, Return: 0.0
Episode: 80, Return: -1.0
Episode: 90, Return: 0.0

Recording the game

We have just learned how to create an agent that randomly selects an action from the action space and plays the Tennis game. Can we also record the game played by the agent and save it as a video? Yes! Gym provides a wrapper class, which we can use to save the agent's gameplay as video.

To record the game, our system should support FFmpeg. FFmpeg is a framework used for processing media files. So before moving ahead, make sure that your system provides FFmpeg support.

We can record our game using the Monitor wrapper as the following code shows. It takes three parameters: the environment; the directory where we want to save our recordings; and the force option. If we set force = False, it implies that we need to create a new directory every time we want to save new recordings, and when we set force = True, old recordings in the directory will be cleared out and replaced by new recordings:

env = gym.wrappers.Monitor(env, 'recording', force=True)

We just need to add the preceding line of code after creating our environment. Let's take a simple example and see how the recordings work. Let's make our agent randomly play the Tennis game for a single episode and record the agent's gameplay as a video:

import gym
env = gym.make('Tennis-v0')
#Record the game
env = gym.wrappers.Monitor(env, 'recording', force=True)
env.reset()
for _ in range(5000):
    env.render()
    action = env.action_space.sample() 
    next_state, reward, done, info = env.step(action)
    if done:
        break
env.close()

Once the episode ends, we will see a new directory called recording and we can find the video file in MP4 format in this directory, which has our agent's gameplay as shown in Figure 2.21:

Figure 2.21: The Tennis gameplay

Other environments

Apart from the classic control and the Atari game environments we've discussed, Gym also provides several different categories of the environment. Let's find out more about them.

Box2D

Box2D is the 2D simulator that is majorly used for training our agent to perform continuous control tasks, such as walking. For example, Gym provides a Box2D environment called BipedalWalker-v2, which we can use to train our agent to walk. The BipedalWalker-v2 environment is shown in Figure 2.22:

Figure 2.22: The Bipedal Walker environment

We can check out several other Box2D environments offered by Gym here: https://gym.openai.com/envs/#box2d.

MuJoCo

Mujoco stands for Multi-Joint dynamics with Contact and is one of the most popular simulators used for training our agent to perform continuous control tasks. For example, MuJoCo provides an interesting environment called HumanoidStandup-v2, which we can use to train our agent to stand up. The HumanoidStandup-v2 environment is shown in Figure 2.23:

Figure 2.23: The Humanoid Stand Up environment

We can check out several other Mujoco environments offered by Gym here: https://gym.openai.com/envs/#mujoco.

Robotics

Gym provides several environments for performing goal-based tasks for the fetch and shadow hand robots. For example, Gym provides an environment called HandManipulateBlock-v0, which we can use to train our agent to orient a box using a robotic hand. The HandManipulateBlock-v0 environment is shown in Figure 2.24:

Figure 2.24: The Hand Manipulate Block environment

We can check out the several robotics environments offered by Gym here: https://gym.openai.com/envs/#robotics.

Toy text

Toy text is the simplest text-based environment. We already learned about one such environment at the beginning of this chapter, which is the Frozen Lake environment. We can check out other interesting toy text environments offered by Gym here: https://gym.openai.com/envs/#toy_text.

Algorithms

Instead of using our RL agent to play games, can we make use of our agent to solve some interesting problems? Yes! The algorithmic environment provides several interesting problems like copying a given sequence, performing addition, and so on. We can make use of the RL agent to solve these problems by learning how to perform computation. For instance, Gym provides an environment called ReversedAddition-v0, which we can use to train our agent to add multiple digit numbers.

We can check the algorithmic environments offered by Gym here: https://gym.openai.com/envs/#algorithmic.

Environment synopsis

We have learned about several types of Gym environment. Wouldn't it be nice if we could have information about all the environments in a single place? Yes! The Gym wiki provides a description of all the environments with their environment id, state space, action space, and reward range in a table: https://github.com/openai/gym/wiki/Table-of-environments.

We can also check all the available environments in Gym using the registry.all() method:

from gym import envs
print(envs.registry.all())

The preceding code will print all the available environments in Gym.

Thus, in this chapter, we have learned about the Gym toolkit and also several interesting environments offered by Gym. In the upcoming chapters, we will learn how to train our RL agent in a Gym environment to find the optimal policy.

Summary

We started the chapter by understanding how to set up our machine by installing Anaconda and the Gym toolkit. We learned how to create a Gym environment using the gym.make() function. Later, we also explored how to obtain the state space of the environment using env.observation_space and the action space of the environment using env.action_space. We then learned how to obtain the transition probability and reward function of the environment using env.P. Following this, we also learned how to generate an episode using the Gym environment. We understood that in each step of the episode we select an action using the env.step() function.

We understood the classic control methods in the Gym environment. We learned about the continuous state space of the classic control environments and how they are stored in an array. We also learned how to balance a pole using a random agent. Later, we learned about interesting Atari game environments, and how Atari game environments are named in Gym, and then we explored their state space and action space. We also learned how to record the agent's gameplay using the wrapper class, and at the end of the chapter, we discovered other environments offered by Gym.

In the next chapter, we will learn how to find the optimal policy using two interesting algorithms called value iteration and policy iteration.

Questions

Let's evaluate our newly gained knowledge by answering the following questions:

  1. What is the use of a Gym toolkit?
  2. How do we create an environment in Gym?
  3. How do we obtain the action space of the Gym environment?
  4. How do we visualize the Gym environment?
  5. Name some classic control environments offered by Gym.
  6. How do we generate an episode using the Gym environment?
  7. What is the state space of Atari Gym environments?
  8. How do we record the agent's gameplay?

Further reading

Check out the following resources for more information:

Left arrow icon Right arrow icon
Download code icon Download Code

Key benefits

  • Covers a vast spectrum of basic-to-advanced RL algorithms with mathematical explanations of each algorithm
  • Learn how to implement algorithms with code by following examples with line-by-line explanations
  • Explore the latest RL methodologies such as DDPG, PPO, and the use of expert demonstrations

Description

With significant enhancements in the quality and quantity of algorithms in recent years, this second edition of Hands-On Reinforcement Learning with Python has been revamped into an example-rich guide to learning state-of-the-art reinforcement learning (RL) and deep RL algorithms with TensorFlow 2 and the OpenAI Gym toolkit. In addition to exploring RL basics and foundational concepts such as Bellman equation, Markov decision processes, and dynamic programming algorithms, this second edition dives deep into the full spectrum of value-based, policy-based, and actor-critic RL methods. It explores state-of-the-art algorithms such as DQN, TRPO, PPO and ACKTR, DDPG, TD3, and SAC in depth, demystifying the underlying math and demonstrating implementations through simple code examples. The book has several new chapters dedicated to new RL techniques, including distributional RL, imitation learning, inverse RL, and meta RL. You will learn to leverage stable baselines, an improvement of OpenAI’s baseline library, to effortlessly implement popular RL algorithms. The book concludes with an overview of promising approaches such as meta-learning and imagination augmented agents in research. By the end, you will become skilled in effectively employing RL and deep RL in your real-world projects.

Who is this book for?

If you’re a machine learning developer with little or no experience with neural networks interested in artificial intelligence and want to learn about reinforcement learning from scratch, this book is for you. Basic familiarity with linear algebra, calculus, and the Python programming language is required. Some experience with TensorFlow would be a plus.

What you will learn

  • Understand core RL concepts including the methodologies, math, and code
  • Train an agent to solve Blackjack, FrozenLake, and many other problems using OpenAI Gym
  • Train an agent to play Ms Pac-Man using a Deep Q Network
  • Learn policy-based, value-based, and actor-critic methods
  • Master the math behind DDPG, TD3, TRPO, PPO, and many others
  • Explore new avenues such as the distributional RL, meta RL, and inverse RL
  • Use Stable Baselines to train an agent to walk and play Atari games
Estimated delivery fee Deliver to Malta

Premium delivery 7 - 10 business days

€32.95
(Includes tracking information)

Product Details

Country selected
Publication date, Length, Edition, Language, ISBN-13
Publication date : Sep 30, 2020
Length: 760 pages
Edition : 2nd
Language : English
ISBN-13 : 9781839210686
Category :
Languages :

What do you get with Print?

Product feature icon Instant access to your digital eBook copy whilst your Print order is Shipped
Product feature icon Paperback book shipped to your preferred address
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Product feature icon AI Assistant (beta) to help accelerate your learning
Estimated delivery fee Deliver to Malta

Premium delivery 7 - 10 business days

€32.95
(Includes tracking information)

Product Details

Publication date : Sep 30, 2020
Length: 760 pages
Edition : 2nd
Language : English
ISBN-13 : 9781839210686
Category :
Languages :

Packt Subscriptions

See our plans and pricing
Modal Close icon
€18.99 billed monthly
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Simple pricing, no contract
€189.99 billed annually
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
Feature tick icon Exclusive print discounts
€264.99 billed in 18 months
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
Feature tick icon Exclusive print discounts

Frequently bought together


Stars icon
Total 134.97
Mastering Reinforcement Learning with Python
€37.99
Deep Reinforcement Learning Hands-On
€59.99
Deep Reinforcement Learning with Python
€36.99
Total 134.97 Stars icon

Table of Contents

19 Chapters
Fundamentals of Reinforcement Learning Chevron down icon Chevron up icon
A Guide to the Gym Toolkit Chevron down icon Chevron up icon
The Bellman Equation and Dynamic Programming Chevron down icon Chevron up icon
Monte Carlo Methods Chevron down icon Chevron up icon
Understanding Temporal Difference Learning Chevron down icon Chevron up icon
Case Study – The MAB Problem Chevron down icon Chevron up icon
Deep Learning Foundations Chevron down icon Chevron up icon
A Primer on TensorFlow Chevron down icon Chevron up icon
Deep Q Network and Its Variants Chevron down icon Chevron up icon
Policy Gradient Method Chevron down icon Chevron up icon
Actor-Critic Methods – A2C and A3C Chevron down icon Chevron up icon
Learning DDPG, TD3, and SAC Chevron down icon Chevron up icon
TRPO, PPO, and ACKTR Methods Chevron down icon Chevron up icon
Distributional Reinforcement Learning Chevron down icon Chevron up icon
Imitation Learning and Inverse RL Chevron down icon Chevron up icon
Deep Reinforcement Learning with Stable Baselines Chevron down icon Chevron up icon
Reinforcement Learning Frontiers Chevron down icon Chevron up icon
Other Books You May Enjoy Chevron down icon Chevron up icon
Index Chevron down icon Chevron up icon

Customer reviews

Most Recent
Rating distribution
Full star icon Full star icon Full star icon Full star icon Half star icon 4.4
(20 Ratings)
5 star 75%
4 star 5%
3 star 5%
2 star 10%
1 star 5%
Filter icon Filter
Most Recent

Filter reviews by




Ganesh Nov 06, 2023
Full star icon Full star icon Full star icon Full star icon Full star icon 5
I give full marks for ease and elegance with which the topic is dealt with. I had so much struggle learning from the other popular ones. However nothing registered in my mind. This book makes it really easy.Highly recommended.
Amazon Verified review Amazon
建築は素人 Mar 18, 2023
Full star icon Full star icon Full star icon Full star icon Empty star icon 4
わかりやすく説明されており、選んでよかった。
Amazon Verified review Amazon
JB May 03, 2022
Full star icon Full star icon Empty star icon Empty star icon Empty star icon 2
This book has a promising table of contents. But after having wasted an entire day solving error after error trying to get the code in chapter 2 running, I've thrown in the towel.The problem is probably in the environment, not the code itself, but the installation instructions are insufficien to get a correct environment.The code download contains a number of jupyter notebooks, but these are not complete.
Amazon Verified review Amazon
Rajasekhar Dec 14, 2021
Full star icon Full star icon Full star icon Full star icon Full star icon 5
The media could not be loaded. I will give review of this book , every chapter , as of now , I have completed 4 chapters , it's really easy to understand and implement this book
Amazon Verified review Amazon
James Pruet Sep 28, 2021
Full star icon Full star icon Empty star icon Empty star icon Empty star icon 2
I started reading this book to catch up on the latest in RL. I had taken some grad school courses on RL during my CS Masters and found the subject fascinating so I thought I’d check this out. To me, the theory was ok but the hard part about this book is that the “why” feels like it’s never explained. A blackjack example is given which seems fun to see how that would work but then, many choices are never explained and the “optimal policy” is severely dumbed down - optimal blackjack policy is to stay on 19 and up and hit otherwise, really? Why is that what you chose to be a starting point and how can you call that optimal? It’s not even mentioned that blackjack is far more complicated than you get to just hit or stay… frustrating to use an example and not at least say this is a trivial example.It feels like most examples are this way though, the “why” just isn’t explained aside from saying something along the lines of “because of the value function equation in 3.4.2, we do this”.I got through about 200 pages and had to put it down, I wasn’t absorbing anything theory, application, or otherwise.
Amazon Verified review Amazon
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

What is the delivery time and cost of print book? Chevron down icon Chevron up icon

Shipping Details

USA:

'

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K time would start printing from the next business day. So the estimated delivery times start from the next day as well. Orders received after 5 PM U.K time (in our internal systems) on a business day or anytime on the weekend will begin printing the second to next business day. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-bissau
  9. Iran
  10. Lebanon
  11. Libiya Arab Jamahriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela
What is custom duty/charge? Chevron down icon Chevron up icon

Customs duty are charges levied on goods when they cross international borders. It is a tax that is imposed on imported goods. These duties are charged by special authorities and bodies created by local governments and are meant to protect local industries, economies, and businesses.

Do I have to pay customs charges for the print book order? Chevron down icon Chevron up icon

The orders shipped to the countries that are listed under EU27 will not bear custom charges. They are paid by Packt as part of the order.

List of EU27 countries: www.gov.uk/eu-eea:

A custom duty or localized taxes may be applicable on the shipment and would be charged by the recipient country outside of the EU27 which should be paid by the customer and these duties are not included in the shipping charges been charged on the order.

How do I know my custom duty charges? Chevron down icon Chevron up icon

The amount of duty payable varies greatly depending on the imported goods, the country of origin and several other factors like the total invoice amount or dimensions like weight, and other such criteria applicable in your country.

For example:

  • If you live in Mexico, and the declared value of your ordered items is over $ 50, for you to receive a package, you will have to pay additional import tax of 19% which will be $ 9.50 to the courier service.
  • Whereas if you live in Turkey, and the declared value of your ordered items is over € 22, for you to receive a package, you will have to pay additional import tax of 18% which will be € 3.96 to the courier service.
How can I cancel my order? Chevron down icon Chevron up icon

Cancellation Policy for Published Printed Books:

You can cancel any order within 1 hour of placing the order. Simply contact customercare@packt.com with your order details or payment transaction id. If your order has already started the shipment process, we will do our best to stop it. However, if it is already on the way to you then when you receive it, you can contact us at customercare@packt.com using the returns and refund process.

Please understand that Packt Publishing cannot provide refunds or cancel any order except for the cases described in our Return Policy (i.e. Packt Publishing agrees to replace your printed book because it arrives damaged or material defect in book), Packt Publishing will not accept returns.

What is your returns and refunds policy? Chevron down icon Chevron up icon

Return Policy:

We want you to be happy with your purchase from Packtpub.com. We will not hassle you with returning print books to us. If the print book you receive from us is incorrect, damaged, doesn't work or is unacceptably late, please contact Customer Relations Team on customercare@packt.com with the order number and issue details as explained below:

  1. If you ordered (eBook, Video or Print Book) incorrectly or accidentally, please contact Customer Relations Team on customercare@packt.com within one hour of placing the order and we will replace/refund you the item cost.
  2. Sadly, if your eBook or Video file is faulty or a fault occurs during the eBook or Video being made available to you, i.e. during download then you should contact Customer Relations Team within 14 days of purchase on customercare@packt.com who will be able to resolve this issue for you.
  3. You will have a choice of replacement or refund of the problem items.(damaged, defective or incorrect)
  4. Once Customer Care Team confirms that you will be refunded, you should receive the refund within 10 to 12 working days.
  5. If you are only requesting a refund of one book from a multiple order, then we will refund you the appropriate single item.
  6. Where the items were shipped under a free shipping offer, there will be no shipping costs to refund.

On the off chance your printed book arrives damaged, with book material defect, contact our Customer Relation Team on customercare@packt.com within 14 days of receipt of the book with appropriate evidence of damage and we will work with you to secure a replacement copy, if necessary. Please note that each printed book you order from us is individually made by Packt's professional book-printing partner which is on a print-on-demand basis.

What tax is charged? Chevron down icon Chevron up icon

Currently, no tax is charged on the purchase of any print book (subject to change based on the laws and regulations). A localized VAT fee is charged only to our European and UK customers on eBooks, Video and subscriptions that they buy. GST is charged to Indian customers for eBooks and video purchases.

What payment methods can I use? Chevron down icon Chevron up icon

You can pay with the following card types:

  1. Visa Debit
  2. Visa Credit
  3. MasterCard
  4. PayPal
What is the delivery time and cost of print books? Chevron down icon Chevron up icon

Shipping Details

USA:

'

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K time would start printing from the next business day. So the estimated delivery times start from the next day as well. Orders received after 5 PM U.K time (in our internal systems) on a business day or anytime on the weekend will begin printing the second to next business day. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-bissau
  9. Iran
  10. Lebanon
  11. Libiya Arab Jamahriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela