How to implement Reinforcement Learning with TensorFlow

  • 3 min read
  • 05 Mar 2018


[box type="note" align="" class="" width=""]This article is an excerpt from the book, Deep Learning Essentials co-authored by Wei Di, Anurag Bhardwaj, and Jianing Wei. This book will help you get to grips with the essentials of deep learning by leveraging the power of Python.[/box]

In today's tutorial, we will implement reinforcement learning with a TensorFlow-based Q-learning algorithm.

We will look at a popular game, FrozenLake, which has a built-in environment in the OpenAI Gym package. The idea behind the FrozenLake game is quite simple. It consists of a 4 x 4 grid of blocks, where each block can be in one of the following four states:

  • S: Starting point/Safe state
  • F: Frozen surface/Safe state
  • H: Hole/Unsafe state
  • G: Goal/Safe or Terminal state

In each of the 16 cells, you can use one of the four actions, namely up/down/left/right, to move to a neighboring state. The goal of the game is to start from state S and end at state G. We will show how we can use a neural network-based Q-learning system to learn a safe path from state S to state G. First, we import the necessary packages and define the game environment:

import gym
import numpy as np
import random
import tensorflow as tf

# Create the FrozenLake environment (a 4 x 4 grid with 16 states and 4 actions)
env = gym.make('FrozenLake-v0')
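
Before defining the network, it can be useful to sanity-check the environment. The following short sketch is not part of the book excerpt; it simply prints the state and action space sizes and renders the default 4 x 4 map using the classic gym API:

# Optional sanity check (not from the book excerpt)
print(env.observation_space.n)  # 16 discrete states
print(env.action_space.n)       # 4 discrete actions
env.reset()
env.render()  # prints the S/F/H/G grid:
# SFFF
# FHFH
# FFFH
# HFFG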

Once the environment is defined, we can define the network structure that learns the Q-values. We will use a single-layer neural network that maps the 16-dimensional one-hot state input to 4 output neurons, one Q-value per action, as follows:

# TensorFlow 1.x API
input_matrix = tf.placeholder(shape=[1,16], dtype=tf.float32)    # one-hot encoded state
weight_matrix = tf.Variable(tf.random_uniform([16,4], 0, 0.01))  # learnable Q-value weights
Q_matrix = tf.matmul(input_matrix, weight_matrix)                # Q-values for all 4 actions
prediction_matrix = tf.argmax(Q_matrix, 1)                       # index of the greedy action

nextQ = tf.placeholder(shape=[1,4], dtype=tf.float32)            # target Q-values
loss = tf.reduce_sum(tf.square(nextQ - Q_matrix))
train = tf.train.GradientDescentOptimizer(learning_rate=0.05)
model = train.minimize(loss)
init_op = tf.global_variables_initializer()
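
The snippet that follows assumes a running TensorFlow session and a few names the excerpt does not define: sess, num_states, y (the discount factor), and sample_epsilon (the exploration probability). A minimal setup consistent with the code might look like this; the hyperparameter values are illustrative assumptions, not taken from the book:

# Setup assumed by the snippet below (values are illustrative assumptions)
num_states = env.observation_space.n  # 16
y = 0.99                              # discount factor (gamma), assumed value
sample_epsilon = 0.1                  # exploration probability, assumed value
sess = tf.Session()
sess.run(init_op)
current_state = env.reset()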

Now we can choose the action epsilon-greedily: we normally take the action with the highest predicted Q-value, but with probability sample_epsilon we sample a random action instead. After taking the action, we update the network toward the Q-learning target, reward + y * maxQ1:

# One-hot encode the current state
ip_q = np.zeros(num_states)
ip_q[current_state] = 1

# Greedy action and Q-values for the current state
a, allQ = sess.run([prediction_matrix, Q_matrix], feed_dict={input_matrix: [ip_q]})

# With probability sample_epsilon, explore with a random action
if np.random.rand(1) < sample_epsilon:
    a[0] = env.action_space.sample()

# Take the action and observe the next state and reward
next_state, reward, done, info = env.step(a[0])

# Compute the maximum Q-value of the next state
ip_q1 = np.zeros(num_states)
ip_q1[next_state] = 1
Q1 = sess.run(Q_matrix, feed_dict={input_matrix: [ip_q1]})
maxQ1 = np.max(Q1)

# Q-learning target: reward plus the discounted best future Q-value
targetQ = allQ
targetQ[0, a[0]] = reward + y * maxQ1

# Train the network toward the target Q-values
_, W1 = sess.run([model, weight_matrix], feed_dict={input_matrix: [ip_q], nextQ: targetQ})
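
For context, this update runs inside an episode loop: the agent repeatedly resets the environment, steps until it falls into a hole or reaches the goal, and updates the network after every step. A minimal loop wrapping the snippet above might look like the following; the episode count and per-episode step cap are illustrative assumptions, not values from the book:

# Illustrative training loop (episode count and step cap are assumptions)
num_episodes = 2000
for episode in range(num_episodes):
    current_state = env.reset()
    for step in range(100):
        ip_q = np.zeros(num_states)
        ip_q[current_state] = 1
        a, allQ = sess.run([prediction_matrix, Q_matrix], feed_dict={input_matrix: [ip_q]})
        if np.random.rand(1) < sample_epsilon:
            a[0] = env.action_space.sample()
        next_state, reward, done, info = env.step(a[0])
        ip_q1 = np.zeros(num_states)
        ip_q1[next_state] = 1
        Q1 = sess.run(Q_matrix, feed_dict={input_matrix: [ip_q1]})
        targetQ = allQ
        targetQ[0, a[0]] = reward + y * np.max(Q1)
        sess.run(model, feed_dict={input_matrix: [ip_q], nextQ: targetQ})
        current_state = next_state
        if done:
            break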

The figure below shows sample output of the program when executed. You can see the values of the Q matrix change as the agent moves from one state to another. You will also notice a reward of 1 when the agent reaches the goal in state 15:


[Figure: RL with Q-learning example, showing sample output of the program]

To summarize, we saw how reinforcement learning can be implemented in practice using TensorFlow and Q-learning.

If you found this post useful, do check out the book Deep Learning Essentials, which will help you fine-tune and optimize your deep learning models for better performance.
