You're reading from Reinforcement Learning with TensorFlow A beginner's guide to designing self-learning systems with TensorFlow and OpenAI Gym

Product type Paperback

Published in Apr 2018

Publisher Packt

ISBN-13 9781788835725

Length 334 pages

Edition 1st Edition

Languages

Python

Tools

OpenAI Gym

Concepts

Reinforcement Learning

Author (1):

Sayon Dutta

Table of Contents (17) Chapters

Preface

1. Deep Learning – Architectures and Frameworks

2. Training Reinforcement Learning Agents Using OpenAI Gym

3. Markov Decision Process

4. Policy Gradients

5. Q-Learning and Deep Q-Networks

6. Asynchronous Methods

7. Robo Everything – Real Strategy Gaming

8. AlphaGo – Reinforcement Learning at Its Best

9. Reinforcement Learning in Autonomous Driving

10. Financial Portfolio Management

11. Reinforcement Learning in Robotics

12. Deep Reinforcement Learning in Ad Tech

13. Reinforcement Learning in Image Processing

14. Deep Reinforcement Learning in NLP

15. Further topics in Reinforcement Learning

16. Other Books You May Enjoy

AlphaGo – mastering Go

Traditional AI approaches that build search trees covering every possible position fail in the case of Go. The search space is enormous: a 19 x 19 board admits roughly 2.08 x 10¹⁷⁰ legal positions, which also makes it difficult to evaluate the strength of each possible board position. Brute-force search is therefore infeasible for Go.
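To get a feel for these magnitudes, the short sketch below compares the figure quoted above with a loose upper bound on board configurations. It assumes only that each of the 361 points is empty, black, or white; the helper name `order_of_magnitude` is made up for this illustration.

```python
# Each of the 19 x 19 = 361 points is empty, black, or white, so 3**361 is a
# loose upper bound on board configurations (many of them are not legal).
BOARD_SIZE = 19
points = BOARD_SIZE * BOARD_SIZE
upper_bound = 3 ** points

# Approximate number of legal positions quoted in the text: 2.08 x 10^170.
legal_positions = 208 * 10 ** 168


def order_of_magnitude(n):
    """Return the base-10 exponent of a positive integer."""
    return len(str(n)) - 1


# Fraction of all configurations that are legal positions (roughly 1%).
legal_fraction = legal_positions / upper_bound

print("points on the board:", points)
print("upper bound is about 10^%d" % order_of_magnitude(upper_bound))
print("legal positions are about 10^%d" % order_of_magnitude(legal_positions))
print("legal fraction: %.3f" % legal_fraction)
```

Even after discarding illegal configurations, the count stays far beyond anything an exhaustive search could enumerate, which is why the evaluation has to be learned rather than computed by brute force.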

Therefore, an advanced tree search, Monte Carlo Tree Search (MCTS), was combined with deep neural networks as a novel approach to capturing the intuition that humans use when playing Go. These networks are convolutional neural networks (CNNs): they take an image of the board, that is, a description of the board position, and pass it through a series of layers to find the best move for the given state of the game.
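The idea of feeding the board to a CNN as an image can be sketched in a few lines of plain Python. This is a toy illustration, not AlphaGo's actual network: the three-plane encoding, the single 3 x 3 convolution, the 9 x 9 board, and the random untrained weights are all assumptions made here, and the function names are hypothetical. Restricting moves to empty points is also a simplification, since real Go rules forbid some empty points as well.

```python
import math
import random

SIZE = 9  # small board for the demo; the real game uses 19 x 19


def encode_planes(board, player):
    """Turn a board of '.', 'B', 'W' into three binary planes:
    the current player's stones, the opponent's stones, and empty points."""
    opponent = "W" if player == "B" else "B"
    symbols = (player, opponent, ".")
    return [[[1.0 if cell == s else 0.0 for cell in row] for row in board]
            for s in symbols]


def conv3x3(planes, kernels):
    """One 3 x 3 convolution with zero padding, summed over the input planes."""
    size = len(planes[0])
    out = [[0.0] * size for _ in range(size)]
    for plane, kernel in zip(planes, kernels):
        for r in range(size):
            for c in range(size):
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < size and 0 <= cc < size:
                            out[r][c] += plane[rr][cc] * kernel[dr + 1][dc + 1]
    return out


def move_probabilities(board, scores):
    """Softmax over empty points only; occupied points are left out."""
    empty = [(r, c) for r, row in enumerate(board)
             for c, cell in enumerate(row) if cell == "."]
    top = max(scores[r][c] for r, c in empty)  # subtract max for stability
    weights = {(r, c): math.exp(scores[r][c] - top) for r, c in empty}
    total = sum(weights.values())
    return {move: w / total for move, w in weights.items()}


random.seed(0)
board = [["."] * SIZE for _ in range(SIZE)]
board[4][4], board[4][5] = "B", "W"

# Random weights stand in for parameters that a real network would learn.
kernels = [[[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
           for _ in range(3)]

planes = encode_planes(board, player="B")
probs = move_probabilities(board, conv3x3(planes, kernels))
best_move = max(probs, key=probs.get)
print("highest-scoring move:", best_move)
```

A trained network would stack many such layers and learn its weights from data; here the output is meaningless as Go advice and only shows the data flow from board description to a probability for each candidate move.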

The architecture of AlphaGo uses two neural networks: