Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Python Deep Learning

You're reading from   Python Deep Learning Next generation techniques to revolutionize computer vision, AI, speech and data analysis

Arrow left icon
Product type Paperback
Published in Apr 2017
Publisher Packt
ISBN-13 9781786464453
Length 406 pages
Edition 1st Edition
Languages
Arrow right icon
Authors (4):
Arrow left icon
Peter Roelants Peter Roelants
Author Profile Icon Peter Roelants
Peter Roelants
Daniel Slater Daniel Slater
Author Profile Icon Daniel Slater
Daniel Slater
Valentino Zocca Valentino Zocca
Author Profile Icon Valentino Zocca
Valentino Zocca
Gianmario Spacagna Gianmario Spacagna
Author Profile Icon Gianmario Spacagna
Gianmario Spacagna
Arrow right icon
View More author details
Toc

Table of Contents (12) Chapters Close

Preface 1. Machine Learning – An Introduction FREE CHAPTER 2. Neural Networks 3. Deep Learning Fundamentals 4. Unsupervised Feature Learning 5. Image Recognition 6. Recurrent Neural Networks and Language Models 7. Deep Learning for Board Games 8. Deep Learning for Computer Games 9. Anomaly Detection 10. Building a Production-Ready Intrusion Detection System Index

Policy gradients in AlphaGo


For AlphaGo using policy gradients, the network was set up to play games against itself. It did so with a reward of 0 for every time step until the final one where the game is either won or lost, giving a reward of 1 or -1. This final reward is then applied to every time step in the network, and the network is trained using policy gradients in the same way as our Tic-tac-toe example. To prevent overfitting, games were played against a randomly selected previous version of the network. If the network constantly plays against itself, the risk is it could end up with some very niche strategies, which would not work against varied opponents, a local minima of sorts.

Building the initial supervised learning network that predicted the most likely moves by human players allowed AlphaGo to massively reduce the breadth of the search it needs to perform in MCTS. This allowed them to get much more accurate evaluation per rollout. The problem is that running a large many-layered...

lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image