Hands-On Neural Networks with Keras

You're reading from Hands-On Neural Networks with Keras: Design and create neural networks using deep learning and artificial intelligence principles

Product type: Paperback
Published: Mar 2019
Publisher: Packt
ISBN-13: 9781789536089
Length: 462 pages
Edition: 1st Edition
Author: Niloy Purkait

Table of Contents (16 chapters)

Preface
1. Section 1: Fundamentals of Neural Networks
2. Overview of Neural Networks
3. A Deeper Dive into Neural Networks
4. Signal Processing - Data Analysis with Neural Networks
5. Section 2: Advanced Neural Network Architectures
6. Convolutional Neural Networks
7. Recurrent Neural Networks
8. Long Short-Term Memory Networks
9. Reinforcement Learning with Deep Q-Networks
10. Section 3: Hybrid Model Architecture
11. Autoencoders
12. Generative Networks
13. Section 4: Road Ahead
14. Contemplating Present and Future Developments
15. Other Books You May Enjoy

Double Q-learning

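Another augmentation to the standard Q-learning model we just built is Double Q-learning, introduced by Hado van Hasselt (2010 and 2015). The intuition behind it is quite simple. Recall that, so far, we have estimated the target value for each state-action pair using the Bellman equation and then measured how far off the mark our prediction is for a given state, like so:

$$y_t = r_t + \gamma \max_{a'} Q(s_{t+1}, a'; \theta)$$
$$L(\theta) = \left(y_t - Q(s_t, a_t; \theta)\right)^2$$

Here, r_t is the reward received at time t, γ is the discount factor, and θ denotes the weights of our network.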
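The Bellman target and the prediction error described in the preceding paragraph can be sketched for a mini-batch of transitions in NumPy. This is a minimal illustration, not the book's implementation. The function names `q_learning_targets` and `td_errors` are hypothetical, and so are the sample numbers. The `dones` mask is an assumption as well: it stops bootstrapping past terminal states, a common convention that this excerpt does not discuss.

```python
import numpy as np

def q_learning_targets(rewards, next_q_values, dones, gamma=0.99):
    """Bellman targets y_t = r_t + gamma * max_a' Q(s_{t+1}, a') for a batch.

    rewards:       shape (batch,)
    next_q_values: shape (batch, n_actions), Q-values predicted for s_{t+1}
    dones:         shape (batch,), 1.0 where s_{t+1} is terminal (assumed convention)
    """
    max_next_q = next_q_values.max(axis=1)  # greedy value of the next state
    # Terminal transitions keep only the immediate reward.
    return rewards + gamma * (1.0 - dones) * max_next_q

def td_errors(targets, q_values, actions):
    """How far the predictions Q(s_t, a_t) are from the targets y_t."""
    predicted = q_values[np.arange(len(actions)), actions]  # Q-value of the action taken
    return targets - predicted

# Hypothetical batch: two transitions, three actions each.
rewards = np.array([1.0, 0.0])
dones = np.array([0.0, 1.0])
next_q = np.array([[0.5, 2.0, 1.0],
                   [3.0, 0.0, 0.0]])
q = np.array([[1.0, 0.2, 0.3],
              [0.4, 0.1, 0.0]])
actions = np.array([0, 1])

y = q_learning_targets(rewards, next_q, dones, gamma=0.9)
errors = td_errors(y, q, actions)
loss = np.mean(errors ** 2)  # squared error, averaged over the batch
print(y, errors, loss)
```

In a Keras setting, `next_q_values` and `q_values` would typically come from calling `model.predict` on the batches of next and current states.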

However, estimating the maximum expected future reward in this manner creates a problem. As you may have noticed, the max operator in the target equation (y_t) uses the same Q-values both to select the best next action and to evaluate it. Any estimation noise that inflates a Q-value also makes that action more likely to be picked by the max. This introduces a tendency to overestimate Q-values, which can compound over training and eventually spiral out of control. To compensate for such possibilities, van Hasselt et al. (2016) implemented...
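The overestimation effect can be reproduced with a small toy simulation. This sketch is not from the book. It assumes ten actions whose true Q-values are all zero, with each action's value estimated as the mean of five noisy samples. The function name `max_estimator_bias` is hypothetical. When the same noisy estimates both choose and score the action, the average result is clearly positive. The decoupled "double estimator" idea behind Double Q-learning, where one set of estimates selects the action and an independent set evaluates it, stays close to the true value of zero.

```python
import numpy as np

def max_estimator_bias(n_actions=10, n_samples=5, n_trials=10_000, seed=0):
    """Compare single vs double estimators of max_a Q(a) when every true Q-value is 0.

    Each action's Q-value is estimated from n_samples noisy returns
    drawn from a zero-mean, unit-variance normal distribution.
    """
    rng = np.random.default_rng(seed)
    # Two independent sets of noisy samples, shape (trials, actions, samples).
    samples_a = rng.normal(0.0, 1.0, size=(n_trials, n_actions, n_samples))
    samples_b = rng.normal(0.0, 1.0, size=(n_trials, n_actions, n_samples))
    q_a = samples_a.mean(axis=2)  # estimates from the first set
    q_b = samples_b.mean(axis=2)  # independent estimates from the second set

    # Single estimator: the same estimates select and evaluate the action.
    single = q_a.max(axis=1)

    # Double estimator: q_a selects the action, q_b evaluates it.
    best = q_a.argmax(axis=1)
    double = q_b[np.arange(n_trials), best]

    return single.mean(), double.mean()

single_bias, double_bias = max_estimator_bias()
print(f"single estimator: {single_bias:+.3f}")  # clearly positive although the true max is 0
print(f"double estimator: {double_bias:+.3f}")  # close to 0
```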
