Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds
Arrow up icon
GO TO TOP
Deep Learning Quick Reference

You're reading from   Deep Learning Quick Reference Useful hacks for training and optimizing deep neural networks with TensorFlow and Keras

Arrow left icon
Product type Paperback
Published in Mar 2018
Publisher Packt
ISBN-13 9781788837996
Length 272 pages
Edition 1st Edition
Languages
Tools
Arrow right icon
Author (1):
Arrow left icon
Mike Bernico Mike Bernico
Author Profile Icon Mike Bernico
Mike Bernico
Arrow right icon
View More author details
Toc

Table of Contents (15) Chapters Close

Preface 1. The Building Blocks of Deep Learning FREE CHAPTER 2. Using Deep Learning to Solve Regression Problems 3. Monitoring Network Training Using TensorBoard 4. Using Deep Learning to Solve Binary Classification Problems 5. Using Keras to Solve Multiclass Classification Problems 6. Hyperparameter Optimization 7. Training a CNN from Scratch 8. Transfer Learning with Pretrained CNNs 9. Training an RNN from scratch 10. Training LSTMs with Word Embeddings from Scratch 11. Training Seq2Seq Models 12. Using Deep Reinforcement Learning 13. Generative Adversarial Networks 14. Other Books You May Enjoy

Optimization algorithms for deep learning

The gradient descent algorithm is not the only optimization algorithm available to optimize our network weights, however it's the basis for most other algorithms. While understanding every optimization algorithm out there is likely a PhD worth of material, we will devote a few sentences to some of the most practical.

Using momentum with gradient descent

Using gradient descent with momentum speeds up gradient descent by increasing the speed of learning in directions the gradient has been constant in direction while slowing learning in directions the gradient fluctuates in direction. It allows the velocity of gradient descent to increase.

Momentum works by introducing a velocity term, and using a weighted moving average of that term in the update rule, as follows:

Most typically is set to 0.9 in the case of momentum, and usually this is not a hyper-parameter that needs to be changed.

The RMSProp algorithm

RMSProp is another algorithm that can speed up gradient descent by speeding up learning in some directions, and dampening oscillations in other directions, across the multidimensional space that the network weights represent:

This has the effect of reducing oscillations more in directions where is large.

The Adam optimizer

Adam is one of the best performing known optimizer and it's my first choice. It works well across a wide variety of problems. It combines the best parts of both momentum and RMSProp into a single update rule:

Where is some very small number to prevent division by 0.

Adam is often a great choice, and it's a great place to start when you're prototyping, so save yourself some time by starting with Adam.
You have been reading a chapter from
Deep Learning Quick Reference
Published in: Mar 2018
Publisher: Packt
ISBN-13: 9781788837996
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image