A Handbook of Mathematical Models with Python

Elevate your machine learning projects with NetworkX, PuLP, and linalg

Product type Paperback
Published in Aug 2023
Publisher Packt
ISBN-13 9781804616703
Length 144 pages
Edition 1st Edition
Author: Ranja Sarkar

Table of Contents

Preface
Part 1: Mathematical Modeling
Chapter 1: Introduction to Mathematical Modeling
Chapter 2: Machine Learning vis-à-vis Mathematical Modeling
Part 2: Mathematical Tools
Chapter 3: Principal Component Analysis
Chapter 4: Gradient Descent
Chapter 5: Support Vector Machine
Chapter 6: Graph Theory
Chapter 7: Kalman Filter
Chapter 8: Markov Chain
Part 3: Mathematical Optimization
Chapter 9: Exploring Optimization Techniques
Chapter 10: Optimization Techniques for Machine Learning
Index
Other Books You May Enjoy

Gradient Descent

Gradient descent (GD) is an optimization algorithm that underpins the training of many machine learning (ML) models. As the name suggests, it involves “going downhill”: from the current position on a landscape, we step in the direction that leads downhill, and the size of the step depends on the slope (gradient) of the hill. In ML models, gradient descent uses the gradient of the error with respect to the model parameters to minimize the cost function. Each update is computationally cheap because it needs only first derivatives, which is one reason GD also lays the foundation for the optimization of deep learning models.

GD is most useful in problems where the optimal parameters cannot be calculated analytically with linear algebra and must instead be found by numerical search. The algorithm works iteratively, moving in the direction of steepest descent, that is, along the negative gradient of the cost function. At each iteration, the model parameters, such as the coefficients in linear regression or the weights in neural networks, are updated. The updates continue until the cost function converges to its minimum value (the bottom of the slope in Figure 4.1a).

Figure 4.1a: Gradient descent
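
The iterative update described above can be sketched in a few lines of Python. This is a minimal illustration rather than the book's own code: the function name `fit_line`, the toy data, and the default learning rate, iteration cap, and tolerance are all assumptions chosen for the example. It fits a line y = w*x + b by repeatedly stepping against the gradient of the mean squared error.

```python
def fit_line(xs, ys, learning_rate=0.05, max_iters=5000, tol=1e-10):
    """Fit y = w*x + b by gradient descent on the mean squared error.

    Hypothetical helper for illustration; all defaults are assumptions.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for iteration in range(1, max_iters + 1):
        # Residuals of the current model on every data point.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient, scaled by the learning rate.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        # Stop once the gradient is negligible, i.e. the cost has flattened.
        if grad_w ** 2 + grad_b ** 2 < tol:
            break
    return w, b, iteration


# Toy data generated from y = 2x + 1 (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b, n_iters = fit_line(xs, ys)
print(f"w={w:.4f}, b={b:.4f}, iterations={n_iters}")
```

On this toy data the fitted coefficients should approach the slope 2 and intercept 1 used to generate it, well before the iteration cap is reached.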

The learning rate controls the size of the step taken in each iteration: the derivative of the function is scaled by the learning rate to give the update. With a learning rate that is too low, the model may reach the maximum permissible number of iterations before reaching the bottom. With one that is too high, the updates may overshoot the minimum, so the model fails to converge or diverges completely. Selecting an appropriate learning rate is therefore crucial to achieving a model with the best possible accuracy, as seen in Figure 4.1b.

Figure 4.1b: Learning rates in gradient descent
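
The effect of the learning rate can be reproduced on a very simple cost function. The sketch below assumes f(x) = x**2 as a stand-in cost, and the three learning rates, the starting point, and the iteration budget are illustrative choices, not values from the book.

```python
def run_gd(learning_rate, x0=1.0, n_iters=100):
    """Run plain gradient descent on f(x) = x**2 and return the final x."""
    x = x0
    for _ in range(n_iters):
        x -= learning_rate * 2 * x  # f'(x) = 2x, scaled by the learning rate
    return x


# Too low, reasonable, and too high (assumed values for illustration).
results = {lr: run_gd(lr) for lr in (0.001, 0.1, 1.1)}
for lr, x_final in results.items():
    print(f"learning rate {lr}: final x = {x_final:.3e}")
```

With the smallest rate the iterate is still far from the minimum at x = 0 when the iteration budget runs out, with the middle rate it is essentially at the minimum, and with the largest rate each step overshoots by more than the last, so the iterate grows in magnitude instead of shrinking.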

For GD to converge to the global minimum (given a suitable learning rate), the objective or cost function must be differentiable (for a univariate function, the first derivative exists at each point in the domain) and convex (the line segment connecting any two points on the graph of the function never lies below the graph). The second derivative of a twice-differentiable convex function is never negative. Examples of convex and non-convex functions are shown in Figure 4.2. GD is a first-order optimization algorithm, meaning that it uses only first derivatives of the function.

Figure 4.2: Example of convex (L) and non-convex (R) function
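
To see why convexity matters, the following sketch runs gradient descent on a non-convex function from two different starting points. The function f(x) = x**4 - 2*x**2 + 0.5*x, the starting points, and the learning rate are assumptions picked for illustration. The function has two valleys of different depth, and each run settles in the valley nearest to its starting point.

```python
def f(x):
    """A non-convex function with two local minima (illustrative choice)."""
    return x**4 - 2 * x**2 + 0.5 * x


def df(x):
    """First derivative of f."""
    return 4 * x**3 - 4 * x + 0.5


def descend(x0, learning_rate=0.01, n_iters=1000):
    """Plain gradient descent on f starting from x0."""
    x = x0
    for _ in range(n_iters):
        x -= learning_rate * df(x)
    return x


left = descend(-2.0)   # starts on the left slope
right = descend(2.0)   # starts on the right slope
print(f"from -2.0: x = {left:.4f}, f(x) = {f(left):.4f}")
print(f"from  2.0: x = {right:.4f}, f(x) = {f(right):.4f}")
```

Both runs stop where the derivative is essentially zero, but only the run that starts on the left reaches the lower of the two minima. On a convex function this dependence on the starting point disappears, because every local minimum is also a global one.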

In a multivariate function, the gradient is the vector of partial derivatives along each direction in the domain. Such functions may have saddle points, where the gradient vanishes although the point is not a minimum. The algorithm may get stuck there, so reaching a minimum is not guaranteed. Second-order optimization algorithms, which also use curvature information, can help escape a saddle point and continue toward a minimum. Apart from ML and deep learning (DL), the GD algorithm finds use in control engineering as well as mechanical engineering. The following sections compare the algorithm with other optimization algorithms used in ML and DL models and examine some commonly used gradient descent optimizers.
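
The saddle-point behavior can be illustrated with a small two-variable example. The function f(x, y) = x**2 + (y**2 - 1)**2 is an assumption chosen for this sketch: it has a saddle point at (0, 0) and minima at (0, 1) and (0, -1). The code shows only plain gradient descent and does not implement a second-order method. A run that starts exactly on the line y = 0 slides into the saddle and stays there, while a run with a tiny offset in y moves on to a minimum.

```python
def f(x, y):
    """Saddle at (0, 0); minima at (0, 1) and (0, -1). Illustrative choice."""
    return x**2 + (y**2 - 1) ** 2


def grad(x, y):
    """Gradient of f: the vector of partial derivatives."""
    return 2 * x, 4 * y * (y**2 - 1)


def descend(x, y, learning_rate=0.05, n_iters=500):
    """Plain gradient descent on f from the starting point (x, y)."""
    for _ in range(n_iters):
        gx, gy = grad(x, y)
        x -= learning_rate * gx
        y -= learning_rate * gy
    return x, y


stuck = descend(1.0, 0.0)     # the y-gradient is exactly zero along y = 0
escaped = descend(1.0, 0.01)  # a small offset lets y move toward 1
print("stuck at  ", stuck, "f =", f(*stuck))
print("escaped to", escaped, "f =", f(*escaped))
```

At the saddle the gradient is zero, so first-order information alone cannot tell it apart from a minimum. The curvature there is positive along x and negative along y, which is the kind of information a second-order method can exploit.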

This chapter covers the following topics:

  • Gradient descent variants
  • Gradient descent optimizers