Gradient Descent
One optimization algorithm that lays the foundation for machine learning models is gradient descent (GD). GD is a simple and effective tool for training such models. Gradient descent, as the name suggests, involves “going downhill”: from the current point on a landscape, we step in the direction that takes us downhill, with a step size that depends on the slope (gradient) of the hill. In machine learning (ML) models, gradient descent estimates the gradient of the error (cost) function and uses it to update the parameters so that the cost is minimized. Few optimization methods are as computationally efficient as gradient descent, and it also lays the foundation for the optimization of deep learning models.
GD is most useful in problems where the parameters cannot be calculated analytically with linear algebra and must instead be found by an optimization search. The algorithm works iteratively, moving in the direction of steepest descent. At each iteration, the model parameters, such as the coefficients in linear regression or the weights in a neural network, are updated. The model keeps updating its parameters until the cost function converges to its minimum value (the bottom of the slope in Figure 4.1a).
Figure 4.1a: Gradient descent
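To make the update concrete, the following minimal Python sketch runs gradient descent on the simple convex function f(x) = (x − 3)², whose minimum lies at x = 3. The function, its derivative, the starting point, and the learning rate are illustrative choices, not values taken from the text.

```python
# Minimal gradient descent sketch on f(x) = (x - 3)^2, whose minimum is at x = 3.
# The function, derivative, starting point, and learning rate are illustrative choices.

def f(x):
    return (x - 3) ** 2          # convex cost function

def grad_f(x):
    return 2 * (x - 3)           # first derivative (the gradient in one dimension)

x = 0.0                          # initial parameter value
learning_rate = 0.1              # scales the derivative at each step
for step in range(100):          # maximum permissible number of iterations
    g = grad_f(x)
    x = x - learning_rate * g    # move in the direction of steepest descent
    if abs(g) < 1e-6:            # stop once the slope is practically zero
        break

print(f"converged to x = {x:.4f} after {step + 1} steps, f(x) = {f(x):.6f}")
```

Each iteration scales the derivative by the learning rate and subtracts it from the current parameter value, which is exactly the downhill step described above.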
The size of the step taken in each iteration is controlled by the learning rate (the function’s derivative is scaled by the learning rate at each iteration). If the learning rate is too low, the model may hit the maximum permissible number of iterations before reaching the bottom; if it is too high, the model may fail to converge or may diverge completely (the so-called exploding gradient problem). Selecting the most appropriate learning rate is therefore crucial to achieving the best possible model accuracy, as seen in Figure 4.1b.
Figure 4.1b: Learning rates in gradient descent
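As a rough illustration of this trade-off (an assumed example, not taken from the figure), the following sketch runs the same update with three different learning rates on f(x) = x²; the rates and iteration count are arbitrary choices.

```python
# Sketch of how the learning rate affects convergence on f(x) = x^2
# (gradient 2x). The rates and iteration count are illustrative, not prescriptive.

def run_gd(learning_rate, steps=50, x0=5.0):
    x = x0
    for _ in range(steps):
        x = x - learning_rate * 2 * x   # gradient descent update for f(x) = x^2
    return x

for lr in (0.001, 0.1, 1.1):
    print(f"learning rate {lr:>5}: final x = {run_gd(lr):.4f}")

# Expected behaviour:
#   0.001 -> still far from the minimum after 50 steps (too low)
#   0.1   -> very close to the minimum at x = 0 (well chosen)
#   1.1   -> |x| grows with every step, i.e. the updates diverge (too high)
```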
For GD to work, the objective or cost function must be differentiable (for a univariate function, the first derivative must exist at each point in the domain) and convex (a line segment joining any two points on the function lies on or above it, never crossing below it). The second derivative of a convex function is always non-negative. Examples of convex and non-convex functions are shown in Figure 4.2. GD is a first-order optimization algorithm: it uses only the first derivatives of the cost function.
Figure 4.2: Examples of a convex (L) and a non-convex (R) function
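The following sketch (an illustrative example, not from the figure) shows why convexity matters: on the non-convex function f(x) = x⁴ − 3x² + x, gradient descent ends up in whichever valley lies downhill of its starting point, so it may settle in a local minimum rather than the global one.

```python
# Sketch of gradient descent on a non-convex function, f(x) = x^4 - 3x^2 + x,
# which has two basins: a global minimum near x = -1.30 and a local minimum
# near x = 1.13. The function and starting points are illustrative choices.

def grad(x):
    return 4 * x ** 3 - 6 * x + 1    # derivative of x^4 - 3x^2 + x

def descend(x, learning_rate=0.01, steps=2000):
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x

print(descend(-2.0))   # starts left of the barrier  -> ends near -1.30 (global minimum)
print(descend(2.0))    # starts right of the barrier -> ends near  1.13 (local minimum only)
```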
For a multivariate function, the gradient is a vector containing the partial derivative along each direction of the domain. Such functions can have saddle points (quasi-convex or semi-convex regions) where the algorithm may get stuck, so obtaining a minimum is not guaranteed (a small sketch follows this paragraph). This is where second-order optimization algorithms are brought in to escape the saddle point and reach the global minimum. Apart from ML and deep learning (DL), the GD algorithm is also used in control and mechanical engineering. The following sections compare the algorithm with other optimization algorithms used in ML and DL models and examine some commonly used gradient descent optimizers.
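As a minimal illustration (the function, starting point, and learning rate are assumed here, not taken from the text), plain gradient descent on f(x, y) = x² − y² can run straight into the saddle point at the origin, where the gradient vanishes and the first-order update has nothing left to follow.

```python
# Sketch of plain gradient descent stalling at a saddle point.
# f(x, y) = x^2 - y^2 has a saddle at (0, 0): the gradient (2x, -2y) is zero
# there even though (0, 0) is not a minimum. Start and rate are illustrative.

def grad(x, y):
    return 2 * x, -2 * y            # gradient vector of f(x, y) = x^2 - y^2

x, y = 1.0, 0.0                     # start on the x-axis, directly uphill of the saddle
learning_rate = 0.1
for _ in range(100):
    gx, gy = grad(x, y)
    x, y = x - learning_rate * gx, y - learning_rate * gy

print(f"ended at ({x:.6f}, {y:.6f}), gradient = {grad(x, y)}")
# The run ends at (0, 0) with a zero gradient: first-order GD has no information
# telling it that moving along y would keep decreasing f, so it stops at the saddle.
```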
This chapter covers the following topics:
- Gradient descent variants
- Gradient descent optimizers