Model- and prediction-centric debugging
A model's predictions in the training, testing, and production stages can help us detect issues with it and find opportunities to improve it. Here, we will briefly review some aspects of model- and prediction-centric debugging. Future chapters of this book cover these problems in more detail, along with other considerations for achieving a reliable model, how to identify the source of issues, and how to resolve them.
Underfitting and overfitting
When we train a model, such as a supervised learning model, the goal is to achieve high performance not only on the training set but also on the test set. When a model performs poorly even on the training set, we are dealing with underfitting. We can address this by developing more complex models, such as random forest or deep learning models, instead of linear or logistic regression models. More complex models may reduce underfitting, but they can also cause overfitting and reduce the generalizability of the predictions to test or production data (Figure 1.5):
Figure 1.5 – Schematic illustration of underfitting and overfitting
Algorithm and hyperparameter selection determine the level of complexity and the chance of underfitting or overfitting when training and testing a machine learning model. For example, by choosing a model that can learn nonlinear patterns instead of a linear model, your model is less likely to underfit because it can identify more complex patterns in the training data. At the same time, you increase the chance of overfitting, as some of the complex patterns in the training data might not generalize to the test data (Figure 1.5). There are approaches for assessing underfitting and overfitting that will help you develop a high-performance and generalizable model. We will discuss these in future chapters.
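One simple diagnostic is to compare a model's performance on the training and test sets. The following is a minimal sketch, assuming scikit-learn and a synthetic dataset (illustrative choices, not an example from this chapter), that contrasts a linear model with a more flexible one:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (illustrative only)
X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                    ("unpruned decision tree", DecisionTreeClassifier(random_state=42))]:
    model.fit(X_train, y_train)
    # A low training score suggests underfitting; a large train-test gap suggests overfitting
    print(f"{name}: train={model.score(X_train, y_train):.2f}, "
          f"test={model.score(X_test, y_test):.2f}")

A low training score points toward underfitting, while a large gap between the training and test scores points toward overfitting.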
Model hyperparameters
Some parameters that affect the performance of a machine learning model are usually not optimized automatically during the training process. These are called hyperparameters. We will go through examples of such hyperparameters, such as the number of trees in a random forest model or the size of the hidden layers in neural network models, in future chapters.
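As a brief illustration, the following minimal sketch, assuming scikit-learn (the specific values are arbitrary), shows how such hyperparameters are set by us rather than learned from the data:

from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

# The number of trees in a random forest is chosen by us, not learned during training
rf_model = RandomForestClassifier(n_estimators=200, max_depth=10, random_state=42)

# The sizes of the hidden layers in a neural network are likewise hyperparameters
nn_model = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=42)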
Inference in model testing and production
The eventual goal of machine learning modeling is to have a highly effective model in production. When we test a model, we assess its generalizability, but we cannot be sure of its performance on data it has not seen. The data used to train machine learning models can also become out of date. For example, changes in clothing market trends could make the predictions of a clothing recommendation model unreliable.
There are different concepts in this topic, such as data variance, data drift, and model drift, all of which we will cover in the next few chapters.
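As a small preview, the following minimal sketch uses a two-sample Kolmogorov-Smirnov test from SciPy (one of many possible checks, and an assumption here rather than this book's chosen method) to flag when a feature's distribution in production has shifted away from the training data:

import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
training_feature = rng.normal(loc=0.0, scale=1.0, size=1000)    # feature values seen at training time
production_feature = rng.normal(loc=0.5, scale=1.0, size=1000)  # the same feature observed later in production

# A small p-value means the two samples are unlikely to come from the same distribution
statistic, p_value = ks_2samp(training_feature, production_feature)
if p_value < 0.01:
    print(f"Possible data drift: KS statistic={statistic:.3f}, p-value={p_value:.4f}")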
Data or hyperparameters for changing landscapes
When we train a machine learning model with specific training data and a set of hyperparameters, the values of the model parameters change so that they get as close as possible to an optimum of a defined objective or loss function. The two other tools for achieving a better model are providing better training data and selecting better hyperparameters. Each algorithm has a limited capacity for performance improvement. By playing with model hyperparameters alone, you cannot develop the best possible model. In the same way, by increasing the quality and quantity of your data while keeping the hyperparameters fixed, you cannot achieve the best possible performance either. Data and hyperparameters go hand in hand. Before you read the next chapters, remember that spending more time and money on hyperparameter optimization alone will not necessarily give you a better model. We will look at this in more detail later in this book.
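As a simple illustration of the hyperparameter lever, the following minimal sketch, assuming scikit-learn and a synthetic dataset (illustrative assumptions only), tunes a random forest with a grid search; improving the quality and quantity of the data would be a separate, complementary effort:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=20, random_state=42)

# Search over a small grid of hyperparameter values with cross-validation
param_grid = {"n_estimators": [50, 100, 200], "max_depth": [5, 10, None]}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
search.fit(X, y)

print("Best hyperparameters:", search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.2f}")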