The Regularization Cookbook

You're reading from The Regularization Cookbook: Explore practical recipes to improve the functionality of your ML models

Product type: Paperback
Published in: Jul 2023
Publisher: Packt
ISBN-13: 9781837634088
Length: 424 pages
Edition: 1st
Author: Vincent Vandenbussche
Table of Contents (14 chapters)

Preface
Chapter 1: An Overview of Regularization
Chapter 2: Machine Learning Refresher
Chapter 3: Regularization with Linear Models
Chapter 4: Regularization with Tree-Based Models
Chapter 5: Regularization with Data
Chapter 6: Deep Learning Reminders
Chapter 7: Deep Learning Regularization
Chapter 8: Regularization with Recurrent Neural Networks
Chapter 9: Advanced Regularization in Natural Language Processing
Chapter 10: Regularization in Computer Vision
Chapter 11: Regularization in Computer Vision – Synthetic Image Generation
Index
Other Books You May Enjoy

Training a model

Once the data has been fully cleaned and prepared, training a model is fairly easy thanks to scikit-learn. In this recipe, before training a logistic regression model on the Titanic dataset, we will quickly recap the ML paradigm and the different types of ML we can use.

Getting ready

If you were asked how to differentiate a car from a truck, you might be tempted to list rules, such as the number of wheels, the size, the weight, and so on. In doing so, you would produce a set of explicit rules that would allow anyone to tell cars and trucks apart as different types of vehicles.

Traditional programming is not so different. While developing algorithms, programmers often build explicit rules, which allow them to map from data input (for example, a vehicle) to answers (for example, a car). We can summarize this paradigm as data + rules = answers.

If we were to train an ML model to discriminate cars from trucks, we would use another strategy: we would feed an ML algorithm many pieces of data along with their associated answers, expecting the model to learn the correct rules by itself. This different approach can be summarized as data + answers = rules. The difference between the two paradigms is summarized in Figure 2.4. As small as it might seem to ML practitioners, it changes everything in terms of regularization:

Figure 2.4 – Comparing traditional programming with ML algorithms
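The two paradigms can be contrasted in a short sketch. Here, the vehicle features, thresholds, and values are made up purely for illustration; they are not from the book:

```python
from sklearn.tree import DecisionTreeClassifier

# Traditional programming: data + rules = answers.
# The rules are explicit and hand-written by the programmer.
def classify_vehicle(n_wheels, weight_kg):
    return "truck" if n_wheels > 4 or weight_kg > 3500 else "car"

# ML: data + answers = rules.
# We provide data and answers; the model infers the rules itself.
X = [[4, 1200], [4, 1500], [6, 7000], [8, 12000]]  # data (wheels, weight)
y = ["car", "car", "truck", "truck"]               # answers
model = DecisionTreeClassifier().fit(X, y)

print(classify_vehicle(4, 1400))      # answer from explicit rules
print(model.predict([[6, 9000]])[0])  # answer from learned rules
```

The learned rules live inside the fitted model rather than in readable code, which is exactly why regularizing them is less direct than editing an `if` statement.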

Regularizing traditional algorithms is conceptually straightforward. For example, what if the rules defining a truck overlap with those defining a bus? We can simply add another rule, such as the fact that buses have many windows.

Regularization in ML, on the other hand, is intrinsically implicit. What if, in this case, the model does not discriminate between buses and trucks?

  • Should we add more data?
  • Is the model complex enough to capture such a difference?
  • Is it underfitting or overfitting?

This fundamental property of ML makes regularization complex.

ML can be applied to many tasks. Anyone who uses ML knows there is not just one type of ML model.

Arguably, most ML models fall into three main categories:

  • Supervised learning
  • Unsupervised learning
  • Reinforcement learning

As is usually the case with such taxonomies, the real landscape is more complex, with sub-categories and methods that overlap several categories, but this is beyond the scope of this book.

This book will focus on regularization for supervised learning. In supervised learning, the problem is usually quite easy to specify: we have input features, X (for example, apartment surface), and labels, y (for example, apartment price). The goal is to train a model so that it’s robust enough to predict y, given X.

The two major types of supervised learning tasks are classification and regression:

  • Classification: The labels are qualitative data. For example, the task may be to predict between two or more classes, such as car, bus, and truck.
  • Regression: The labels are quantitative data. For example, the task may be to predict a continuous value, such as an apartment price.

Again, the line can be blurry; some tasks with quantitative labels can be solved as classification problems, and some tasks can be framed as either classification or regression. See Figure 2.5:

Figure 2.5 – Regression versus classification
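A minimal sketch of the two task types, using tiny synthetic datasets whose numbers are illustrative only:

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

# Classification: qualitative labels (vehicle type encoded as 0 = car, 1 = truck)
X_clf = [[1.2], [1.5], [7.0], [12.0]]  # weight in tonnes
y_clf = [0, 0, 1, 1]
clf = LogisticRegression().fit(X_clf, y_clf)

# Regression: quantitative labels (apartment price from surface)
X_reg = [[30], [50], [80], [100]]      # surface in square meters
y_reg = [150_000, 250_000, 400_000, 500_000]
reg = LinearRegression().fit(X_reg, y_reg)

print(clf.predict([[10.0]]))  # predicts a class
print(reg.predict([[60]]))    # predicts a continuous value
```

Both models expose the same `fit()`/`predict()` interface; only the nature of the labels, and therefore of the predictions, differs.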

How to do it…

Assuming we want to train a logistic regression model (which will be explained properly in the next chapter), the scikit-learn library provides the LogisticRegression class, along with the fit() and predict() methods. Let’s learn how to use it:

  1. Import the LogisticRegression class:
    from sklearn.linear_model import LogisticRegression
  2. Instantiate a LogisticRegression object:
    # Instantiate the model
    lr = LogisticRegression()
  3. Fit the model on the train set:
    # Fit on the training data
    lr.fit(X_train, y_train)
  4. Optionally, compute predictions by using that model on the test set:
    # Compute and store predictions on the test data
    y_pred = lr.predict(X_test)
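Putting the four steps together, here is a self-contained sketch. Since the cleaned Titanic data (X_train, y_train, and so on) comes from the earlier recipes, a dataset bundled with scikit-learn stands in for it here:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Stand-in for the cleaned, prepared data from the previous recipes
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Instantiate the model (max_iter raised so the solver converges
# on this unscaled data) and fit it on the train set
lr = LogisticRegression(max_iter=5000)
lr.fit(X_train, y_train)

# Compute predictions on the test set and evaluate them
y_pred = lr.predict(X_test)
print(accuracy_score(y_test, y_pred))
```

The same four calls apply unchanged once X_train, X_test, y_train, and y_test come from your own prepared dataset.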

See also

Even though more details will be provided in the next chapter, you might be interested in looking at the documentation of the LogisticRegression class: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html.
