You're reading from Python Feature Engineering Cookbook: Over 70 recipes for creating, engineering, and transforming features to build machine learning models

Product type Paperback

Published in Jan 2020

Publisher Packt

ISBN-13 9781789806311

Length 372 pages

Edition 1st Edition

Languages

Python

Tools

NumPy

Concepts

Machine Learning

Author (1):

Soledad Galli


Table of Contents (13) Chapters

Preface

1. Foreseeing Variable Problems When Building ML Models

2. Imputing Missing Data

3. Encoding Categorical Variables

4. Transforming Numerical Variables

5. Performing Variable Discretization

6. Working with Outliers

7. Deriving Features from Dates and Time Variables

8. Performing Feature Scaling

9. Applying Mathematical Computations to Features

10. Creating Features with Transactional and Time Series Data

11. Extracting Features from Text Variables

12. Other Books You May Enjoy


Performing one-hot encoding of frequent categories

One-hot encoding represents each category of a categorical variable with a binary variable. As a result, one-hot encoding high-cardinality variables, or datasets with many categorical features, can expand the feature space dramatically. To reduce the number of binary variables, we can one-hot encode only the most frequent categories. This is equivalent to treating the remaining, less frequent categories as a single group, because every observation outside the top categories gets a zero in all of the binary variables. We discuss that grouping in the Grouping rare or infrequent categories recipe toward the end of this chapter.
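
The idea can be sketched in a few lines of pandas. This is a minimal illustration and not necessarily the approach the recipe itself takes: the helper name `one_hot_top_categories`, the toy `colour` column, and the choice of `top_n=2` are all assumptions made for the example.

```python
import pandas as pd


def one_hot_top_categories(df, column, top_n):
    """Return a copy of df with one binary column per frequent category.

    Hypothetical helper for illustration: only the top_n most frequent
    categories of `column` get their own binary variable.
    """
    # value_counts() sorts categories by frequency, most frequent first
    top = df[column].value_counts().head(top_n).index.tolist()
    out = df.copy()
    for category in top:
        # 1 where the row holds this category, 0 everywhere else
        out[f"{column}_{category}"] = (out[column] == category).astype(int)
    return out, top


# Toy data (assumed): "red" and "blue" are frequent, "green" and "pink" are rare
toy = pd.DataFrame(
    {"colour": ["red", "red", "red", "blue", "blue", "green", "pink"]}
)
encoded, top = one_hot_top_categories(toy, "colour", top_n=2)
print(encoded)
```

Rows holding the rare categories end up with zeros in every new column, which is why this behaves like grouping them into one category. In practice, the list of top categories would typically be learned from the training set only and then reused to encode the test set.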

For more details on variable cardinality and frequency, visit the Determining cardinality in categorical variables recipe and the Pinpointing rare categories in categorical variables recipe in Chapter 1, Foreseeing Variable Problems...


Authors (1)


Soledad Galli is a bestselling data science instructor, author, and open-source Python developer. As the leading instructor at Train in Data, she teaches intermediate and advanced courses in machine learning that have enrolled over 64,000 students worldwide and continue to receive positive reviews. Sole is also the developer and maintainer of the Python open-source library Feature-engine, which provides an extensive array of methods for feature engineering and selection. With extensive experience as a data scientist in finance and insurance sectors, Sole has developed and deployed machine learning models for assessing insurance claims, evaluating credit risk, and preventing fraud. She is a frequent speaker at podcasts, meetups, and webinars, sharing her expertise with the broader data science community.
