You're reading from Python Feature Engineering Cookbook Over 70 recipes for creating, engineering, and transforming features to build machine learning models

Product type Paperback

Published in Jan 2020

Publisher Packt

ISBN-13 9781789806311

Length 372 pages

Edition 1st Edition

Languages: Python

Tools: NumPy

Concepts: Machine Learning

Author (1):

Soledad Galli

Table of Contents (13) Chapters

Preface

1. Foreseeing Variable Problems When Building ML Models

2. Imputing Missing Data

3. Encoding Categorical Variables

4. Transforming Numerical Variables

5. Performing Variable Discretization

6. Working with Outliers

7. Deriving Features from Dates and Time Variables

8. Performing Feature Scaling

9. Applying Mathematical Computations to Features

10. Creating Features with Transactional and Time Series Data

11. Extracting Features from Text Variables

12. Other Books You May Enjoy

Implementing random sample imputation

Random sample imputation replaces missing values with observations drawn at random from the pool of available values of the same variable. Unlike the other imputation techniques we've discussed in this chapter, it preserves the original distribution of the variable, and it is suitable for numerical and categorical variables alike. In this recipe, we will implement random sample imputation with pandas and Feature-engine.
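To make the idea concrete before the recipe steps, here is a minimal pandas-only sketch of the technique. It is not the book's listing: the helper name `random_sample_impute`, the toy columns `age` and `city`, and the choice to draw replacement values from the training set only are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def random_sample_impute(train, test, columns, seed=0):
    """Hypothetical helper: fill NaNs in `columns` with values drawn at
    random from the observed values of the same column in `train`."""
    train_out, test_out = train.copy(), test.copy()
    for i, col in enumerate(columns):
        pool = train[col].dropna()  # observed values available for sampling
        for frame in (train_out, test_out):
            mask = frame[col].isna()
            n_missing = int(mask.sum())
            if n_missing == 0:
                continue
            # Sample with replacement so the pool may be smaller than the gap.
            drawn = pool.sample(n=n_missing, replace=True, random_state=seed + i)
            # Assign by position; the sampled index must not drive alignment.
            frame.loc[mask, col] = drawn.to_numpy()
    return train_out, test_out


# Toy data (assumed for illustration): one numerical, one categorical column.
train = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, 41.0, np.nan],
    "city": ["Lima", "Oslo", None, "Lima", "Oslo"],
})
test = pd.DataFrame({"age": [np.nan, 30.0], "city": ["Oslo", None]})

train_imp, test_imp = random_sample_impute(train, test, ["age", "city"])
print(train_imp)
print(test_imp)
```

Because every filled value is an observed value from the training pool, the imputed columns contain no values that were not already present, which is why the technique tends to keep the shape of the distribution. Fixing the seed makes the draw reproducible.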

How to do it...

Let's begin by importing the required libraries and tools and preparing the dataset:

Let's import pandas, the train_test_split function from scikit-learn, and RandomSampleImputer from Feature-engine:

import pandas as pd
from...

Authors (1)

Soledad Galli is a bestselling data science instructor, author, and open-source Python developer. As the leading instructor at Train in Data, she teaches intermediate and advanced courses in machine learning that have enrolled over 64,000 students worldwide and continue to receive positive reviews. Sole is also the developer and maintainer of the Python open-source library Feature-engine, which provides an extensive array of methods for feature engineering and selection. With extensive experience as a data scientist in finance and insurance sectors, Sole has developed and deployed machine learning models for assessing insurance claims, evaluating credit risk, and preventing fraud. She is a frequent speaker at podcasts, meetups, and webinars, sharing her expertise with the broader data science community.
