What do you get with a Packt Subscription?

Free for first 7 days. $19.99 p/m after that. Cancel any time!

Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!

50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.

Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.

Thousands of reference materials covering every tech concept you need to stay up to date.

Imputing Missing Data

Missing data refers to the absence of values for certain observations and is an unavoidable problem in most data sources. Most scikit-learn estimators do not accept missing values as input, so we need to either remove observations with missing data or replace the missing values with permitted ones. Replacing missing data with statistical estimates of the missing values is called imputation. The goal of any imputation technique is to produce a complete dataset that can be used to train machine learning models. There are multiple imputation techniques we can apply to our data. The choice of technique depends on whether the data is missing at random, the number of missing values, and the machine learning model we intend to use. In this chapter, we will discuss several missing data imputation techniques.
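As a rough illustration of the two options described above (removing observations with missing data, or imputing them), here is a minimal sketch using pandas and scikit-learn's `SimpleImputer`. The toy DataFrame and its column names are assumptions made up for this example and do not come from the book's datasets; median imputation is only one of the techniques the chapter lists.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "income": [50_000.0, 62_000.0, np.nan, 58_000.0],
})

# Option 1: remove every observation that contains a missing value.
complete_cases = df.dropna()

# Option 2: replace each missing value with the median of its column.
imputer = SimpleImputer(strategy="median")
imputed = pd.DataFrame(
    imputer.fit_transform(df),
    columns=df.columns,
    index=df.index,
)

print(complete_cases)
print(imputed)
```

Dropping rows is simple but discards data, while imputation keeps every observation at the cost of introducing estimated values. Which trade-off is acceptable depends on the factors the paragraph above mentions.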

This chapter...

Key benefits

Discover solutions for feature generation, feature extraction, and feature selection

Uncover the end-to-end feature engineering process across continuous, discrete, and unstructured datasets

Implement modern feature extraction techniques using Python's pandas, scikit-learn, SciPy and NumPy libraries

Description

Feature engineering is invaluable for developing and enriching your machine learning models. In this cookbook, you will work with the best tools to streamline your feature engineering pipelines and techniques, and to simplify and improve the quality of your code. Using Python libraries such as pandas, scikit-learn, Featuretools, and Feature-engine, you’ll learn how to work with both continuous and discrete datasets and be able to transform features from unstructured datasets. You will develop the skills necessary to select the best features as well as the most suitable extraction techniques. This book will cover Python recipes that will help you automate feature engineering to simplify complex processes. You’ll also get to grips with different feature engineering strategies, such as the Box-Cox transform, power transform, and log transform, across machine learning, reinforcement learning, and natural language processing (NLP) domains. By the end of this book, you’ll have discovered tips and practical solutions to all of your feature engineering problems.

Who is this book for?

This book is for machine learning professionals, AI engineers, data scientists, and NLP and reinforcement learning engineers who want to optimize and enrich their machine learning models with the best features. Knowledge of machine learning and Python coding will assist you with understanding the concepts covered in this book.

What you will learn

Simplify your feature engineering pipelines with powerful Python packages

Get to grips with imputing missing values

Encode categorical variables with a wide set of techniques

Extract insights from text quickly and effortlessly

Develop features from transactional data and time series data

Derive new features by combining existing variables

Understand how to transform, discretize, and scale your variables

Create informative variables from date and time

Amazon Customer Nov 14, 2020

Thorough collection of feature transformations to tackle multiple aspects of data quality and to extract features from different data formats, like text, time series and transactions. Great resource to have at hand when in front of a new dataset.

Omar Pasha Mar 26, 2021

It was exactly what I needed to know!

Shorsh Nov 11, 2020

This book contains all the recipes that are needed for any aspiring data scientist. It contains very good examples that are easy to follow, with a good theoretical explanation of what you are doing. Some basic Python knowledge is needed beforehand, as it won't start from scratch; it is assumed that you have already faced issues with your feature engineering pipelines. The author of this book has created a masterpiece with the feature engineering library, which is very easy to use and gives awesome results. This book became one of my favorite ones very fast!! A must-read if you are pursuing a DS/ML/AI position.

Kevin Nov 29, 2022

As other reviews have stated the book delivers what it says it will; Python code that generates a lot of feature-engineering. I find this book to be fantastic, and Sole's work overall, as it gives life to new feature-engineering possibilities and does it fast. Long gone are the days of writing your own custom transformers or unique time-series features. This book automates a lot of that headache and will absolutely be the first reference I go to when I need to handle a new feature. I personally hadn't dealt with tsfresh prior to reading through and it brought to life instantaneous time-series features I no longer have to write scripts for. A very happy customer on that knowledge alone! Per usual, Sole continues to advance the ML community for the betterment of all.

jml Sep 23, 2020

The Python Feature Engineering Cookbook (PFEC) delivers exactly what the name implies. It’s a collection of recipes targeted at specific tasks; if you’re working in an AI or ML environment and have a need to massage variable data, handle math functions, or normalize data strings, this book will quickly earn a place on your shelf. Each recipe is presented in a standardized format that walks you through the theory and implementation of the code performing the function. Short introductions and appropriate external references provide background for every task, and as long as you have a reasonable familiarity with pandas, scikit-learn, NumPy, Python, and Jupyter, you’ll find a number of uses for the techniques covered.

It’s not designed to be a tutorial for those just starting out with machine learning, and isn’t written in a style that invites casual reading. The material tends toward the dry side. While the author does an admirable job of distilling the necessary information into the basic framework of prepare-perform-review, PFEC definitely falls into the reference book category as opposed to being a guide for the uninitiated.

In short, you’ll want to have PFEC around if you’re involved in a project that requires hands-on data manipulation in a Python machine-learning environment. Paired with a good guide to ML basics and implementation, it’ll keep you from reinventing quite a few wheels.

Python Feature Engineering Cookbook: Over 70 recipes for creating, engineering, and transforming features to build machine learning models

Table of Contents

Imputing Missing Data

Technical requirements

Removing observations with missing data

How to do it...

Performing mean or median imputation

Implementing mode or frequent category imputation

How to do it...

Replacing missing values with an arbitrary number

Capturing missing values in a bespoke category

How to do it...

Replacing missing values with a value at the end of the distribution

Implementing random sample imputation

How to do it...

Adding a missing value indicator variable

Getting ready

Performing multivariate imputation by chained equations

Assembling an imputation pipeline with scikit-learn

How to do it...

Assembling an imputation pipeline with Feature-engine

How to do it...
