Preface
Python Feature Engineering Cookbook covers almost every aspect of feature engineering for tabular data, including missing data imputation, categorical encoding, variable transformation, discretization, scaling, and the handling of outliers. It also discusses how to extract features from date and time, text, time series, and relational datasets.
This book will take the pain out of feature engineering by showing you how to use open source Python libraries to accelerate the feature engineering process, via multiple practical, hands-on recipes. Throughout the book, you will transform and create new variables utilizing pandas and scikit-learn. Additionally, you’ll learn to leverage the power of four major open source feature engineering libraries – Feature-engine, Category Encoders, Featuretools, and tsfresh.
You’ll also discover additional recipes that weren’t in the second edition. These cover imputing missing data in time series, creating new features with decision trees, and highlighting outliers using the median absolute deviation. More importantly, we provide guidelines to help you decide which transformations to use, based on your model and data features. You’ll know exactly what each feature transformation does, why to use it, and how to implement it.