Python Feature Engineering Cookbook

Over 70 recipes for creating, engineering, and transforming features to build machine learning models

Product type Paperback
Published in Oct 2022
Publisher Packt
ISBN-13 9781804611302
Length 386 pages
Edition 2nd Edition
Author Soledad Galli
Table of Contents

Preface
Chapter 1: Imputing Missing Data
Chapter 2: Encoding Categorical Variables
Chapter 3: Transforming Numerical Variables
Chapter 4: Performing Variable Discretization
Chapter 5: Working with Outliers
Chapter 6: Extracting Features from Date and Time Variables
Chapter 7: Performing Feature Scaling
Chapter 8: Creating New Features
Chapter 9: Extracting Features from Relational Data with Featuretools
Chapter 10: Creating Features from a Time Series with tsfresh
Chapter 11: Extracting Features from Text Variables
Index
Other Books You May Enjoy

Technical requirements

In this chapter, we will use the pandas, NumPy, and Matplotlib Python libraries, as well as scikit-learn and Feature-engine. For guidelines on how to obtain these libraries, visit the Technical requirements section of Chapter 1, Imputing Missing Data.

We will also use the open-source Category Encoders Python library, which can be installed using pip:

pip install category_encoders

To learn more about Category Encoders, visit the following link: https://contrib.scikit-learn.org/category_encoders/.
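
Before starting, it can help to confirm that the libraries named above are importable. The following is a minimal sketch using only the standard library; `missing_libraries` is a hypothetical helper name, and the list holds import names (for example `sklearn` and `feature_engine`), which differ from some of the pip package names.

```python
from importlib.util import find_spec

# Import names (not pip package names) of the libraries used in this chapter.
REQUIRED = [
    "pandas",
    "numpy",
    "matplotlib",
    "sklearn",
    "feature_engine",
    "category_encoders",
]


def missing_libraries(names=REQUIRED):
    """Return the import names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_libraries()
print("Missing libraries:", missing if missing else "none")
```

Any name reported as missing would need to be installed before running the recipes.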

We will also use the Credit Approval dataset, which is available in the UCI Machine Learning Repository at https://archive.ics.uci.edu/ml/datasets/credit+approval.

To prepare the dataset, follow these steps:

  1. Visit http://archive.ics.uci.edu/ml/machine-learning-databases/credit-screening/ and click on crx.data to download the data:
Figure 2.1 – The index directory for the Credit Approval dataset

  2. Save crx.data to the folder where you will run the following commands.
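
If you prefer to fetch the file from code rather than the browser, the two steps above could be sketched as follows. This is an illustration, not part of the book's instructions: `download_crx` is a hypothetical helper, and `CRX_URL` assumes that the file sits directly under the directory listed in step 1, which may change if the repository is reorganized.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Assumed direct link: the directory URL from step 1 plus the file name.
CRX_URL = (
    "http://archive.ics.uci.edu/ml/machine-learning-databases/"
    "credit-screening/crx.data"
)


def download_crx(dest="crx.data", url=CRX_URL):
    """Download the file to dest unless it already exists; return its path."""
    dest = Path(dest)
    if not dest.exists():
        urlretrieve(url, str(dest))
    return dest
```

Calling `download_crx()` from the folder that holds your notebook should leave crx.data next to it, and a second call does nothing if the file is already there.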

After downloading the data, open up a Jupyter Notebook and run the following commands.

  1. Import the required libraries:
    import random
    import numpy as np
    import pandas as pd
  2. Load the data:
    data = pd.read_csv("crx.data", header=None)
  3. Create a list containing the variable names:
    varnames = [f"A{s}" for s in range(1, 17)]
  4. Add the variable names to the DataFrame:
    data.columns = varnames
  5. Replace the question marks in the dataset with NumPy NaN values:
    data = data.replace("?", np.nan)
  6. Cast some numerical variables as float data types:
    data["A2"] = data["A2"].astype("float")
    data["A14"] = data["A14"].astype("float")
  7. Encode the target variable as binary:
    data["A16"] = data["A16"].map({"+": 1, "-": 0})
  8. Rename the target variable:
    data.rename(columns={"A16": "target"}, inplace=True)
  9. Make lists that contain categorical and numerical variables:
    cat_cols = [
        c for c in data.columns if data[c].dtypes == "O"]
    num_cols = [
        c for c in data.columns if data[c].dtypes != "O"]
  10. Fill in the missing data:
    data[num_cols] = data[num_cols].fillna(0)
    data[cat_cols] = data[cat_cols].fillna("Missing")
  11. Save the prepared data:
    data.to_csv("credit_approval_uci.csv", index=False)
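
The preparation steps above can also be condensed into one function. The following is a sketch rather than the book's code: it swaps steps 5 and 6 for the `na_values` argument of `pd.read_csv`, and step 9 for `select_dtypes`, which does not depend on how a given pandas version stores strings. `prepare_credit_data` is a hypothetical helper name, and the three rows are invented for illustration; they are not taken from crx.data.

```python
import io

import pandas as pd

# Three invented rows in the crx.data layout: 16 comma-separated fields,
# with "?" marking a missing value.
sample = io.StringIO(
    "b,30.5,1.0,u,g,w,v,1.25,t,t,1,f,g,200,0,+\n"
    "a,?,4.5,u,g,q,h,3.0,t,f,6,f,g,?,560,-\n"
    "?,24.5,0.5,y,p,q,h,1.5,f,f,0,t,s,280,824,-\n"
)


def prepare_credit_data(source):
    """Load, label, and impute the data; return it with both column lists."""
    varnames = [f"A{s}" for s in range(1, 17)]
    # Treating "?" as missing at read time lets pandas parse A2 and A14 as floats.
    data = pd.read_csv(source, header=None, names=varnames, na_values="?")
    # Encode the target as binary and rename it.
    data["A16"] = data["A16"].map({"+": 1, "-": 0})
    data = data.rename(columns={"A16": "target"})
    # Split columns by type instead of comparing dtypes to "O".
    num_cols = data.select_dtypes(include="number").columns.tolist()
    cat_cols = data.select_dtypes(exclude="number").columns.tolist()
    # Fill in the missing data.
    data[num_cols] = data[num_cols].fillna(0)
    data[cat_cols] = data[cat_cols].fillna("Missing")
    return data, cat_cols, num_cols


prepared, cat_cols, num_cols = prepare_credit_data(sample)
print(prepared[["A1", "A2", "A14", "target"]])
```

To run it on the downloaded file, pass "crx.data" in place of the in-memory sample and save the result with `to_csv` as in step 11.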

You can find a Jupyter Notebook that contains these commands in this book’s GitHub repository at https://github.com/PacktPublishing/Python-Feature-Engineering-Cookbook-Second-Edition/blob/main/ch02-categorical-encoding/donwload-prepare-store-credit-approval-dataset.ipynb.

Note

Some libraries require that you have already imputed missing data, for which you can use any of the recipes from Chapter 1, Imputing Missing Data.
