Discretizing the variable into arbitrary intervals
In various industries, it is common to group variable values into segments that make sense for the business. For example, we might want to group the variable age into intervals representing children, young adults, middle-aged people, and retirees. Alternatively, we might group ratings into bad, good, and excellent. On occasion, if we know that the variable is on a certain scale (for example, logarithmic), we might want to define the interval cut points within that scale.
In this recipe, we will discretize a variable into pre-defined user intervals using pandas and feature-engine.
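Before working through the recipe, the idea of user-defined intervals can be sketched with pandas alone. The data, cut points, and labels below are hypothetical and chosen only for illustration; they are not taken from the recipe's dataset. The second part assumes a variable that spans several orders of magnitude and places the cut points evenly on a logarithmic scale.

```python
import numpy as np
import pandas as pd

# Hypothetical ages, invented for illustration only.
df = pd.DataFrame({"age": [3, 17, 25, 40, 58, 66, 81]})

# Assumed business-driven cut points and labels (not from the recipe).
age_bins = [0, 18, 40, 65, np.inf]
age_labels = ["child", "young adult", "middle-aged", "retiree"]

# right=False closes each interval on the left: [0, 18), [18, 40), ...
df["age_group"] = pd.cut(
    df["age"], bins=age_bins, labels=age_labels, right=False
)
print(df)

# Hypothetical values spanning several orders of magnitude.
values = pd.Series([0.5, 5, 50, 500, 5000])

# Cut points spaced evenly on a log10 scale: 0.1, 1, 10, ..., 10000.
log_bins = np.logspace(-1, 4, num=6)

# Default right=True gives intervals such as (0.1, 1.0], (1.0, 10.0], ...
value_group = pd.cut(values, bins=log_bins)
print(value_group)
```

With pd.cut, values that fall outside the outermost cut points become NaN, which is why the last age bin is left open with np.inf in this sketch.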
How to do it...
First, let’s import the necessary Python libraries and get the dataset ready:
- Import Python libraries and classes:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
- Let’s load the California housing dataset into a pandas DataFrame:
X, y...