Discretizing the variable into arbitrary intervals
In various industries, it is common to group variable values into segments that make sense for the business. For example, we might want to group the variable age into intervals representing children, young adults, middle-aged people, and retired people. Alternatively, we might group ratings into bad, good, and excellent. On other occasions, if we know that the variable is on a certain scale, for example, logarithmic, we might want to define the interval cut-points within that scale.
In this recipe, we will discretize a variable into pre-defined user intervals using pandas and Feature-engine.
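As a quick illustration of the idea before the recipe itself, here is a minimal sketch that uses pandas' cut function with user-defined cut-points. The ages, values, cut-points, and labels are made up for illustration and are not taken from the recipe's dataset; the choice of left-closed intervals for age is also an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical ages, made up for illustration.
ages = pd.Series([4, 17, 25, 43, 61, 70, 88], name="age")

# Illustrative cut-points and labels (assumptions, not business rules).
age_bins = [0, 18, 40, 65, np.inf]
age_labels = ["child", "young adult", "middle-aged", "retired"]

# right=False makes each interval closed on the left: [0, 18), [18, 40), ...
age_groups = pd.cut(ages, bins=age_bins, labels=age_labels, right=False)

# For a variable on a logarithmic scale, the cut-points can be
# defined within that scale: 1, 10, 100, ..., 100000.
values = pd.Series([3, 50, 700, 12000], name="value")
log_bins = np.logspace(0, 5, num=6)

# labels=False returns the integer position of each interval
# instead of the interval itself.
log_codes = pd.cut(values, bins=log_bins, labels=False)

print(age_groups.value_counts(sort=False))
print(log_codes.tolist())
```

Passing the limits explicitly keeps the segments tied to their business meaning rather than to the distribution of the data, which is the main difference from equal-width or equal-frequency discretization.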
How to do it...
First, let’s import the necessary Python libraries and get the dataset ready:
- Import the required Python libraries and classes:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
- Let’s load the California housing dataset into...