Performing equal-width discretization
Equal-width discretization consists of dividing the range of observed values for a variable into k equally sized intervals, where k is supplied by the user. The interval width for the X variable is given by the following:
width = (max(X) - min(X)) / k
For example, if the values of the variable vary between 0 and 100, we can create five bins like this: width = (100-0) / 5 = 20. The bins will be 0–20, 20–40, 40–60, 60–80, and 80–100. The first and final bins (0–20 and 80–100) can be expanded to accommodate values smaller than 0 or greater than 100 by extending the limits to minus and plus infinity.
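The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration rather than the recipe itself: the toy series x and the names k, width, edges, and open_edges are made up for the example, and pd.cut is just one of several ways to assign values to bins.

```python
import numpy as np
import pandas as pd

# Hypothetical toy variable with values between 0 and 100
x = pd.Series([0, 5, 19, 20, 45, 60, 79, 80, 100])

k = 5  # number of bins, chosen by the user
width = (x.max() - x.min()) / k  # (100 - 0) / 5

# k + 1 equally spaced edges spanning the observed range
edges = np.linspace(x.min(), x.max(), k + 1)

# Extend the outer limits to minus and plus infinity so that
# values outside the observed range still fall into a bin
open_edges = edges.copy()
open_edges[0] = -np.inf
open_edges[-1] = np.inf

# Assign each value the integer index of its bin (0 to k - 1);
# by default, pd.cut treats intervals as closed on the right
bins = pd.cut(x, bins=open_edges, labels=False)
print(width)
print(edges)
print(bins.tolist())
```

Because the intervals are closed on the right by default, a value that sits exactly on an edge, such as 20, is placed in the lower bin. Passing right=False to pd.cut would place it in the upper bin instead.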
In this recipe, we will carry out equal-width discretization using pandas, scikit-learn, and feature-engine.
How to do it...
First, let’s import the necessary Python libraries and get the dataset ready:
- Let’s import the libraries and functions:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from...