Performing equal-width discretization
Equal-width discretization is the simplest discretization method. It divides the range of observed values of a variable into k equally sized intervals, where k is supplied by the user. The interval width for a variable X is given by the following:
width = (max(X) - min(X)) / k
For example, if the values of the variable vary between 0 and 100, we can create five bins like this: width = (100-0) / 5 = 20; the bins will be 0–20, 20–40, 40–60, 60–80, and 80–100. The first and final bins (0–20 and 80–100) can be expanded to accommodate values smaller than 0 or greater than 100 by extending their limits to minus and plus infinity.
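The arithmetic above can be sketched in NumPy as follows. The helper `equal_width_edges` and the sample values are hypothetical, written only for illustration. The text does not say which edge each interval includes, so the sketch assumes right-closed intervals such as (20, 40], which matches the default of pandas `cut`.

```python
import numpy as np

def equal_width_edges(values, k):
    """Hypothetical helper: return the bin width and k + 1 equal-width
    edges for `values`, with the outer limits opened to -inf and +inf."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    width = (hi - lo) / k                   # width = (max(X) - min(X)) / k
    edges = lo + width * np.arange(k + 1)   # lo, lo + width, ..., hi
    edges[0], edges[-1] = -np.inf, np.inf   # accept values outside [lo, hi]
    return width, edges

x = np.array([0, 12, 35, 50, 68, 100])      # hypothetical sample values
width, edges = equal_width_edges(x, k=5)

# Assign each value a bin index 0..k-1 using the interior edges;
# right=True makes the intervals right-closed, e.g. (20, 40].
codes = np.digitize(x, edges[1:-1], right=True)

# Values below 0 or above 100 still land in the first or last bin.
outside = np.digitize([-5, 130], edges[1:-1], right=True)

print(width)   # 20.0
print(edges)
print(codes)
print(outside)
```

Passing `right=False` to `np.digitize` would give left-closed intervals such as [20, 40) instead; either convention works as long as it is applied consistently to training and new data.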
In this recipe, we will carry out equal-width discretization using pandas, scikit-learn, and Feature-engine.
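As a small illustration of the pandas side, `pd.cut` accepts an integer number of bins and then splits the observed range of the variable into that many equal-width intervals; scikit-learn's `KBinsDiscretizer` with `strategy="uniform"` and Feature-engine's `EqualWidthDiscretiser` are built on the same idea. The sketch below uses pandas only. The toy Series and the reuse of the learned edges with infinite outer limits are illustrative assumptions, not necessarily the exact steps this recipe follows.

```python
import numpy as np
import pandas as pd

# Hypothetical toy variable ranging from 0 to 100 (not the recipe's dataset).
x = pd.Series([0, 5, 18, 25, 47, 52, 66, 79, 85, 100], name="x")

# With an integer `bins`, pd.cut creates that many equal-width intervals
# (right-closed by default); retbins=True also returns the edges it used.
binned, edges = pd.cut(x, bins=5, retbins=True)

# Keep the interior edges and open the outer limits to -inf/+inf so that
# values outside the observed range (e.g. -3 or 120) still get a bin.
open_edges = np.concatenate(([-np.inf], edges[1:-1], [np.inf]))
new_values = pd.Series([-3, 20, 120])
new_binned = pd.cut(new_values, bins=open_edges, labels=False)

print(edges)                    # interior edges are 20, 40, 60, 80
print(binned.cat.codes.tolist())
print(new_binned.tolist())
```

Because the intervals are right-closed, the value 20 falls in the first bin, (-inf, 20], rather than the second.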
How to do it...
First, let’s import the necessary Python libraries and get the dataset ready:
- Import the Python libraries and the data:
import numpy as...