In equal-width discretization, the variable values are sorted into intervals of the same width. The number of intervals is chosen arbitrarily by the user, and the width is determined by the range of values of the variable and the number of bins to create. For the variable X, the interval width is given as follows:
$$\text{width} = \frac{\max(X) - \min(X)}{\text{number of bins}}$$
For example, if the values of the variable vary between 0 and 100, we can create five bins like this: width = (100 - 0) / 5 = 20; the bins will be 0-20, 20-40, 40-60, 60-80, and 80-100. The first and final bins (0-20 and 80-100) can be expanded to accommodate outliers by extending their outer limits to minus and plus infinity, so that values under 0 or greater than 100 are placed in those bins as well.
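To make the arithmetic concrete, here is a minimal sketch in plain Python, using only the standard library. The helper names `equal_width_edges` and `assign_bin` are hypothetical and chosen for illustration; they are not part of pandas, scikit-learn, or Feature-engine. The sketch assumes that each interval includes its right limit, which is a convention picked here, not something the definition above requires.

```python
import math
from bisect import bisect_left


def equal_width_edges(values, n_bins):
    """Return n_bins + 1 interval limits for equal-width binning.

    The outer limits are replaced by -inf and +inf so that values
    outside the observed range fall into the first or last bin.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # width = (max - min) / number of bins
    inner = [lo + i * width for i in range(1, n_bins)]
    return [-math.inf] + inner + [math.inf]


def assign_bin(value, edges):
    """Return the 0-based index of the bin that contains value.

    Assumption: intervals are closed on the right, so a value equal
    to an inner limit goes into the lower of the two bins.
    """
    inner = edges[1:-1]
    return bisect_left(inner, value)


# Illustrative data spanning 0 to 100, as in the example above.
observed = [0, 15, 20, 37, 59, 61, 80, 100]
edges = equal_width_edges(observed, n_bins=5)
bins = [assign_bin(v, edges) for v in observed]

print(edges)  # [-inf, 20.0, 40.0, 60.0, 80.0, inf]
print(bins)   # [0, 0, 0, 1, 2, 3, 3, 4]

# Values outside the observed range land in the outer bins.
print(assign_bin(-5, edges), assign_bin(250, edges))  # 0 4
```

With five bins the inner limits are 20, 40, 60, and 80, which matches the hand calculation. The limits are learned once from the observed data and then reused for new values, which is why the infinite outer limits matter: a value such as -5 or 250 still gets a bin.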
In this recipe, we will carry out equal-width discretization using pandas, scikit-learn, and Feature-engine.
...