Feature binning: equal width and equal frequency
We sometimes want to convert a feature from continuous to categorical. The process of grouping the values of a distribution into k intervals is called binning or, somewhat less friendly, discretization. Equal width binning creates k equally spaced intervals from the minimum to the maximum value, while equal frequency binning sets the interval boundaries so that each bin holds roughly the same number of observations. Binning can address several important issues with a feature: skew, excess kurtosis, and the presence of outliers.
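To make the difference between the two strategies concrete, here is a minimal sketch that uses pandas' `cut` and `qcut` functions rather than the feature_engine classes this recipe works with. The values are made up for illustration, with one deliberate outlier, and the names `values`, `equal_width`, and `equal_freq` are hypothetical.

```python
import pandas as pd

# Made-up values with one large outlier (not the COVID-19 data).
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 100])

# Equal width: 4 intervals of the same length between the min and the max.
# The outlier stretches the range, so the other 7 values share the first bin.
equal_width = pd.cut(values, bins=4, labels=False)

# Equal frequency: 4 intervals bounded by the quartiles,
# so each bin ends up with 2 of the 8 values.
equal_freq = pd.qcut(values, q=4, labels=False)

print(equal_width.tolist())  # [0, 0, 0, 0, 0, 0, 0, 3]
print(equal_freq.tolist())   # [0, 0, 1, 1, 2, 2, 3, 3]
```

With skewed data like this, equal width binning leaves most bins empty, whereas equal frequency binning spreads the observations evenly at the cost of bins with very different widths.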
Getting ready
Binning might be a good choice with the COVID-19 total cases data. It might also be useful with other variables in the dataset, including total deaths and population, but we will only work with total cases for now. total_cases is the target variable in the following code, so it is a column, in fact the only column, in the y_train DataFrame.
Let’s try equal width and equal frequency binning with the COVID-19 data.
How to do it...
- We first need to import the EqualFrequencyDiscretiser and EqualWidthDiscretiser from feature_engine
....