Binning algorithms
Binning is the process of taking a continuous variable and sorting its values into discrete buckets. It is useful for turning a potentially infinite number of values into a finite number of “bins” for your analysis.
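As a minimal sketch of the idea, the snippet below uses pd.cut to place a handful of ages into labeled buckets. The sample ages, bin edges, and label names are assumptions chosen only for illustration; they are not taken from the recipe that follows.

```python
import pandas as pd

# Hypothetical sample of ages (illustrative values only)
ages = pd.Series([18, 22, 34, 40, 66])

# Assumed bin edges; by default each bin includes its right edge,
# so the intervals are (0, 18], (18, 35], (35, 65], (65, 120]
edges = [0, 18, 35, 65, 120]
labels = ["0-18", "19-35", "36-65", "66+"]

# pd.cut maps every value to the label of the interval it falls in
binned = pd.cut(ages, bins=edges, labels=labels)
print(binned.tolist())
```

Each continuous age is replaced by one of four categories, so downstream counts or group-by operations work on a small, fixed set of values.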
How to do it
Let’s imagine we have collected survey data from users of a system. One of the survey questions asks users for their age, producing data that looks like:
import pandas as pd

df = pd.DataFrame([
["Jane", 34],
["John", 18],
["Jamie", 22],
["Jessica", 36],
["Jackie", 33],
["Steve", 40],
["Sam", 30],
["Stephanie", 66],
["Sarah", 55],
["Aaron", 22],
["Erin", 28],
["Elsa", 37],
], columns=["name", "age"])
df = df.convert_dtypes(dtype_backend="numpy_nullable")
df.head()
name age
0 Jane 34
1 John 18...