Discretization, or binning, is the process of transforming continuous variables into discrete variables by creating a set of contiguous intervals, also called bins, that span the range of the variable values. Discretization is used to change the distribution of skewed variables and to minimize the influence of outliers, and hence improve the performance of some machine learning models.
How does discretization minimize the effect of outliers? Discretization places outliers into the lower or higher intervals, together with the remaining inlier values of the distribution. Hence, these outlier observations no longer differ from the rest of the values at the tails of the distribution, as they are now all together in the same interval or bin. Also, if sorting observations across bins with equal frequency, discretization spreads the values of a...