Performing discretization with k-means clustering
The aim of a discretization procedure is to find a set of cut points that partition a variable into a small number of intervals with good class coherence. To create partitions that group similar observations, we can use clustering algorithms such as k-means.
In discretization using k-means clustering, the partitions are the clusters identified by the k-means algorithm. The k-means algorithm has two main steps. In the initialization step, k observations are chosen at random as the initial centers of the k clusters, and each remaining data point is assigned to the cluster with the closest center. Proximity is measured with a distance metric, such as the Euclidean distance. In the iteration step, the center of each cluster is recomputed as the mean of all the observations in that cluster, and each observation is reassigned to the cluster whose new center is closest. The iteration step continues until the optimal...
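The two steps can be sketched in plain Python for a single numeric variable. This is a minimal sketch under stated assumptions, not a library implementation. The function names `kmeans_1d`, `kmeans_cut_points`, and `discretize` are hypothetical. The stopping rule used here (stop when the centers no longer move, or after `max_iter` passes) is an assumption. Turning clusters into intervals by placing cut points halfway between adjacent sorted centers is one common convention; scikit-learn's `KBinsDiscretizer` with `strategy="kmeans"` builds its bin edges this way. It is not something this section states.

```python
import random
from bisect import bisect_right


# Hypothetical helper names, written for illustration only.
def kmeans_1d(values, k, max_iter=100, seed=0):
    """Cluster numeric values into k groups; return the sorted centers.

    Assumes len(values) >= k.
    """
    rng = random.Random(seed)
    # Initialization step: k randomly chosen observations become the centers.
    centers = rng.sample(list(values), k)
    for _ in range(max_iter):
        # Assign each observation to its closest center. In one dimension the
        # Euclidean distance is simply the absolute difference.
        clusters = [[] for _ in range(k)]
        for v in values:
            closest = min(range(k), key=lambda j: abs(v - centers[j]))
            clusters[closest].append(v)
        # Iteration step: recompute each center as the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [sum(c) / len(c) if c else centers[j]
                       for j, c in enumerate(clusters)]
        # Assumed stopping rule: stop once the centers no longer move.
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)


def kmeans_cut_points(values, k, seed=0):
    """Assumed convention: cut points halfway between adjacent sorted centers."""
    centers = kmeans_1d(values, k, seed=seed)
    return [(lo + hi) / 2 for lo, hi in zip(centers, centers[1:])]


def discretize(values, cut_points):
    """Map each value to the index (0 .. len(cut_points)) of its interval."""
    return [bisect_right(cut_points, v) for v in values]


ages = [22, 25, 27, 31, 45, 47, 52, 60, 63, 70]
cuts = kmeans_cut_points(ages, k=3)
print(cuts)
print(discretize(ages, cuts))
```

Because the initial centers are random, different seeds can converge to different clusterings (a local optimum). A common remedy is to run several initializations and keep the one with the lowest within-cluster sum of squared distances.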