In discretization using k-means clustering, the intervals are the clusters identified by the k-means algorithm. The number of clusters (k) is defined by the user. The k-means clustering algorithm has two main steps. In the initialization step, k observations are chosen randomly as the initial centers of the k clusters, and the remaining data points are assigned to the closest cluster. In the iteration step, the centers of the clusters are re-computed as the average points of all of the observations within the cluster, and the observations are reassigned to the newly created closest cluster. The iteration step continues until the optimal k centers are found. In this recipe, we will perform k-means discretization with scikit-learn, using the Boston House Prices dataset.
Germany
Slovakia
Canada
Brazil
Singapore
Hungary
Philippines
Mexico
Thailand
Ukraine
Luxembourg
Estonia
Lithuania
Norway
Chile
United States
Great Britain
India
Spain
South Korea
Ecuador
Colombia
Taiwan
Switzerland
Indonesia
Cyprus
Denmark
Finland
Poland
Malta
Czechia
New Zealand
Austria
Turkey
France
Sweden
Italy
Egypt
Belgium
Portugal
Slovenia
Ireland
Romania
Greece
Argentina
Malaysia
South Africa
Netherlands
Bulgaria
Latvia
Australia
Japan
Russia