Given certain variables, we usually want to find clusters of observations. These clusters should be as different as possible, and should contain "similar" observations inside. Suppose we had the following pairs of values [height=170,weight=50], [height=180,weight=70],[height=190,weight=90] and [height=200,weight=100] and we wanted to cluster them. A reasonable 2-cluster configuration would have the following centroids: [height=175,weight=60],[height=195,weight=95]. Obviously, the first two observations would fall under the first cluster, and the other two should fall under the second cluster. The simplest and most preferred algorithm for clustering is k-means. It works by picking k centroids at random and assigning each observation to the nearest centroid. The mean/center for each centroid is updated, and the procedure is repeated for the other variables...
United States
United Kingdom
India
Germany
France
Canada
Russia
Spain
Brazil
Australia
Argentina
Austria
Belgium
Bulgaria
Chile
Colombia
Cyprus
Czechia
Denmark
Ecuador
Egypt
Estonia
Finland
Greece
Hungary
Indonesia
Ireland
Italy
Japan
Latvia
Lithuania
Luxembourg
Malaysia
Malta
Mexico
Netherlands
New Zealand
Norway
Philippines
Poland
Portugal
Romania
Singapore
Slovakia
Slovenia
South Africa
South Korea
Sweden
Switzerland
Taiwan
Thailand
Turkey
Ukraine