Implementing Cluster Analysis on multiple variables using Kmeans
Clustering involves splitting data points into clusters or groups. The goal is to ensure all the data points in a group are similar to each other but dissimilar to data points in other groups. The clusters are determined based on similarity. There are different types of clustering algorithms; we have centroid-based, hierarchical-based, density-based, and distribution-based algorithms. These all have their strengths and the type of data they are best suited for.
In this recipe, we will focus on the most used clustering algorithm, the Kmeans clustering algorithm. This is a centroid-based algorithm that splits data into K number of clusters. These clusters are usually predefined by the user before running the algorithm. Each data point is assigned to a cluster based on its distance from the cluster centroid. The goal of the algorithm is to minimize the variance of data points within their corresponding clusters. The...