Profiling Kmeans clusters
Profiling gives us a sense of what each cluster looks like. It reveals the differences and similarities between clusters, as well as the defining characteristics of each one. This is a key step in clustering, especially for exploratory data analysis.
The approach to profiling numerical fields is to compute the mean of each field per cluster. For categorical fields, we can compute the percentage occurrence of each category per cluster. The results can then be displayed in various ways, such as tables, boxplots, and scatterplots. A table is typically a good first option because all the values can be displayed at once. Charts can then give additional context to the insights from the table.
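The approach described above can be sketched in pandas as follows. This is a minimal illustration, not the recipe's own code: the toy DataFrame, its column names (cluster, income, education), and its values are all made up for the example, and the cluster labels are assumed to have already been assigned by a clustering step.

```python
import pandas as pd

# Hypothetical toy data: column names, values, and cluster labels are
# illustrative assumptions, not taken from the recipe's dataset.
df = pd.DataFrame({
    "cluster": [0, 0, 1, 1, 1],
    "income": [30000, 50000, 80000, 90000, 100000],
    "education": ["Graduate", "PhD", "PhD", "PhD", "Graduate"],
})

# Numerical field: mean of the field per cluster.
income_profile = df.groupby("cluster")["income"].mean()

# Categorical field: percentage occurrence of each category per cluster.
# normalize="index" makes each cluster's row sum to 1 before scaling to 100.
education_profile = (
    pd.crosstab(df["cluster"], df["education"], normalize="index") * 100
)

print(income_profile)
print(education_profile.round(1))
```

Both results are indexed by cluster, so they can be read side by side as a simple profiling table before moving on to charts.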
We will explore how to profile Kmeans clusters using pandas.
Getting ready
We will work with the Customer Personality Analysis data from Kaggle in this recipe. You could retrieve...