Interpreting your results
Sometimes groupings in data make immediate sense. When clustering by income and age, one could come across a group that can be labeled as young professionals.
In the UN development indicators dataset, using the Describe dialog, one can clearly see that Cluster 1, Cluster 2, and Cluster 3 correspond to Underdeveloped, Developing, and Highly Developed countries, respectively. By labeling the clusters this way, we're using k-means to compress the information contained in three columns and 180+ rows into just three labels. Clustering can also sometimes find patterns that your dataset cannot sufficiently explain by itself.
For example, when clustering health records, you may find two distinct groups whose difference is not immediately clear or describable with the available data. This may lead you to ask more questions and later realize that the difference arose because one group exercised regularly while the other didn't, or because one group had an immunity to a certain disease. It may even indicate...