When your dataset doesn't have a target variable, you can use clustering algorithms to explore its structure based on the features of the examples. These algorithms group examples so that those within a group are as similar as possible to each other and as dissimilar as possible to examples in other groups.
Because you usually don't have labels when performing this kind of analysis, you need a metric that can assess the quality of the resulting separation without them.
One such metric is the Silhouette Coefficient. It helps you understand two things:
- Cohesion: Similarity within clusters
- Separation: Dissimilarity among clusters
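As a rough sketch of how the two ideas combine, the usual textbook definition for a single example $i$ is shown below. The symbols are not from this article: $a(i)$ is assumed to be the mean distance from $i$ to the other examples in its own cluster (cohesion), and $b(i)$ the smallest mean distance from $i$ to the examples of any other cluster (separation).

```latex
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}
```

The coefficient reported for a whole clustering is the mean of $s(i)$ over all examples.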
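It will give you a value between -1 and 1: values close to 1 indicate well-formed clusters, values near 0 indicate overlapping clusters, and negative values suggest that examples may have been assigned to the wrong cluster.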
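To make this concrete, here is a minimal sketch using scikit-learn's `silhouette_score`. The synthetic data, the choice of KMeans, and the helper name `silhouette_for_k` are assumptions made for illustration; the article does not prescribe a particular algorithm or dataset.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with three well-separated blobs (illustrative assumption).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.6,
    random_state=42,
)


def silhouette_for_k(X, k, seed=42):
    """Hypothetical helper: fit KMeans with k clusters and return
    the mean Silhouette Coefficient over all examples."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    return silhouette_score(X, labels)


# Compare several candidate numbers of clusters without using any labels.
scores = {k: silhouette_for_k(X, k) for k in range(2, 7)}
best_k = max(scores, key=scores.get)

for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
print("Best k by silhouette:", best_k)
```

Comparing the score across several values of `k` is one common way to use the metric: the clustering with the highest coefficient is the one whose groups are, on average, the most cohesive and best separated.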
If you have labels in your training data, you can also use other metrics, such...