Overview of Unsupervised Learning (Clustering)
Unsupervised learning is a subcategory of machine learning in which a model is trained on unlabeled data. In other words, unlike supervised learning, where the model is expected to predict or categorize data into a set of known classes, unsupervised learning discovers the structure within the data and uses that structure to create the categories or groups.
Before delving into unsupervised learning and, specifically, clustering, there are a few questions you need to answer:
- Based on domain knowledge, does your dataset inherently have subgroups? If yes, how do you identify the subgroups? How many subgroups are present in the dataset?
- Are the members of each subgroup similar? Typically, clustering should only be applied to datasets whose subgroups contain somewhat similar data points.
- Are there outliers in the dataset? Outliers can often influence the choice of which clustering algorithm to use.
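To make the outlier question concrete, here is a minimal sketch using only Python's standard library. It assumes a hypothetical one-dimensional subgroup (the values are invented for illustration) and compares how far a mean-based center and a median-based center move when a single outlier is added. Algorithms such as k-means use the mean as the cluster center, so they tend to be pulled toward outliers, whereas median- or medoid-based approaches are usually less affected.

```python
from statistics import mean, median

# Hypothetical 1-D subgroup of similar values (illustrative data only).
cluster = [9.8, 10.1, 10.0, 9.9, 10.2]

# The same subgroup with one outlier added.
with_outlier = cluster + [50.0]

# How far each kind of center moves once the outlier is included.
mean_shift = mean(with_outlier) - mean(cluster)
median_shift = median(with_outlier) - median(cluster)

print(f"mean-based center moves by   {mean_shift:.2f}")
print(f"median-based center moves by {median_shift:.2f}")
```

In this toy case the mean-based center moves by several units while the median-based center barely moves, which is one reason the presence of outliers can steer the choice of algorithm.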
Answering these questions will help us to create a better...