Grouping data through cluster analysis
So far, we have explored datasets that contain both input variables and a target variable, and we trained models on them. This is called supervised learning. However, how do you handle a dataset that has no label to supervise the training? Amazon Redshift ML supports unsupervised learning through cluster analysis, using the K-means algorithm. In cluster analysis, the ML algorithm automatically discovers groupings of data points. For example, if you have a population of 1,000 people, a clustering algorithm can group them based on height, weight, or age.
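To make the grouping idea concrete, here is a minimal sketch of the K-means procedure in plain Python. It is an illustration only, not how Amazon Redshift ML implements the algorithm. The function name k_means, the sample people, and the choice to seed the centroids with the first k points are all assumptions made for this example.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def k_means(points, k, iterations=10):
    """Illustrative K-means (Lloyd's algorithm); returns (centroids, labels)."""
    # Assumption for this sketch: seed the centroids with the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centroids, labels


# Hypothetical sample: (height in cm, weight in kg, age in years)
people = [
    (120, 25, 8), (180, 82, 41), (125, 28, 9),
    (175, 78, 38), (118, 24, 7), (182, 85, 45),
]
centroids, labels = k_means(people, k=2)
print(labels)
```

No label is supplied anywhere in this sketch: the two groups emerge purely from the distances between the data points. In practice, features measured on different scales are usually normalized first so that no single attribute dominates the distance calculation.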
Unlike supervised learning, where an ML model learns from labeled data to predict an outcome, unsupervised models work with unlabeled data. One type of unsupervised learning is clustering, where unlabeled data is grouped based on similarities or differences. From a dataset with demographic information about individuals, you can create clusters...