Unsupervised learning
AWS provides several unsupervised learning algorithms for the following tasks:
- Clustering: K-Means algorithm
- Dimension reduction: Principal Component Analysis (PCA)
- Pattern recognition: IP Insights
- Anomaly detection: Random Cut Forest (RCF) algorithm
Let us start with clustering and with how the most popular clustering algorithm, K-Means, works.
Clustering
Clustering algorithms are very popular in data science. They aim to identify groups of similar observations, known as clusters, in a given dataset. Clustering algorithms belong to the field of unsupervised learning, which means that they do not need a label or response variable to be trained.
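To make the idea of training without labels concrete, here is a minimal sketch of a toy K-Means in plain Python. It is not the SageMaker implementation: the function names (squared_distance, kmeans), the simple farthest-point initialization, and the six sample points are all assumptions made for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=10):
    """Toy K-Means: returns (cluster index per point, list of centroids)."""
    # Assumed initialization: start from the first point, then repeatedly
    # add the point that is farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )

    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignments, centroids


# Hypothetical dataset: two separate groups of points and no label column.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
assignments, centroids = kmeans(points, k=2)
print(assignments)  # -> [0, 0, 0, 1, 1, 1]
```

The algorithm only ever sees the coordinates of the points. The cluster indexes 0 and 1 are arbitrary identifiers it produces, not labels that were supplied during training.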
Not needing labels is a real advantage, since labeled data is often scarce! However, it comes with some limitations. The main one is that clustering algorithms provide clusters for you, but not the meaning of each cluster. Thus, someone, typically a subject matter expert, has to analyze the properties...