Clustering
For the most part of this chapter, we have primarily focused on supervised machine learning techniques where we train a model based before using it for predictions. Clustering is an unsupervised machine learning technique, used in customer segmentation, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics.
Apache Spark provides various clustering algorithms, including:
- K-Means
- Latent Dirichlet Allocation (LDA)
- Bisecting K-Means
- Gaussian Mixture Models