"If you take a galaxy and try to make it bigger, it becomes a cluster of galaxies, not a galaxy. If you try to make it smaller than that, it seems to blow itself apart."
- Jeremiah P. Ostriker
In this chapter, we will delve deeper into machine learning and see how to use it to group the records of an unlabeled dataset into clusters, so that records belonging to the same group or class end up together. In a nutshell, the following topics will be covered in this chapter:
- Unsupervised learning
- Clustering techniques
- Hierarchical clustering (HC)
- Centroid-based clustering (CC)
- Distribution-based clustering (DC)
- Determining the number of clusters
- A comparative analysis of clustering algorithms
- Submitting jobs on computing clusters