In this section, we discuss the centroid-based clustering technique and its computational challenges. An example using K-means with Spark MLlib is then shown to illustrate centroid-based clustering.
Centroid-based clustering (CC)
Challenges in CC algorithm
As discussed previously, in a centroid-based clustering algorithm such as K-means, choosing the optimal number of clusters K is an optimization problem. This problem is NP-hard (that is, non-deterministic polynomial-time hard) and has high algorithmic complexity, so the common approach is to settle for an approximate solution. Solving these optimization problems therefore imposes an extra burden and consequently...
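To make the idea of an approximate solution concrete, the following is a minimal sketch in plain Python (not the Spark MLlib API). It runs Lloyd's heuristic, which only reaches a local optimum, for several candidate values of K and reports the within-cluster sum of squares (WCSS) for each. The toy data, the function names, and the deterministic farthest-first initialization are all assumptions made for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, k):
    """Assumed initialization: start from the first point, then repeatedly
    add the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iterations=50):
    """Lloyd's heuristic. Returns (centroids, wcss); the result is a local
    optimum, not necessarily the best possible clustering."""
    centroids = farthest_first(points, k)
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:
            break  # no centroid moved, so the assignment is stable
        centroids = updated
    wcss = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, wcss


# Hypothetical toy data: three tight groups of two-dimensional points.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
    (9.0, 1.0), (9.1, 1.2), (8.9, 0.9),
]

# Try several candidate values of K and record the cost of each.
costs = {k: kmeans(points, k)[1] for k in range(1, 6)}
for k, cost in costs.items():
    print(f"K={k}  WCSS={cost:.3f}")
```

On this toy data the cost drops sharply up to K = 3 and changes little afterwards. Looking for that bend (often called the elbow) is one common heuristic for picking K without solving the underlying problem exactly.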