In this chapter, we discuss some advanced clustering algorithms that can be employed when K-means (or another similar method) fails to cluster a dataset. In Chapter 9, Clustering Fundamentals, we saw that such models rely on the assumption of convex clusters that can be surrounded by a hyperspherical boundary. Under this assumption, simple distance metrics are enough to determine the correct labeling. Unfortunately, many real-life problems involve concave and irregular structures that K-means or a Gaussian mixture splits incorrectly.
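To make the limitation concrete, the following minimal sketch compares K-means on two convex, well-separated blobs with K-means on two interleaved half-moons, a standard example of concave clusters. It assumes scikit-learn is available; the dataset sizes, noise level, blob centers, and random seeds are illustrative choices, not values taken from this chapter. The adjusted Rand index (1.0 means perfect agreement with the true labels) is used here only as a convenient score.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs, make_moons
from sklearn.metrics import adjusted_rand_score

# Convex case: two well-separated isotropic blobs (illustrative centers).
X_blobs, y_blobs = make_blobs(
    n_samples=500,
    centers=[[-5.0, 0.0], [5.0, 0.0]],
    cluster_std=1.0,
    random_state=0,
)

# Concave case: two interleaved half-moons (illustrative noise level).
X_moons, y_moons = make_moons(n_samples=500, noise=0.05, random_state=0)


def kmeans_ari(X, y_true, n_clusters=2):
    """Fit K-means and score its labels against the ground truth."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    y_pred = km.fit_predict(X)
    return adjusted_rand_score(y_true, y_pred)


ari_blobs = kmeans_ari(X_blobs, y_blobs)
ari_moons = kmeans_ari(X_moons, y_moons)

print(f"ARI on convex blobs:  {ari_blobs:.3f}")
print(f"ARI on concave moons: {ari_moons:.3f}")
```

On the blobs, the boundary between the two centroids separates the groups cleanly, so the score should be close to 1. On the moons, the same kind of boundary cuts across both shapes, so each K-means cluster mixes points from the two true groups and the score drops noticeably.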
We will also explain two famous online algorithms that can be chosen whenever the dataset is too large to fit in memory or when the data is streamed in real time. Surprisingly, even though these models work with a limited number of samples, their performance is only slightly worse than...