Online Clustering
Sometimes, the dataset X is too large for the standard algorithms. They can become extremely slow, and their memory requirements grow in proportion to the number of samples. In these cases, it's preferable to employ a batch strategy that learns while the data is streamed. Since the number of parameters (for example, the k centroids) is generally very small, online clustering is quite fast and typically only slightly less accurate than the standard algorithms that work with the whole dataset.
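The following minimal sketch makes the streaming argument concrete. It assumes a stream of mini-batches produced by a generator. The only state kept between batches is a running mean and a sample count, which is the kind of small parameter set an online clustering algorithm keeps for each centroid. The names stream_batches and streaming_mean are hypothetical, and the synthetic Gaussian data only stand in for a real source such as chunks read from disk.

```python
import random
from typing import Iterable, Iterator, List, Optional, Tuple

Batch = List[List[float]]


def stream_batches(n_samples: int, batch_size: int, dim: int, seed: int = 0) -> Iterator[Batch]:
    """Yield synthetic Gaussian mini-batches one at a time (hypothetical data source)."""
    rng = random.Random(seed)
    produced = 0
    while produced < n_samples:
        size = min(batch_size, n_samples - produced)
        yield [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(size)]
        produced += size


def streaming_mean(batches: Iterable[Batch]) -> Tuple[Optional[List[float]], int]:
    """Incremental mean: the state kept between batches is one vector and one counter."""
    mean: Optional[List[float]] = None
    count = 0
    for batch in batches:
        for x in batch:
            count += 1
            if mean is None:
                mean = [0.0] * len(x)
            # m_n = m_{n-1} + (x - m_{n-1}) / n, so no past sample needs to be stored
            mean = [m + (xi - m) / count for m, xi in zip(mean, x)]
    return mean, count


if __name__ == "__main__":
    mean, n = streaming_mean(stream_batches(n_samples=10_000, batch_size=256, dim=3, seed=1))
    print(n, mean)
```

With a fixed batch_size, peak memory depends on the batch size rather than on n_samples. The price is that each sample is seen only once, so the result cannot be refined by revisiting earlier data.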
Mini-batch K-means
The first approach we are going to consider is a mini-batch version of the standard K-means algorithm. In this case, we cannot compute the centroids using all the samples at once, so the main problem is to define a criterion for updating the centroids after each partial fit. The standard process is based...
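One possible update criterion is the per-centroid learning rate described by Sculley (2010), which is the basis of scikit-learn's MiniBatchKMeans. It is not necessarily the process described in this section. Each sample in a batch is first assigned to its closest current centroid. The assigned centroid then moves toward the sample with c_j ← c_j + (x - c_j) / n_j, where n_j counts all samples assigned to centroid j so far. The minimal sketch below assumes the caller supplies the initial centroids. The class MiniBatchKMeansSketch and the helpers squared_distance, closest and make_blobs are hypothetical names.

```python
import random
from typing import List, Sequence

Point = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest(centroids: Sequence[Point], x: Sequence[float]) -> int:
    """Index of the centroid nearest to x."""
    return min(range(len(centroids)), key=lambda j: squared_distance(centroids[j], x))


class MiniBatchKMeansSketch:
    """Hypothetical mini-batch K-means with a per-centroid learning rate of 1/n_j."""

    def __init__(self, initial_centroids: Sequence[Sequence[float]]) -> None:
        # Initialization is left to the caller (assumption: e.g. k samples from the first batch).
        self.centroids: List[Point] = [list(c) for c in initial_centroids]
        self.counts: List[int] = [0] * len(self.centroids)

    def partial_fit(self, batch: Sequence[Sequence[float]]) -> "MiniBatchKMeansSketch":
        # Step 1: assign every sample of the batch using the centroids as they were before this batch.
        labels = [closest(self.centroids, x) for x in batch]
        # Step 2: move each assigned centroid toward its sample with step 1/n_j.
        for x, j in zip(batch, labels):
            self.counts[j] += 1
            eta = 1.0 / self.counts[j]
            self.centroids[j] = [c + eta * (xi - c) for c, xi in zip(self.centroids[j], x)]
        return self

    def predict(self, samples: Sequence[Sequence[float]]) -> List[int]:
        return [closest(self.centroids, x) for x in samples]


def make_blobs(centers, n_per_center, seed=0):
    """Shuffled isotropic Gaussian blobs (unit variance) around the given centers."""
    rng = random.Random(seed)
    data = [[c + rng.gauss(0.0, 1.0) for c in center]
            for center in centers for _ in range(n_per_center)]
    rng.shuffle(data)
    return data


if __name__ == "__main__":
    data = make_blobs([(0.0, 0.0), (10.0, 10.0)], n_per_center=2000, seed=42)
    model = MiniBatchKMeansSketch(initial_centroids=[(1.0, 1.0), (9.0, 9.0)])
    for start in range(0, len(data), 100):  # stream the data in mini-batches of 100 samples
        model.partial_fit(data[start:start + 100])
    print(model.centroids, model.predict([(0.0, 0.0), (10.0, 10.0)]))
```

Because the step size is 1/n_j, each centroid is exactly the running mean of the samples assigned to it. Early samples move it a lot and later ones only slightly. Assignments made with older centroids are never revisited, which is one reason a mini-batch fit can be slightly less accurate than a full-batch one.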