Sometimes a dataset is too large to fit in memory, or the samples are streamed through a channel and received at different time steps. In these cases, none of the algorithms previously discussed can be employed, because they assume access to the whole dataset from the very first step. For this reason, some online alternatives have been proposed, and they are currently used in many real-world applications.
Online clustering
Mini-batch K-means
This algorithm is an extension of standard K-means but, because the centroids cannot be computed using all the samples, it's necessary to include an additional step that is responsible for reassigning samples when an existing cluster is no longer valid. In particular, instead of computing...
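To make the online setting concrete, here is a minimal sketch of one common formulation of the mini-batch update. It is not necessarily the exact procedure this section describes. Each centroid keeps a count of the samples it has absorbed and moves toward every new sample with a step size of 1/count, so it remains the running mean of its assigned samples. The function names (`nearest`, `partial_update`, `stream_batches`), the synthetic two-blob stream, and the initial centroids are all hypothetical, and the reassignment step mentioned above is deliberately omitted.

```python
import random


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
    )


def partial_update(centroids, counts, batch):
    """Update centroids in place using a single mini-batch.

    Assumption: per-centroid step size 1 / count, which makes each centroid
    the running mean of every sample assigned to it so far.
    """
    # Assign the whole batch first, using the centroids as they were
    # before this batch arrived.
    assignments = [nearest(x, centroids) for x in batch]
    for x, k in zip(batch, assignments):
        counts[k] += 1
        eta = 1.0 / counts[k]
        centroids[k] = [c + eta * (p - c) for p, c in zip(x, centroids[k])]


def stream_batches(rng, n_batches, batch_size):
    """Hypothetical data source: two Gaussian blobs around (0, 0) and (5, 5)."""
    for _ in range(n_batches):
        batch = []
        for _ in range(batch_size):
            center = rng.choice([0.0, 5.0])
            batch.append([rng.gauss(center, 0.5), rng.gauss(center, 0.5)])
        yield batch


rng = random.Random(1000)
centroids = [[1.0, 1.0], [4.0, 4.0]]  # arbitrary initial guesses
counts = [0, 0]

# Only one batch is held in memory at a time; the full dataset is never stored.
for batch in stream_batches(rng, n_batches=50, batch_size=20):
    partial_update(centroids, counts, batch)

print(centroids, counts)
```

Because the step size shrinks as a centroid absorbs more samples, early batches move the centroids substantially while later ones only refine them. This is also why a centroid that starts in a poor position can get stuck, which is the kind of situation a reassignment step is meant to address.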