The k-means algorithm is an unsupervised machine learning (ML) clustering algorithm. The objective of this algorithm is to build k centers around which data points are centered, thereby forming k clusters. The most common implementation of this algorithm is generally done using batch-oriented processing. Streaming-based clustering algorithms are also available for this, with the following properties:
- The k clusters are built using initial data
- As new data arrives in minibatches, existing k clusters are updated to compute new k clusters
- It also possible to control the decay or decrease in the significance of older data
At a high level, the preceding steps are quite similar to the word count problem that we solved using the streaming solution. The goal of the k-means algorithm is to partition the data into k clusters. If the...