In this recipe, we explore the streaming version of KMeans in Spark, an unsupervised learning algorithm. The purpose of the streaming KMeans algorithm is to group a set of data points into a fixed number of clusters (k), so that each point belongs to the cluster whose center is most similar to it, with similarity measured by distance to the cluster center.
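To make "similarity" concrete, here is a minimal pure-Scala sketch (no Spark dependency) of the assignment step that k-means repeats: each point goes to the center with the smallest squared Euclidean distance. The object name `NearestCenter`, its helpers, and the sample points are hypothetical, chosen only for illustration.

```scala
// Hypothetical illustration: assign each point to its nearest centroid.
object NearestCenter {
  type Point = Vector[Double]

  // Squared Euclidean distance; a smaller value means the two points are more similar.
  def squaredDistance(a: Point, b: Point): Double = {
    require(a.length == b.length, "points must have the same dimension")
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum
  }

  // Index of the centroid closest to p (ties go to the lowest index).
  def closest(p: Point, centers: Seq[Point]): Int =
    centers.indices.minBy(i => squaredDistance(p, centers(i)))

  def main(args: Array[String]): Unit = {
    val centers = Seq(Vector(1.0, 1.0), Vector(5.0, 5.0))
    val points  = Seq(Vector(0.8, 1.2), Vector(4.5, 5.3), Vector(2.0, 2.5))
    points.foreach(p => println(s"$p -> cluster ${closest(p, centers)}"))
  }
}
```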
Spark MLlib provides two implementations of the KMeans clustering method: `KMeans` for static (offline) data, and `StreamingKMeans` for data that arrives continuously, where the model is updated in real time as each new batch comes in.
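The streaming variant differs from the offline one in how it moves the centers: instead of iterating over the full dataset, it folds each new mini-batch into the existing centers. The sketch below, in plain Scala without Spark, follows the per-batch update rule given in the Spark MLlib streaming k-means documentation, c' = (c·n·α + x̄·m) / (n·α + m) and n' = n + m, where x̄ is the mean of the m new points assigned to a cluster, n is the weight accumulated so far, and α is the decay factor. The names `StreamingUpdate`, `Cluster`, and `update` are hypothetical, and Spark's own `StreamingKMeans` may differ in implementation details such as how weights are discounted.

```scala
// Hypothetical sketch of one streaming k-means mini-batch update, following the
// documented rule: c' = (c * n * alpha + xBar * m) / (n * alpha + m),  n' = n + m
object StreamingUpdate {
  type Point = Vector[Double]

  // A cluster center plus the number of points (weight) it has absorbed so far.
  final case class Cluster(center: Point, weight: Double)

  def squaredDistance(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  def closest(p: Point, clusters: Seq[Cluster]): Int =
    clusters.indices.minBy(i => squaredDistance(p, clusters(i).center))

  // Fold one mini-batch into the model. alpha in [0, 1] is the decay factor:
  // 1.0 weights all history equally, 0.0 keeps only the newest batch.
  def update(clusters: Vector[Cluster], batch: Seq[Point], alpha: Double): Vector[Cluster] = {
    val assigned: Map[Int, Seq[Point]] = batch.groupBy(p => closest(p, clusters))
    clusters.zipWithIndex.map { case (c, i) =>
      assigned.get(i) match {
        // m = 0: the documented rule leaves center and weight unchanged.
        case None => c
        case Some(pts) =>
          val m = pts.size.toDouble
          // Mean of the new points assigned to this cluster (x-bar in the rule).
          val mean = pts.reduce((a, b) => a.zip(b).map { case (x, y) => x + y }).map(_ / m)
          val oldW = c.weight * alpha
          val newCenter = c.center.zip(mean).map { case (ci, xi) => (ci * oldW + xi * m) / (oldW + m) }
          Cluster(newCenter, c.weight + m)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    var model = Vector(Cluster(Vector(0.0, 0.0), 0.0), Cluster(Vector(10.0, 10.0), 0.0))
    val batches = Seq(
      Seq(Vector(1.0, 1.0), Vector(9.0, 9.0)),
      Seq(Vector(2.0, 2.0), Vector(8.0, 8.0))
    )
    for (batch <- batches) {
      model = update(model, batch, alpha = 0.5)
      println(model)
    }
  }
}
```

With α = 1.0 every past point counts equally; with α = 0.0 the new center is simply the mean of the current batch, so the model tracks only the latest data.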
We will cluster the iris dataset incrementally as new data streams into our streaming context.