Summary
The chapter started with an overview of data in motion, also called streaming data, and data at rest. We then delved into the properties of streaming data and the challenges they pose for processing. We introduced stream clustering algorithms and discussed the well-known online/offline approach to stream clustering. Later on, we introduced various classes in the stream package and how to use them. In the process, we discussed several data generators, the DBSTREAM algorithm for finding micro- and macro-clusters, and several metrics for assessing cluster quality. We then introduced our use case and designed a clustering algorithm with an online part based on reservoir sampling and an offline part handled by the k-means algorithm. Finally, we described the steps needed to take this whole setup into a real streaming environment.
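As a compact reminder of the online/offline idea, the following is a minimal sketch in base R. It is not the chapter's implementation. The helper names new_reservoir and update_reservoir, the reservoir size of 100, and the simulated three-blob stream are all assumptions made for illustration. The online part keeps a uniform random sample of the stream in a fixed-size reservoir, and the offline part runs kmeans from the stats package on that sample.

```r
set.seed(42)

# Hypothetical helper: an empty reservoir holding `size` points of `dims` dimensions.
new_reservoir <- function(size, dims) {
  list(points = matrix(NA_real_, nrow = size, ncol = dims), seen = 0L)
}

# Hypothetical helper: one reservoir sampling step for a single incoming point.
update_reservoir <- function(res, point) {
  res$seen <- res$seen + 1L
  size <- nrow(res$points)
  if (res$seen <= size) {
    # Fill phase: the first `size` points are always kept.
    res$points[res$seen, ] <- point
  } else {
    # Replacement phase: keep the new point with probability size / seen.
    j <- sample.int(res$seen, 1L)
    if (j <= size) res$points[j, ] <- point
  }
  res
}

# Simulated stream (an assumption): 1,000 points from three Gaussian blobs.
centers <- matrix(c(0, 0, 5, 5, 0, 5), ncol = 2, byrow = TRUE)
res <- new_reservoir(size = 100, dims = 2)
for (i in seq_len(1000)) {
  k <- sample.int(nrow(centers), 1L)
  res <- update_reservoir(res, rnorm(2, mean = centers[k, ], sd = 0.5))
}

# Offline part: cluster the sampled points with k-means.
fit <- kmeans(res$points, centers = 3, nstart = 10)
fit$centers
```

Because the reservoir never grows beyond its fixed size, the offline step stays cheap no matter how long the stream runs. The trade-off is that small clusters may be under-represented in the sample.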
In the next chapter, we will explore graph mining algorithms. We will show you how to use the igraph package to create and...