Summary
In this chapter, you have seen how some common data preparation methods can be adapted to streaming and online data. With streaming data, it is important to use models that can easily be refitted or re-estimated as new data arrives.
In the first part of the chapter, you have seen two methods for scaling. The MinMaxScaler scales the data to the 0 to 1 range and, therefore, needs to make sure that none of the new data points fall outside of this range. The StandardScaler standardizes the data using the mean and standard deviation.
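As a reminder of how both scalers can be re-estimated one observation at a time, here is a minimal sketch in plain Python. The class names RunningMinMaxScaler and RunningStandardScaler and their learn_one and transform_one methods are hypothetical helpers written for this illustration, not the API of any particular library, and the small stream of values is made up.

```python
import math


class RunningMinMaxScaler:
    """Hypothetical helper: keeps a running min and max and scales to [0, 1]."""

    def __init__(self):
        self.min = math.inf
        self.max = -math.inf

    def learn_one(self, x):
        # Widen the observed range so that x can never fall outside of it.
        self.min = min(self.min, x)
        self.max = max(self.max, x)

    def transform_one(self, x):
        span = self.max - self.min
        if span <= 0:  # nothing seen yet, or only one distinct value
            return 0.0
        return (x - self.min) / span


class RunningStandardScaler:
    """Hypothetical helper: running mean and variance (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def learn_one(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def transform_one(self, x):
        if self.n < 2:
            return 0.0
        std = math.sqrt(self.m2 / self.n)  # population standard deviation
        return (x - self.mean) / std if std > 0 else 0.0


# Made-up stream of values, processed one observation at a time.
stream = [10.0, 12.0, 9.0, 15.0, 11.0]
minmax = RunningMinMaxScaler()
standard = RunningStandardScaler()
scaled_minmax = []
scaled_standard = []
for value in stream:
    # Update first, then transform, so the current point is inside the range.
    minmax.learn_one(value)
    standard.learn_one(value)
    scaled_minmax.append(minmax.transform_one(value))
    scaled_standard.append(standard.transform_one(value))
```

Updating before transforming is a design choice: it guarantees that the min-max output stays in the 0 to 1 range, at the cost of earlier outputs having been computed with a narrower range than later ones.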
The second part of the chapter demonstrated a regular PCA and a new, incremental version called IncrementalPCA. This incremental method allows you to fit PCA in batches, which can help you when fitting PCA on streaming data.
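The batch-wise fitting can be sketched as follows, assuming the scikit-learn implementation of IncrementalPCA and its partial_fit method. The random data, the number of batches, and the choice of two components are assumptions made only for this illustration.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

# Made-up data standing in for a stream: 200 observations, 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

ipca = IncrementalPCA(n_components=2)

# Feed the data in four batches of 50 rows, as if they arrived over time.
# Each batch must contain at least n_components rows.
for batch in np.array_split(X, 4):
    ipca.partial_fit(batch)

# Project the data onto the two components estimated so far.
X_reduced = ipca.transform(X)
```

Each call to partial_fit updates the estimated components with the new batch only, so the full data set never has to be held in memory at once.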
With scaling and feature transformation in this chapter, and drift detection in the previous chapter, you have already seen a good part of the auxiliary tasks of machine learning on streaming data. In the coming chapter, you will see the...