Preparing data for modeling
Now that we’ve windowed our data and applied the FFT, converting a single column of high-frequency time series data into multiple columns of frequency amplitudes, we’re in more familiar territory. Our data now has columns we can use as inputs for modeling. However, there are far too many of them.
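As a reminder of how the table gets so wide, here is a minimal sketch of the windowing and FFT step using NumPy. The function name, the non-overlapping windows, the 1 kHz sample rate, and the synthetic 50 Hz tone are all assumptions made for illustration; they are not the data or the exact procedure used earlier.

```python
import numpy as np

def windowed_fft_amplitudes(signal, window_size):
    """Split a 1-D signal into non-overlapping windows (an assumption;
    overlapping windows are also common) and return one row of FFT
    amplitudes per window."""
    signal = np.asarray(signal, dtype=float)
    n_windows = len(signal) // window_size  # drop any trailing partial window
    windows = signal[: n_windows * window_size].reshape(n_windows, window_size)
    # rfft keeps only the non-negative frequencies of a real-valued signal
    return np.abs(np.fft.rfft(windows, axis=1))

# Hypothetical input: 10 seconds sampled at 1 kHz, containing a 50 Hz tone
sample_rate = 1000
t = np.arange(10 * sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 50 * t)

amplitudes = windowed_fft_amplitudes(signal, window_size=1000)
freqs = np.fft.rfftfreq(1000, d=1 / sample_rate)  # frequency of each column in Hz

print(amplitudes.shape)  # rows are windows, columns are frequencies
```

Even this small example yields hundreds of columns per window (one per frequency bin), which is why some form of reduction is usually needed before modeling.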
Reducing dimensionality
Some modeling algorithms, neural networks for example, support very wide tables or very large input sets, so the number of columns is not always a practical obstacle. However, practicality is not the only reason to reduce dimensionality. Overfitting is a serious concern when working with wide datasets: we want our model to generalize well to new data, not just pick up on one frequency that, by random chance, turned out to be an amazing classifier on the data we happened to train on.
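The risk described above can be made concrete with a small sketch on purely synthetic data. Everything here is an assumption for illustration: the columns are random noise standing in for frequency amplitudes, the labels are unrelated to them, and the row and column counts are arbitrary. With enough columns and few rows, some column will still look strongly related to the label on the training rows.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols = 40, 5000

X = rng.normal(size=(n_rows, n_cols))  # pure-noise stand-ins for frequency columns
y = np.tile([0, 1], n_rows // 2)       # labels with no real relationship to X

train, test = slice(0, 20), slice(20, 40)

def abs_corr(X, y):
    """Absolute Pearson correlation between each column of X and y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))

# Pick the column that looks best on the training rows only
train_corr = abs_corr(X[train], y[train])
best = int(np.argmax(train_corr))

# Check the same column on rows it was not selected on
test_corr = abs_corr(X[test], y[test])

print(f"best column on train: |r| = {train_corr[best]:.2f}")
print(f"same column on test:  |r| = {test_corr[best]:.2f}")
```

The column chosen on the training rows typically shows a much weaker relationship on the held-out rows, because its apparent strength was a product of searching across thousands of candidates rather than of any real signal.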
In the following sections, we’ll review different types of binning and filtering to reduce the dimensionality of our newly cross-sectional data to a manageable size without too much information...