Automatically creating and selecting predictive features from time-series data
In the previous recipe, we automatically extracted several hundred features from time-series variables using tsfresh. If we have more than one time-series variable, we can easily end up with a dataset containing thousands of features. In addition, many of the resulting features had only missing data or were constant and were therefore not useful for training machine learning models.
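As a rough illustration of why such features add nothing, the following sketch removes all-missing and constant columns using plain pandas. The small feature table and its column names are hypothetical stand-ins for extracted features, not actual tsfresh output, and this is not the selection procedure that tsfresh itself applies:

```python
import numpy as np
import pandas as pd

# Hypothetical feature table standing in for automatically extracted features.
features = pd.DataFrame(
    {
        "mean": [0.5, 1.2, 0.9, 1.1],
        "all_missing": [np.nan, np.nan, np.nan, np.nan],
        "constant": [3.0, 3.0, 3.0, 3.0],
        "variance": [0.1, 0.4, 0.2, 0.3],
    }
)

# Drop columns in which every value is missing.
cleaned = features.dropna(axis=1, how="all")

# Drop columns that contain a single unique value (constant features).
n_unique = cleaned.nunique()
cleaned = cleaned.loc[:, n_unique > 1]

print(list(cleaned.columns))
```

Only the columns that vary across rows survive this filter. It removes features that carry no information at all, but it does not assess whether the remaining features are related to the target.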
When we create classification and regression models to solve real-life problems, we often want our models to take a small number of relevant features as input to produce interpretable machine learning outputs. Simpler models have many advantages. First, their output is easier to interpret. Second, simpler models are cheaper to store and faster to train. They also return their outputs faster.
tsfresh includes a highly parallelizable feature selection algorithm based on non-parametric statistical hypothesis tests, which...