Embedding feature creation into a scikit-learn pipeline
Throughout this chapter, we’ve discussed how to use tsfresh to automatically create and select features from time-series data, and how to use those features to train a classification model that predicts whether an office was occupied at any given hour.
tsfresh includes wrapper classes around its main functions, extract_features and extract_relevant_features, that make the creation and selection of features compatible with scikit-learn pipelines.
In this recipe, we will set up a scikit-learn pipeline that extracts features from time series using tsfresh and then trains a logistic regression model with those features to predict office occupancy.
How to do it...
Let’s begin by importing the necessary libraries and getting the dataset ready:
- Let’s import the required libraries and functions:
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
...
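The recipe's own data preparation is not reproduced here. As a hedged sketch of the input layout tsfresh's transformers expect, the example below turns a hypothetical minute-level sensor log into long format (an id column, a sort column, and value columns), builds a target Series indexed by id, and creates an X that holds only those ids. The column names, the one-hour grouping, and the rule that an hour counts as occupied if any minute was occupied are all assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical minute-level log: one light reading and a 0/1
# occupancy flag per minute over three hours.
timestamps = pd.date_range("2024-01-01 08:00", periods=180, freq="min")
log = pd.DataFrame({
    "date": timestamps,
    "light": np.arange(180, dtype=float),
    "occupancy": [0] * 60 + [1] * 60 + [0] * 60,
})

# Each hour becomes one time series; its start timestamp serves as the id.
log["id"] = log["date"].dt.floor("60min")

# Long format: id column, sort column ("date"), and value column(s).
df_ts = log[["id", "date", "light"]]

# Assumed target: 1 if the office was occupied at any minute of that hour.
y = log.groupby("id")["occupancy"].max()

# tsfresh's transformers use X's index to decide which ids to process.
X = pd.DataFrame(index=y.index)
```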