Creating a training pipeline
Machine-learning systems are usually built using different modules. These modules are combined in a particular way to achieve an end goal. The scikit-learn
library has functions that enable us to build these pipelines by concatenating various modules together. We just need to specify the modules along with the corresponding parameters. It will then build a pipeline using these modules that processes the data and trains the system.
The pipeline can include modules that perform various functions like feature selection, preprocessing, random forests, clustering, and so on. In this section, we will see how to build a pipeline to select the top K features from an input data point and then classify them using an Extremely Random Forest classifier.
Create a new Python file and import the following packages:
from sklearn.datasets import samples_generator from sklearn.feature_selection import SelectKBest, f_regression from sklearn.pipeline import Pipeline ...