Customizing scikit-learn transformers
Now that we have a process for transforming the DataFrame into a machine learning-ready sparse matrix, it would be advantageous to generalize that process with transformers so that it can easily be repeated on new incoming data.
Scikit-learn transformers work with machine learning algorithms by using a fit method, which finds model parameters, and a transform method, which applies these parameters to data. These methods may be combined into a single fit_transform method that fits and transforms data in one line of code.
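As a minimal sketch of the fit, transform, and fit_transform methods, the following example uses StandardScaler on a tiny made-up array. The variable names and values are illustrative assumptions, not data from this chapter.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: one numeric column for training, one new row.
X_train = np.array([[1.0], [2.0], [3.0]])
X_new = np.array([[4.0]])

scaler = StandardScaler()

# fit learns the parameters (here, the column mean and standard deviation).
scaler.fit(X_train)

# transform applies the learned parameters to any data, old or new.
X_train_scaled = scaler.transform(X_train)
X_new_scaled = scaler.transform(X_new)

# fit_transform does both steps on the same data in one call.
X_train_scaled_again = StandardScaler().fit_transform(X_train)
```

Note that the new row is scaled with the mean and standard deviation learned from the training data, which is what makes a fitted transformer reusable on incoming data.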
Transformers and machine learning algorithms may be chained together in the same pipeline for ease of use. Data placed in the pipeline is then fit and transformed by each step in turn to achieve the desired output.
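To illustrate how a pipeline might chain transformers with a machine learning algorithm, here is a short sketch. The step names, the choice of SimpleImputer, StandardScaler, and LogisticRegression, and the toy data are all assumptions made for the example rather than the pipeline built in this chapter.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data with one missing value.
X = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 30.0], [4.0, 40.0]])
y = np.array([0, 0, 1, 1])

# Each step is a (name, estimator) pair; every step except the last
# must be a transformer, and the last may be a model.
pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
    ("model", LogisticRegression()),
])

# Calling fit runs fit_transform on each transformer in order,
# then fits the final model on the transformed data.
pipeline.fit(X, y)

# Calling predict applies each fitted transformer, then the model.
predictions = pipeline.predict(X)
```

Because the whole sequence lives in one object, the same preprocessing is applied to new data without repeating each step by hand.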
Scikit-learn comes with many great transformers, such as StandardScaler and Normalizer to standardize and normalize data, respectively, and SimpleImputer to fill in null values. You have to...