Building a machine learning pipeline
Completing the machine learning pipeline requires adding the machine learning model to the previous pipeline. You need a machine learning tuple after NullValueImputer and SparseMatrix, as follows:
full_pipeline = Pipeline([
    ('null_imputer', NullValueImputer()),
    ('sparse', SparseMatrix()),
    ('xgb', XGBRegressor(max_depth=1, min_child_weight=5, subsample=0.6,
                         colsample_bytree=0.9, colsample_bylevel=0.9,
                         colsample_bynode=0.8, missing=-999.0)),
])
This pipeline is now complete with a machine learning model, and it can be fit on any X, y combination, as follows:
full_pipeline.fit(X, y)
Now you can make predictions on any data whose target column is unknown:
new_data = X_test
full_pipeline.predict(new_data)
Here are the first few values of the expected output:
array([13.55908  ,  8.314051 , 11.078157 , 14.114085 , 12.2938385, 11.374797 , 13.9611025, 12.025812 , 10.80344 ...
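To make the mechanics of fitting and predicting concrete, here is a minimal pure-Python sketch of what a pipeline does with its steps. It is a simplified illustration, not scikit-learn's actual implementation: MiniPipeline, FillMissing, and MeanRegressor are hypothetical stand-ins invented for this example, and the real Pipeline handles many more cases. The idea it shows is that fit runs every transformer in order and then trains the final estimator, while predict pushes new data through the same fitted transformers before calling the estimator.

```python
class FillMissing:
    """Hypothetical transformer: replaces None with a sentinel value."""

    def __init__(self, fill_value=-999.0):
        self.fill_value = fill_value

    def fit(self, X, y=None):
        # Nothing to learn for a constant fill; return self so calls can chain.
        return self

    def transform(self, X):
        return [[self.fill_value if v is None else v for v in row] for row in X]


class MeanRegressor:
    """Hypothetical final estimator: always predicts the mean of the targets."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class MiniPipeline:
    """Simplified pipeline: a list of (name, step) tuples, estimator last."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y):
        # Fit each transformer and pass its output on to the next step.
        for _, transformer in self.steps[:-1]:
            X = transformer.fit(X, y).transform(X)
        # The last step is the model; it is trained on the transformed data.
        self.steps[-1][1].fit(X, y)
        return self

    def predict(self, X):
        # New data goes through the already-fitted transformers, then the model.
        for _, transformer in self.steps[:-1]:
            X = transformer.transform(X)
        return self.steps[-1][1].predict(X)


pipe = MiniPipeline([('fill', FillMissing()), ('model', MeanRegressor())])
pipe.fit([[1.0, None], [2.0, 3.0]], [10.0, 20.0])
predictions = pipe.predict([[None, 4.0]])
```

Because the transformers live inside the pipeline, the same preprocessing is applied to training data and to new data without any extra bookkeeping, which is the main reason to place the model as the final tuple.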