Beyond performance
In industrial settings, where models are components of larger pipelines, improving model performance at any price is not the objective of modeling. Squeezing out another tenth of a percent can win machine learning competitions or beat state-of-the-art results in a paper, but not every improvement yields a model worth deploying to production. One such effort, common in machine learning competitions, is model stacking: the outputs of multiple primary models are used to train a secondary model, which can increase the cost of inference by orders of magnitude. A Python implementation of stacking logistic regression, k-nearest neighbors, random forest, support vector machine, and XGBoost classifiers on the breast cancer dataset from scikit-learn is shown below. A secondary logistic regression model takes the predictions of each of these primary models as input.
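The following is a minimal sketch of such a stacking setup, assuming scikit-learn's StackingClassifier and the xgboost package; the train/test split, hyperparameters, and the 5-fold cross-validation setting are illustrative assumptions, not the only reasonable choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from xgboost import XGBClassifier

# Load the breast cancer dataset and hold out a test split
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Primary (level-0) models; scaling is applied where the
# estimator is sensitive to feature magnitudes
estimators = [
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ("rf", RandomForestClassifier(random_state=42)),
    ("svm", make_pipeline(StandardScaler(), SVC(probability=True))),
    ("xgb", XGBClassifier(eval_metric="logloss")),
]

# Secondary (level-1) logistic regression model, trained on the
# primary models' cross-validated predictions
stack = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(max_iter=5000),
    cv=5,
)

stack.fit(X_train, y_train)
print(f"Stacked model accuracy: {stack.score(X_test, y_test):.3f}")
```

Note that at prediction time all five primary models must run before the secondary model can produce an output, which is where the order-of-magnitude increase in inference cost comes from.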