Summary
In this chapter, we finished the initial exploration of the case study data by examining the response variable. Having become confident in the completeness and correctness of the data set, we were ready to explore the relationship between features and response and to build models.
We spent much of this chapter getting used to model fitting in scikit-learn at a technical, coding level, and learning about metrics we could use with the binary classification problem of the case study. When trying different feature sets and different kinds of models, you need some way to tell whether one approach is working better than another, and model performance metrics provide it.
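As a reminder of what this workflow looks like in code, here is a minimal sketch that fits the same kind of model on two different feature sets and compares them with performance metrics. It uses synthetic data from `make_classification` rather than the case study data, and the feature set names, the choice of logistic regression, and the use of ROC AUC alongside accuracy are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary classification data (NOT the case study data).
# With shuffle=False, columns 0-2 are informative and columns 3-5 are noise.
X, y = make_classification(
    n_samples=1000, n_features=6, n_informative=3, n_redundant=0,
    n_clusters_per_class=1, weights=[0.8, 0.2], shuffle=False,
    random_state=24)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=24)

# Hypothetical feature sets to compare.
feature_sets = {
    'informative': [0, 1, 2],
    'noise': [3, 4, 5],
}

def evaluate(columns):
    """Fit a logistic regression on the given columns and score it on the test set."""
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train[:, columns], y_train)
    y_pred = model.predict(X_test[:, columns])
    y_prob = model.predict_proba(X_test[:, columns])[:, 1]
    return {
        'accuracy': accuracy_score(y_test, y_pred),
        'roc_auc': roc_auc_score(y_test, y_prob),
    }

results = {name: evaluate(cols) for name, cols in feature_sets.items()}

for name, scores in results.items():
    print(name, scores)
```

Scoring both feature sets on the same held-out test set, with the same metrics, is what makes the comparison between approaches meaningful.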
While accuracy is a familiar and intuitive metric, we learned why it may not give a useful assessment of the performance of a classifier. We learned how to use a majority class null model to tell whether an accuracy rate is truly good, or no better than what would result from prediction of simply the most...