Summary
In this chapter, we covered machine learning classification algorithms. We saw how they fall into a few categories: binary classification, multiclass single-label classification, and multiclass multi-label classification. We learned about one of the foundational classification algorithms – logistic regression. Logistic regression is easier to interpret than many other models, as we can get rough feature importances from the coefficient sizes. It can also be used for feature selection by throwing out features whose coefficients have large p-values (that is, coefficients that are not statistically significant). We also touched on cross-validation and how it can be used to optimize hyperparameters such as the regularization hyperparameter C. Many Python packages can be used for logistic regression (even with big data), but here, we used the sklearn and statsmodels packages, and saw how statsmodels can provide p-values while sklearn cannot.
Besides logistic regression, we also saw how to use Naïve Bayes models and k-nearest...