Although the rudimentary model we have built in this chapter matches the performance of academic research studies, there is certainly room for improvement. The following are some ideas for improving the model; we leave it to the reader to implement these suggestions, along with any other tricks or techniques the reader might know. How high will your performance go?
First and foremost, the current training data has a large number of columns. Some sort of feature selection is almost always performed, particularly for logistic regression and random forest models. For logistic regression, common methods of performing feature selection include:
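- Using a certain number of predictors that have the largest coefficients in absolute value (with the predictors standardized first, so that the coefficients are comparable)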
- Using a certain number of predictors that have the lowest p-values
- Using lasso regularization and removing predictors...
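As a rough illustration of two of these approaches, the following sketch uses scikit-learn on synthetic data. The dataset, the feature names, the value of `k`, and the regularization strength `C` are all assumptions chosen for the example, not values from this chapter. For the lasso case, the sketch drops predictors whose coefficients shrink to exactly zero, which is one common criterion. Selection by p-value is omitted because scikit-learn's `LogisticRegression` does not report p-values.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the training data (an assumption for this sketch).
X, y = make_classification(
    n_samples=500, n_features=30, n_informative=5, n_redundant=0, random_state=0
)
feature_names = np.array([f"feature_{i}" for i in range(X.shape[1])])

# Standardize so that coefficient magnitudes are comparable across predictors.
X_scaled = StandardScaler().fit_transform(X)

# Approach 1: keep the k predictors with the largest absolute coefficients.
k = 10  # hypothetical number of predictors to keep
clf = LogisticRegression(max_iter=1000).fit(X_scaled, y)
top_k_idx = np.argsort(np.abs(clf.coef_[0]))[::-1][:k]
top_k_features = feature_names[top_k_idx]

# Approach 2: lasso (L1) regularization; keep predictors with nonzero coefficients.
# A smaller C means a stronger penalty and therefore fewer surviving predictors.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.05).fit(X_scaled, y)
lasso_mask = lasso.coef_[0] != 0
lasso_features = feature_names[lasso_mask]

print("Top-k by |coefficient|:", list(top_k_features))
print("Kept by lasso:", list(lasso_features))
```

In practice, `k` and `C` would be tuned with cross-validation on the training set rather than fixed by hand, and the reduced feature set would then be used to refit and re-evaluate the model.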