Problems with the existing approach
We got the baseline score using the AdaBoost and GradientBoosting classifiers. Now we need to increase the accuracy of these classifiers. To do that, we first list all the areas that can be improved but that we haven't worked on extensively. We also need to list the possible problems with the baseline approach. Once we have this list of problems, or areas that need work, it will be easy for us to implement the revised approach.
Here, I'm listing some of the areas, or problems, that we haven't worked on in our baseline iteration:
Problem: We haven't used cross-validation techniques extensively to check for overfitting.
Solution: If we use cross-validation techniques properly, we will know whether our trained ML model suffers from overfitting. This matters because we don't want to build a model that doesn't generalize well to unseen data. A sketch of how this could look follows this list.
Problem: We also haven't focused on hyperparameter tuning. In our baseline approach, we mostly used the default parameters, which we set when declaring each classifier. You can refer to the code snippet given in Figure 1.52, where you can see the classifier taking some parameters that are used when it trains the model. We haven't changed these parameters.
Solution: We need to tune these hyperparameters so that we can increase the accuracy of the classifiers. There are various hyperparameter-tuning techniques that we can use; a sketch of one of them, grid search, also follows this list.
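Here is a minimal sketch of the cross-validation check from the first solution, assuming scikit-learn is available. The synthetic data from make_classification is only a placeholder for our real training set, and cv=5 is an illustrative choice of fold count:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Placeholder data standing in for the real training set from the baseline step.
X_train, y_train = make_classification(n_samples=1000, n_features=20,
                                       random_state=42)

clf = GradientBoostingClassifier(random_state=42)

# 5-fold cross-validation: the classifier is trained and scored on five
# different train/validation splits instead of a single hold-out set.
scores = cross_val_score(clf, X_train, y_train, cv=5, scoring='accuracy')
print("Fold accuracies:", scores)
print("Mean accuracy: %.3f (+/- %.3f)" % (scores.mean(), scores.std()))
```

A large gap between the fold scores, or between the cross-validated mean and the training accuracy, would be a sign that the model is overfitting.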
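And here is a minimal sketch of hyperparameter tuning using grid search, again assuming scikit-learn. The parameter grid is illustrative rather than the search space we will actually settle on, and the placeholder data again stands in for our real training set:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

# Placeholder data standing in for the real training set from the baseline step.
X_train, y_train = make_classification(n_samples=1000, n_features=20,
                                       random_state=42)

# An illustrative grid; the real search space would be chosen per classifier.
param_grid = {
    'n_estimators': [50, 100, 200],     # number of weak learners
    'learning_rate': [0.01, 0.1, 1.0],  # weight applied to each weak learner
}

# GridSearchCV tries every combination in param_grid, scores each one with
# 5-fold cross-validation, and refits the best-performing combination.
grid = GridSearchCV(AdaBoostClassifier(random_state=42),
                    param_grid, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)

print("Best parameters:", grid.best_params_)
print("Best cross-validated accuracy: %.3f" % grid.best_score_)
```

Note that grid search combines both ideas: every parameter combination is scored with cross-validation, so the tuned model is also checked for overfitting along the way.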
In the next section, we will look at how these optimization techniques actually work and discuss the approach that we are going to take. So let's begin!