Model Evaluation
When we train a model, we usually split our data into training and testing datasets. This lets us check that the model doesn't overfit. Overfitting refers to a phenomenon where a model performs very well on the training data but fails to give good results on the testing data, or in other words, the model fails to generalize.
In scikit-learn, the train_test_split function splits the data into training and testing sets randomly.
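A minimal sketch, using the built-in iris dataset as example data (any feature matrix X and label vector y would work the same way):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 25% of the samples for testing; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

print(X_train.shape, X_test.shape)  # (112, 4) (38, 4)
```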
When evaluating our model, we typically tune its parameters to improve accuracy on the test data. If we optimize the parameters against the testing set alone, there is a high chance of leaking information about the testing set into our model. To avoid this, we can split the data into three parts: training, validation, and testing sets, tuning parameters on the validation set and reserving the test set for the final evaluation. The disadvantage of this technique, however, is that it further reduces our training dataset.
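scikit-learn has no single three-way-split helper, but the same effect can be sketched with two chained calls to train_test_split (the 60/20/20 proportions here are an illustrative choice, not a requirement):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First carve off 20% of the data as the final test set.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Then split the remainder: 0.25 of the remaining 80% gives a 20% validation set,
# leaving 60% of the original data for training.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42
)
```

Parameters are then tuned against the validation set, and the untouched test set is used only once, for the final evaluation.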
The solution is to use cross-validation. In this process, we do not need a separate validation set: the training data is split into k folds, the model is trained on k - 1 folds and evaluated on the remaining fold, and the procedure is repeated k times so that each fold serves as the evaluation set exactly once. The k scores are then averaged to give a more reliable estimate of the model's performance.
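As a sketch of how this looks in scikit-learn, cross_val_score runs the whole procedure in one call (the logistic regression estimator and cv=5 are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: the model is fitted five times, each time
# evaluated on a different held-out fold of the data.
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean(), scores.std())
```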