Model Evaluation
When you train a model, you usually split the data into training and testing datasets. The testing set lets you check whether the model overfits. Overfitting refers to a phenomenon where a model performs very well on the training data but poorly on the testing data; in other words, the model fails to generalize to unseen data.
scikit-learn provides a function, train_test_split, that randomly splits the data into training and testing sets.
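A minimal sketch of a train/test split followed by an overfitting check. The Iris dataset, the 25% test size, the DecisionTreeClassifier, and random_state=42 are illustrative assumptions, not choices made in the text. An unconstrained decision tree tends to fit its training data very closely, so comparing its training and testing accuracy shows whether it generalizes.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small built-in dataset (chosen only for illustration)
X, y = load_iris(return_X_y=True)

# Hold out 25% of the rows as a test set; random_state makes the shuffle
# reproducible and stratify keeps the class proportions the same in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# A decision tree with no depth limit can closely fit the training data
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# score() returns mean accuracy for classifiers
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
# A large gap between the two scores is a sign of overfitting
```

With 150 samples and test_size=0.25, the split yields 112 training rows and 38 testing rows.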
When evaluating your model, you typically adjust its parameters (hyperparameters such as tree depth or regularization strength) to improve the accuracy on the test data. However, if you tune the parameters using only the testing set, information from the testing set leaks into the model, and the test score no longer gives an unbiased estimate of performance on unseen data. To avoid this, you can split the data into three parts: training, validation, and testing sets. You tune the parameters on the validation set and use the testing set only once, for the final evaluation. The disadvantage of this technique is that it further reduces the amount of data available for training.
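The three-way split can be sketched with two calls to train_test_split. This is one possible approach, assuming a roughly 60/20/20 split, the Iris dataset, and max_depth of a DecisionTreeClassifier as the hyperparameter being tuned; all of these are illustrative choices. The validation set drives the tuning, and the test set is scored only once at the end.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# First split off the test set (20%); it stays untouched until the very end
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
# Split the remainder into training (75%) and validation (25%) sets,
# giving roughly a 60/20/20 split of the original data
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=0, stratify=y_temp
)

# Tune the max_depth hyperparameter using only the validation set
best_depth, best_val_acc, best_model = None, -1.0, None
for depth in range(1, 6):
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    val_acc = model.score(X_val, y_val)
    if val_acc > best_val_acc:
        best_depth, best_val_acc, best_model = depth, val_acc, model

# Evaluate the chosen model exactly once on the held-out test set
test_acc = best_model.score(X_test, y_test)
print(f"best max_depth: {best_depth}, validation accuracy: {best_val_acc:.3f}")
print(f"test accuracy: {test_acc:.3f}")
```

A common variation is to refit the chosen configuration on the combined training and validation data before the final test evaluation, which recovers some of the training data lost to the validation split.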
The solution is to use cross-validation...