Designing Good Validation
In a Kaggle competition, in the heat of modeling and submitting results, it may seem enough to take the results you get back from the leaderboard at face value. After all, you may think that what counts in a competition is your ranking. This is a common error, made repeatedly in competitions. In fact, you won’t know what the final (private) leaderboard looks like until after the competition has closed, and trusting the public part of it is not advisable because it is quite often misleading.
In this chapter, we will introduce you to the importance of validation in data competitions. You will learn about:
- What overfitting is and how a public leaderboard can be misleading
- The dreaded shake-ups
- The different kinds of validation strategies
- Adversarial validation
- How to spot and leverage leakages
- What your strategies should be when choosing your final submissions
Monitoring your...