Validating quality
There are many ways to check the quality of your data, but a few are more common than others. Let’s take a look at the most popular methods, which are also the ones you need to know for the exam:
- Cross-validation
- Sample/spot check
- Reasonable expectations
- Data profiling
- Data audits
Some of these are pretty self-explanatory, but let’s go into a little more detail for each.
Cross-validation
Cross-validation is a statistical technique that checks whether the results of an analysis generalize to data that was not used to produce them. It has many different uses. It can check how effective a model is, particularly when you are looking for overfitting. It is often used to choose the hyperparameters for your model, and is a great tool for estimating and reducing test error. Cross-validation is a useful tool that can be applied in several different ways to check the quality and function...
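To make the idea concrete, here is a minimal sketch of one common form, k-fold cross-validation, written in plain Python. It is an illustration under stated assumptions rather than a prescribed method: the function names (k_fold_indices, fit_line, cross_validate), the simple straight-line model, and the synthetic data are all hypothetical and chosen only to keep the example self-contained. The data is split into k folds, the model is fitted on k - 1 of them, and its error is measured on the fold that was held out.

```python
import random
from statistics import mean


def k_fold_indices(n_rows, k, seed=0):
    """Shuffle the row indices and split them into k nearly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (illustrative model)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def cross_validate(xs, ys, k=5, seed=0):
    """Return the mean squared error on the held-out fold, once per fold."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit only on the training rows; the held-out rows stay unseen.
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        errors = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in held_out]
        scores.append(mean(errors))
    return scores


# Synthetic data (an assumption for this sketch): y = 2x + 1 plus noise.
rng = random.Random(42)
xs = [float(x) for x in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]

fold_mse = cross_validate(xs, ys, k=5)
print("MSE per fold:", [round(m, 3) for m in fold_mse])
print("Average MSE:", round(mean(fold_mse), 3))
```

If the held-out error is much larger than the error on the training rows, or varies widely from fold to fold, that is a sign the model may be overfitting or that the data is not consistent across the sample.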