Data often arrives in a raw form and may contain anomalies or incorrect or malformed values, which cause problems for later processing and analysis. It is usually a good idea to build a validation step into a processing pipeline. Fortunately, the Cerberus package provides a lightweight and easy-to-use validation tool for Python.
For validation, we have to define a schema, which is a technical description of what the data should look like and the checks that should be performed on it. For example, we can check the type of a value and place bounds on its maximum and minimum values. Cerberus validators can also perform type conversions during the validation step, which allows us to plug data loaded directly from CSV files, where every value is read as a string, into the validator.
In this recipe, we will learn how to use Cerberus to validate data loaded from a CSV file.