8.1 Description
Data validation is a common requirement when moving data between applications. It is extremely helpful to have a clear definition of what constitutes valid data. It helps even more when the definition exists outside a particular programming language or platform.
We can use the JSON Schema (https://json-schema.org) to define a schema that applies to the intermediate documents created by the acquisition process. Using JSON Schema enables the confident and reliable use of the JSON data format.
The JSON Schema definition can be shared and reused within separate Python projects and with non-Python environments, as well. It allows us to build data quality checks into the acquisition pipeline to positively affirm the data really fit the requirements for analysis and processing.
Additional metadata provided with a schema often includes the provenance of the data and details on how attribute values are derived. This isn’t a formal part of a JSON Schema, but we can add some...