17.3 Concept of metadata and provenance
The description of a dataset includes three important aspects:
The syntax, or physical format and logical layout, of the data
The semantics, or meaning, of the data
The provenance, or the origin and transformations applied to the data
The physical format of a dataset is often summarized by naming a well-known file format; for example, the data may be in CSV format. The order of columns in a CSV file may change, so we need headings, or some other metadata, that describe the logical layout of the columns within the file.
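As a minimal sketch of why headings matter, the following reads two small CSV documents that hold the same values with the columns in a different order. The sample text and the `read_rows()` helper are assumptions made for illustration; only `csv.DictReader` comes from the standard library.

```python
import csv
import io

# Two hypothetical files holding the same values, with columns in a different order.
FILE_A = "height,weight,price\n72,180,19.99\n"
FILE_B = "price,height,weight\n19.99,72,180\n"


def read_rows(text: str) -> list[dict[str, str]]:
    """Read CSV text, treating the first row as the column headings."""
    return list(csv.DictReader(io.StringIO(text)))


rows_a = read_rows(FILE_A)
rows_b = read_rows(FILE_B)

# Each row is keyed by heading, so the physical column order no longer matters.
print(rows_a[0]["height"], rows_b[0]["height"])
```

Because each row is a dictionary keyed by heading, code that asks for `"height"` works with either layout.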
Much of this information can be enumerated in JSON schema definitions.
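One way this might look is a JSON Schema written as a Python dictionary. The property names, descriptions, and the `missing_fields()` helper are hypothetical, chosen only for illustration. The helper checks required names and nothing else; it is not a JSON Schema validator.

```python
import json

# Hypothetical JSON Schema describing one row of a dataset.
ROW_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "Row",
    "type": "object",
    "properties": {
        "height": {"type": "number", "description": "height in inches"},
        "weight": {"type": "number", "description": "weight in pounds"},
        "price": {"type": "number", "description": "price in dollars"},
    },
    "required": ["height", "weight", "price"],
}


def missing_fields(row: dict, schema: dict) -> list[str]:
    """Return the required property names absent from row (not full validation)."""
    return [name for name in schema["required"] if name not in row]


# The schema is plain JSON, so it can be stored alongside the data it describes.
print(json.dumps(ROW_SCHEMA, indent=2))
print(missing_fields({"height": 72, "weight": 180}, ROW_SCHEMA))
```

The `description` values carry some of the semantics, while `type` and `required` describe the logical layout.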
In some cases, the metadata might be yet another CSV file that has column numbers, preferred data types, and column names. We might have a secondary CSV file that looks like the following example:
1,height,height in inches
2,weight,weight in pounds
3,price,price in dollars
This metadata describes the contents of a separate CSV file with...
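A sketch of how such a metadata file could be applied is shown here. It assumes the separate data file has no heading row, and that each metadata row holds a 1-based column number, a name, and a description, as in the three rows shown above. The sample data values and the function names are hypothetical.

```python
import csv
import io

# The metadata rows from the example: column number, name, description.
METADATA = (
    "1,height,height in inches\n"
    "2,weight,weight in pounds\n"
    "3,price,price in dollars\n"
)

# Hypothetical data file, assumed to have no heading row.
DATA = "72,180,19.99\n65,140,24.50\n"


def column_names(metadata_text: str) -> list[str]:
    """Return the column names, ordered by their 1-based column number."""
    rows = csv.reader(io.StringIO(metadata_text))
    numbered = sorted((int(number), name) for number, name, _description in rows)
    return [name for _number, name in numbered]


def read_with_metadata(data_text: str, metadata_text: str) -> list[dict[str, str]]:
    """Read headerless CSV data, taking the column names from the metadata."""
    names = column_names(metadata_text)
    return list(csv.DictReader(io.StringIO(data_text), fieldnames=names))


for row in read_with_metadata(DATA, METADATA):
    print(row)
```

Sorting by column number means the metadata rows need not appear in column order. All values are still strings; converting them to numbers would need type information that this metadata file does not carry.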