Mastering lineage
Lineage, or process lineage, is the action of a data application on the schemas of data sources. Lineage is a link between inputs and outputs, often one or several input schemas and one output schema.
It expresses what happens to the data inside a specific application. By extension, the lineage of a data source is the set of all the transformations that led to the creation of that data source, together with all the computations or manipulations that are based on it.
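To make the "by extension" definition concrete, the sketch below gathers the lineage of one data source from a list of per-application lineages: the applications upstream that created it and the applications downstream that build on it. It is a minimal illustration under stated assumptions: schemas are simplified to data source names, the `Lineage` record, the `lineage_of` helper and the example pipeline are all hypothetical, and following both directions transitively is our reading of "all the transformations".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lineage:
    """One application's lineage, simplified to data source names (assumption)."""
    application: str
    inputs: tuple  # names of the data sources read by the application
    output: str    # name of the data source written by the application


def _walk(source, lineages, forward):
    """Collect applications reachable from `source`, downstream or upstream."""
    applications, seen, stack = set(), set(), [source]
    while stack:
        current = stack.pop()
        if current in seen:  # guard against cycles in the pipeline
            continue
        seen.add(current)
        for lin in lineages:
            if forward and current in lin.inputs:
                # Downstream: the application consumes `current`; follow its output.
                applications.add(lin.application)
                stack.append(lin.output)
            elif not forward and lin.output == current:
                # Upstream: the application produced `current`; follow its inputs.
                applications.add(lin.application)
                stack.extend(lin.inputs)
    return applications


def lineage_of(source, lineages):
    """Transformations that created `source` plus those based on it (transitively)."""
    return _walk(source, lineages, forward=False) | _walk(source, lineages, forward=True)


# Hypothetical pipeline used only for illustration.
PIPELINE = [
    Lineage("ingest", ("raw_orders",), "orders"),
    Lineage("enrich", ("orders", "customers"), "orders_enriched"),
    Lineage("report", ("orders_enriched",), "monthly_report"),
    Lineage("audit", ("logs",), "audit_log"),
]

print(sorted(lineage_of("orders_enriched", PIPELINE)))  # ['enrich', 'ingest', 'report']
```

The unrelated `audit` application is left out because it neither feeds nor consumes `orders_enriched`, directly or indirectly.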
As we stated previously, lineage is a link between schemas, and these schemas can come from the same data source. For instance, creating a new column in a SQL table creates a new schema inside the table, fed by data coming from another schema of the same data source.
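The SQL example can be sketched with Python's built-in `sqlite3` module: a hypothetical application adds a `total` column to an `orders` table and fills it from the `price` and `quantity` columns, then records a lineage whose input and output schemas both belong to the `orders` data source. The `Schema` and `Lineage` records, the table, and the `add_total_column` function are illustrative assumptions, not a prescribed API.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Schema:
    data_source: str  # e.g. a table name
    fields: frozenset  # the columns that make up this schema


@dataclass(frozen=True)
class Lineage:
    application: str
    inputs: tuple  # input schemas
    output: Schema


def add_total_column(conn):
    """Hypothetical application: derives a `total` column inside `orders`."""
    conn.execute("ALTER TABLE orders ADD COLUMN total REAL")
    conn.execute("UPDATE orders SET total = price * quantity")
    # Input and output schemas live in the same data source: the `orders` table.
    return Lineage(
        application="add_total_column",
        inputs=(Schema("orders", frozenset({"price", "quantity"})),),
        output=Schema("orders", frozenset({"total"})),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, price REAL, quantity INTEGER)")
conn.executemany("INSERT INTO orders (price, quantity) VALUES (?, ?)", [(2.5, 4), (10.0, 1)])

lineage = add_total_column(conn)
totals = [row[0] for row in conn.execute("SELECT total FROM orders ORDER BY id")]
print(totals)  # [10.0, 10.0]
print(lineage.inputs[0].data_source == lineage.output.data_source)  # True
```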
Lineage is a unique combination of data flows, a data flow being a one-to-one relationship between an input schema and an output schema that occurs inside the application. Without the application, there cannot be any lineage...
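The relationship between a lineage and its data flows can be illustrated as a decomposition: a lineage with n input schemas and one output schema splits into n one-to-one data flows, and the flows sharing that output recombine into the same lineage. This is a minimal sketch; `DataFlow`, `Lineage`, `to_data_flows`, `from_data_flows`, and the schema names are hypothetical, and schemas are reduced to plain strings for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFlow:
    """One-to-one relationship between an input schema and an output schema."""
    input_schema: str
    output_schema: str


@dataclass(frozen=True)
class Lineage:
    application: str
    inputs: tuple  # input schemas
    output: str    # output schema


def to_data_flows(lineage):
    """Split a lineage into one data flow per input schema."""
    return [DataFlow(schema, lineage.output) for schema in lineage.inputs]


def from_data_flows(application, flows):
    """Recombine the data flows of one application into its lineage."""
    outputs = {flow.output_schema for flow in flows}
    if len(outputs) != 1:
        raise ValueError("expected data flows that share a single output schema")
    return Lineage(application, tuple(f.input_schema for f in flows), outputs.pop())


REPORT = Lineage(
    "monthly_report",
    ("orders.amount", "customers.country"),
    "report.revenue_by_country",
)
flows = to_data_flows(REPORT)
print(len(flows))  # 2
print(from_data_flows("monthly_report", flows) == REPORT)  # True
```

The application name is passed explicitly to `from_data_flows` because the data flows alone do not say which application they occur in, which echoes the point that there is no lineage without the application.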