Data lineage is the ability to trace a dataset back to its source and to how it was created. It is a fun topic for me because it typically requires investigating the history of how systems generate data, identifying how the data was processed, and working with the people who produce and consume it. This process helps to improve your data literacy, which is the ability to read, write, analyze, and argue with data, because you learn how the data impacts the organization. Is the data critical to business functions such as generating sales, or was it created for compliance purposes? Questions like these should be answered by learning more about the lineage of the data.
From experience, tracing data lineage involves working sessions directly with the people who are responsible for the data, along with uncovering any documentation, such as help guides or an ERD like the one demonstrated in the From SQL to pandas DataFrames section. In many cases, the...