Understanding data sources
Over the past decade, both the amount and the variety of data generated each year have increased significantly. Today, industry analysts measure the volume of global data generated in a year in zettabytes (ZB), a unit of measurement equal to a billion terabytes (TB). By some estimates, a little over 1 ZB of data existed in the world in 2012, yet by the end of 2025, an estimated 181 ZB of data will be created, captured, copied, and consumed worldwide.
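To make that scale concrete, the following minimal sketch works through the unit arithmetic. It assumes decimal (SI) units, where each step up is a factor of 1,000; the names TERABYTE, ZETTABYTE, and zb_to_tb are illustrative and not part of any library.

```python
# Decimal (SI) byte units (an assumption; binary units would differ slightly).
TERABYTE = 10**12   # bytes in 1 TB
ZETTABYTE = 10**21  # bytes in 1 ZB


def zb_to_tb(zettabytes: float) -> float:
    """Convert a quantity in zettabytes to terabytes (hypothetical helper)."""
    return zettabytes * ZETTABYTE / TERABYTE


# How many terabytes make up a single zettabyte?
tb_per_zb = ZETTABYTE // TERABYTE
print(f"1 ZB = {tb_per_zb:,} TB")

# The 2025 estimate quoted above, expressed in terabytes.
print(f"181 ZB = {zb_to_tb(181):,.0f} TB")
```

The first line printed confirms that one zettabyte is a billion terabytes, and the second expresses the 181 ZB estimate as 181 billion TB.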
In our pipeline whiteboarding session (covered in Chapter 5, Architecting Data Engineering Pipelines), we identified several data sources that we wanted to ingest and transform to best enable our data consumers. For each data source identified in a whiteboarding session, you need to develop an understanding of the variety, volume, velocity, veracity, and value of its data; we’ll cover each of these next.
Data variety
In the past decade, the variety of data...