Understanding data sources
Over the past decade, both the amount and the variety of data generated each year have increased significantly. Today, industry analysts describe the volume of global data generated in a year in terms of zettabytes (ZB), a unit of measurement equal to a billion terabytes (TB). By some estimates, a little over 1 ZB of data existed in the world in 2012, yet by the end of 2020 an estimated 59 ZB of data was forecast to have been consumed globally.
In our pipeline whiteboarding session (covered in Chapter 5, Architecting Data Engineering Pipelines), we identified several data sources that we wanted to ingest and transform to best serve our data consumers. For each data source identified in a whiteboarding session, you need to understand its variety, volume, velocity, veracity, and value.
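One lightweight way to make sure none of these five characteristics is overlooked is to record them for every source that comes out of the whiteboarding session. The sketch below assumes a simple Python dataclass. The class name `DataSourceProfile`, its fields, the `missing_fields` helper, and the example `orders_db` source are all hypothetical illustrations, not part of any AWS service or tool.

```python
from dataclasses import dataclass

# Hypothetical record capturing the five Vs for one data source.
# Names and example values are illustrative assumptions only.
FIVE_VS = ("variety", "volume", "velocity", "veracity", "value")


@dataclass
class DataSourceProfile:
    name: str
    variety: str    # e.g. "structured", "semi-structured", "unstructured"
    volume: str     # approximate size, e.g. "~20 GB/day"
    velocity: str   # e.g. "batch (nightly)" or "streaming"
    veracity: str   # notes on data quality and trustworthiness
    value: str      # why data consumers need this source

    def missing_fields(self) -> list:
        """Return the names of any Vs that have not been filled in yet."""
        return [v for v in FIVE_VS if not getattr(self, v).strip()]


# Example: a source whose data quality has not been assessed yet.
orders = DataSourceProfile(
    name="orders_db",
    variety="structured",
    volume="~20 GB/day",
    velocity="batch (nightly)",
    veracity="",
    value="Feeds daily sales dashboards",
)
print(orders.missing_fields())  # ['veracity']
```

Any source that still reports missing fields is a prompt to go back to the source owner before designing its ingestion step.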
Data variety
In the past decade, the variety of data that has been used in data analytics projects has greatly increased...