The data life cycle and its evolution
Data engineering is the discipline of taking data that is born elsewhere, generally in many disparate places, and bringing it together so that it makes more sense to business users than the individual pieces of information did in the systems they came from.
To put it another way, data engineers do not create data; they manage and integrate existing data.
As Francesco Puppini, the inventor of the Unified Star Schema, likes to say, the word "data" comes from the Latin "datum", which means "given." The information we work on is given to us; we must be the best possible stewards of it.
The art of data engineering is to store data and make it available for analysis, eventually distilling it into information, without losing the original information or adding noise.
In this section, we will look at how data flows from where it is created to where it is consumed, introducing the most important topics to consider at each step. In the next section...