Exploring the data engineering components
In the context of this book, data engineering is the process of ingesting raw data from source systems and producing reliable data that can be used in scenarios such as analytics, business reporting, and ML. A data engineer is a person who builds software that collects and processes raw data to generate clean and meaningful datasets for data analysts and data scientists. These datasets form the backbone of your organization's ML initiatives.
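To make the idea of turning raw data into a reliable dataset more concrete, the following is a minimal sketch using only the Python standard library. The sample data, the column names (customer_id, signup_date, monthly_spend), and the ingest and clean helper functions are all hypothetical and chosen purely for illustration; a real pipeline would read from actual source systems and apply cleaning rules agreed with the analysts and data scientists who consume the data.

```python
import csv
import io

# Hypothetical raw export from a source system (contents are illustrative only).
RAW_CSV = """customer_id,signup_date,monthly_spend
101,2023-01-15,49.90
102,,19.00
101,2023-01-15,49.90
103,2023-02-01,not_available
104,2023-03-10,75.50
"""


def ingest(raw_text):
    """Read raw CSV text into a list of row dictionaries (all values are strings)."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def clean(rows):
    """Drop incomplete, malformed, and duplicate rows, and cast fields to proper types."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        # Skip rows where any field is empty.
        if not all(row.values()):
            continue
        # Skip rows whose spend value cannot be parsed as a number.
        try:
            spend = float(row["monthly_spend"])
        except ValueError:
            continue
        customer_id = int(row["customer_id"])
        # Keep only the first occurrence of each customer.
        if customer_id in seen_ids:
            continue
        seen_ids.add(customer_id)
        cleaned.append(
            {
                "customer_id": customer_id,
                "signup_date": row["signup_date"],
                "monthly_spend": spend,
            }
        )
    return cleaned


dataset = clean(ingest(RAW_CSV))
for record in dataset:
    print(record)
```

In this sketch, the row with a missing date, the duplicate row, and the row with a non-numeric spend value are discarded, leaving a small, typed dataset that downstream consumers could rely on. Which rows to drop and which to repair is a design decision that depends on the use case.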
Figure 4.1 shows the stages of a typical data engineering workflow in an ML project:
Data engineering often overlaps with feature engineering. While a data scientist decides which features are most useful for the ML use case, they may work with the data engineer to retrieve particular data points that are not available in the current feature set. This is the main collaboration point between data engineers...