Chapter 1: Introduction to Data Engineering
With the explosion of data around us, it is important to process it meaningfully and promptly to extract value from all the noise. This is where data engineering steps in. If collecting data is the first step, drawing useful insights from it is the next. Data engineering brings together several personas, each with a unique skill set and set of processes, to bring this to fruition. Data usually outlives the technology that handles it, and it continues to grow. New tools and frameworks keep coming to the forefront to solve old problems. It is important to understand the business requirements, the accompanying technical challenges, and the typical paradigm shifts that allow these age-old problems to be solved in a better manner.
By the end of this chapter, you should have an appreciation of the data landscape, the players in it, and the advances in distributed computing and cloud infrastructure that support the rapid pace of innovation.
In this chapter, we will cover the following topics:
- The motivation behind data engineering
- Data personas
- Big data ecosystem
- Evolution of data stores
- Trends in distributed computing
- Business justification for tech spending