Summary
In this chapter, you learned what data engineering is. Data engineering roles and responsibilities vary depending on the maturity of an organization's data infrastructure. But data engineering, at its simplest, is the creation of pipelines to move data from one source or format to another. This may or may not involve data transformations, processing engines, and the maintenance of infrastructure.
Data engineers use a variety of programming languages, but most commonly Python, Java, or Scala, as well as proprietary and open source transactional databases and data warehouses, both on-premises and in the cloud, or a mixture. Data engineers need to be knowledgeable in many areas – programming, operations, data modeling, databases, and operating systems. The breadth of the field is part of what makes it fun, exciting, and challenging. To those willing to accept the challenge, data engineering is a rewarding career.
In the next chapter, we will begin by setting up an environment to start building data pipelines.