What this book covers
Chapter 1, The Story of Data Engineering and Analytics, introduces the core concepts of data engineering and the two data processing architectures used in big data: Lambda and Kappa.
Chapter 2, Discovering Storage and Compute Data Lake Architectures, introduces one of the most important concepts in data engineering: segregating the storage and compute layers. This principle leads to the idea of building data lakes, and understanding it lays the foundation for the modern-day data lake design patterns discussed later in the book.
Chapter 3, Data Engineering on Microsoft Azure, introduces the world of data engineering on the Microsoft Azure cloud platform. It will familiarize you with the Azure tools and services that play a major role in the Azure data engineering ecosystem. These tools and services are used in the practical examples throughout the book.
Chapter 4, Understanding Data Pipelines, introduces you to the idea of data pipelines. This chapter builds on your knowledge of the various stages of data engineering and shows how data pipelines can improve efficiency by integrating individual components and running them in a streamlined fashion.
Chapter 5, Data Collection Stage – The Bronze Layer, guides you through building a data lake using the Lakehouse architecture. We will start with data collection and the development of the bronze layer.
Chapter 6, Understanding Delta Lake, introduces Delta Lake and helps you quickly explore its main features. Understanding these features is an integral skill for a data engineering professional who would like to build data lakes with data freshness, fast performance, and governance in mind. We will also discuss the Lakehouse architecture in detail.
Chapter 7, Data Curation Stage – The Silver Layer, continues building the data lake. The focus of this chapter is on data cleansing, standardization, and building the silver layer using Delta Lake.
Chapter 8, Data Aggregation Stage – The Gold Layer, continues building the data lake. The focus of this chapter is on data aggregation and building the gold layer.
Chapter 9, Deploying and Monitoring Pipelines in Production, explains how to effectively manage data pipelines running in production. We will explore data pipeline management from an operational perspective and cover security, performance management, and monitoring.
Chapter 10, Solving Data Engineering Challenges, lists the major challenges experienced by data engineering professionals. This chapter covers various use cases and presents a challenge. We then take a deep dive into handling the challenge effectively, explaining its resolution using code snippets and examples.
Chapter 11, Infrastructure Provisioning, teaches you the basics of infrastructure provisioning using Terraform. We will use it to provision the cloud resources on Microsoft Azure that are required for running a data pipeline.
Chapter 12, Continuous Integration and Deployment of Data Pipelines, introduces the idea of continuous integration and deployment (CI/CD) of data pipelines. Using the principles of CI/CD, data engineering professionals can deploy new data pipelines, or changes to existing ones, rapidly and in a repeatable fashion.
To get the most out of this book
You will need a Microsoft Azure account.
If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book's GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.
Be sure to stop or delete the Azure resources you created once you have finished running your code, so that your costs are minimized.