Azure Data Engineer Associate Certification Guide

Ace the DP-203 exam with advanced data engineering skills

Product type: Paperback
Published in: May 2024
Publisher: Packt
ISBN-13: 9781805124689
Length: 548 pages
Edition: 2nd Edition
Authors (3):
Newton Alex
Giacinto Palmieri
Mr. Surendra Mettapalli
Table of Contents

Preface
Part 1: Azure Basics
  Chapter 1: Introducing Azure Basics
Part 2: Data Storage
  Chapter 2: Implementing a Partition Strategy
  Chapter 3: Designing and Implementing the Data Exploration Layer
Part 3: Data Processing
  Chapter 4: Ingesting and Transforming Data
  Chapter 5: Developing a Batch Processing Solution
  Chapter 6: Developing a Stream Processing Solution
  Chapter 7: Managing Batches and Pipelines
Part 4: Secure, Monitor, and Optimize Data Storage and Processing
  Chapter 8: Implementing Data Security
  Chapter 9: Monitoring Data Storage and Data Processing
  Chapter 10: Optimizing and Troubleshooting Data Storage and Data Processing
  Chapter 11: Accessing the Online Practice Resources
Other Books You May Enjoy

What This Book Covers

This book is aligned with the revised syllabus of Exam DP-203: Azure Data Engineer Associate Certification and comprises the following chapters:

Chapter 1, Introducing Azure Basics, will introduce you to Azure and explain its capabilities. This is a refresher chapter designed to renew your knowledge of some of the core Azure concepts, including VMs, data storage, compute options, the Azure portal, accounts, and subscriptions. You will build on these technologies in later chapters.

Chapter 2, Implementing a Partition Strategy, will explore the implementation of partition strategies for efficient data management. You will delve into strategies for optimizing analytical workloads through data partitioning and discuss approaches to improve performance for streaming workloads. Additionally, you will examine the utilization of partitioning within Azure Synapse Analytics for enhanced data processing, and identify scenarios where partitioning is necessary in ADLS Gen2 for improved data organization and processing.

Chapter 3, Designing and Implementing the Data Exploration Layer, will focus on creating and executing queries using SQL Serverless and Spark cluster technologies. You will also review database templates in Azure Synapse Analytics and their implementation as part of this exploration. Additionally, you will learn to push new or updated data lineage to Microsoft Purview and explore the importance of searching and browsing metadata in the Microsoft Purview data catalog for effective data management.

Chapter 4, Ingesting and Transforming Data, will focus on designing and implementing incremental loads for efficient data ingestion. You will use Apache Spark, Transact-SQL (T-SQL) in Azure Synapse Analytics, Stream Analytics, and ADF for data transformations. You will also look into the various aspects of data pipelines, such as cleansing data, parsing data, encoding and decoding data, and normalizing and denormalizing values. Additionally, you will configure error handling for transformations, including handling duplicate, missing, and late-arriving data. Finally, you will perform exploratory data analysis.

Chapter 5, Developing a Batch Processing Solution, will show you how to develop batch processing solutions using a combination of Azure Data Lake Storage, ADB, Azure Synapse Analytics, and ADF. You will use PolyBase to load data into an SQL pool and implement Azure Synapse Link for efficient data loading. Additionally, you will learn how to create and test data pipelines, integrate notebooks, and configure batch retention as part of your data pipeline development. You will also examine error handling, including managing upserted data, reverting data to a previous state, and configuring exception handling for robust data processing.

Chapter 6, Developing a Stream Processing Solution, will focus on creating solutions using Stream Analytics and Azure Event Hubs for real-time data processing. You will use Spark Structured Streaming for data processing. Additionally, you will address schema management, including handling schema drift and managing time series data effectively. Finally, you will learn about pipeline optimization techniques, such as configuring checkpoints, watermarking, and optimizing pipelines for analytical and transactional purposes.

Chapter 7, Managing Batches and Pipelines, will cover triggering batches and handling failed batch loads to ensure data integrity. For pipeline management, you will focus on managing and scheduling data pipelines using ADF and Azure Synapse Pipelines. Additionally, you will learn how to implement version control for pipeline artifacts to track changes effectively and how to manage Spark jobs within a pipeline.

Chapter 8, Implementing Data Security, will explore how to design and implement data protection: encryption at rest and in transit, data masking, data auditing, and data retention. You will implement security controls such as row-level security, column-level security, and Azure RBAC to restrict access effectively. Additionally, you will cover access management, including managing POSIX-like Access Control Lists (ACLs) for Data Lake Storage Gen2 and securing endpoints to control data access. Finally, you will address sensitive data management, including handling sensitive information within DataFrames and managing encrypted data for enhanced security.

Chapter 9, Monitoring Data Storage and Data Processing, covers the implementation of logging used by Azure Monitor, focusing on setting up and utilizing its features to track the activities and health of Azure services effectively. You will explore the performance of data movement processes within Azure services and monitor and update statistics about data across a system to reflect its current state accurately. You will delve into monitoring data pipeline performance, identifying bottlenecks and ensuring smooth data flow, and you will learn how to interpret Azure Monitor metrics and logs to make informed decisions. Finally, you will implement a pipeline alert strategy for prompt responses to potential issues.

Chapter 10, Optimizing and Troubleshooting Data Storage and Data Processing, will explore strategies for compacting small files to improve processing efficiency and system performance. You will review techniques for handling skew in data distribution to mitigate processing delays, explore ways to manage data spillage and optimize resource management to maximize performance, use indexers to reduce data search times, and use caching to speed up query execution. Additionally, you will learn how to troubleshoot failed Spark jobs by diagnosing and resolving the issues that cause them to fail, and how to troubleshoot failed pipeline runs (including activities executed in external services) by identifying and fixing problems to ensure smooth pipeline execution.

Minimum Hardware Requirements

For an optimal experience, the following hardware configuration is recommended:

  • Processor: Dual-core or better
  • Memory: 4 GB RAM
  • Storage: 10 GB available space

Minimum Software Requirements

You must have the following software installed:

Chapter   Software Required                     OS Required
1–10      Azure account (free or paid)          Windows, macOS, and Linux
1–10      Azure Command-Line Interface (CLI)    Windows, macOS, and Linux
1–10      Visual Studio Code (VS Code)          Windows, macOS, and Linux

Note

You can find the Azure CLI installation link on GitHub as part of Chapter 1, Introducing Azure Basics, at https://packt.link/muMNE.
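
Before starting the chapters, it can help to confirm that the command-line tools listed above are reachable from your terminal. The following is a minimal sketch, not part of the book's own material: it assumes the Azure CLI is exposed as the `az` executable and VS Code as the `code` launcher (the latter is only on PATH if you enabled that option during installation), and the helper names `find_tools` and `missing_tools` are hypothetical.

```python
import shutil

# Assumed executable names: "az" for the Azure CLI, "code" for the VS Code launcher.
REQUIRED_TOOLS = {"Azure CLI": "az", "Visual Studio Code": "code"}


def find_tools(tools, which=shutil.which):
    """Map each tool's display name to its resolved path, or None if not on PATH."""
    return {name: which(exe) for name, exe in tools.items()}


def missing_tools(found):
    """Return the sorted display names of tools that were not found."""
    return sorted(name for name, path in found.items() if path is None)


if __name__ == "__main__":
    found = find_tools(REQUIRED_TOOLS)
    for name, path in found.items():
        print(f"{name}: {path or 'not found on PATH'}")
```

The check only looks at PATH; it does not verify that you are signed in to an Azure account, which still has to be done separately.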
