Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Data Observability for Data Engineering

You're reading from   Data Observability for Data Engineering Proactive strategies for ensuring data accuracy and addressing broken data pipelines

Arrow left icon
Product type Paperback
Published in Dec 2023
Publisher Packt
ISBN-13 9781804616024
Length 228 pages
Edition 1st Edition
Languages
Arrow right icon
Authors (2):
Arrow left icon
Michele Pinto Michele Pinto
Author Profile Icon Michele Pinto
Michele Pinto
Sammy El Khammal Sammy El Khammal
Author Profile Icon Sammy El Khammal
Sammy El Khammal
Arrow right icon
View More author details
Toc

Table of Contents (17) Chapters Close

Preface 1. Part 1: Introduction to Data Observability
2. Chapter 1: Fundamentals of Data Quality Monitoring FREE CHAPTER 3. Chapter 2: Fundamentals of Data Observability 4. Part 2: Implementing Data Observability
5. Chapter 3: Data Observability Techniques 6. Chapter 4: Data Observability Elements 7. Chapter 5: Defining Rules on Indicators 8. Part 3: How to adopt Data Observability in your organization
9. Chapter 6: Root Cause Analysis 10. Chapter 7: Optimizing Data Pipelines 11. Chapter 8: Organizing Data Teams and Measuring the Success of Data Observability 12. Part 4: Appendix
13. Chapter 9: Data Observability Checklist 14. Chapter 10: Pathway to Data Observability 15. Index 16. Other Books You May Enjoy

Fundamentals of Data Quality Monitoring

Welcome to the exciting world of Data Observability for Data Engineering!

As you open the pages of this book, you will embark on a journey that will immerse you in data observability. The knowledge within this book is designed to equip you, as a data engineer, data architect, data product owner, or data engineering manager, with the skills and tools necessary to implement best practices in your data pipelines.

In this book, you will learn how data observability can help you build trust in your organization. Observability provides insights directly from within the process, offering a fresh approach to monitoring. It’s a method for determining whether the pipeline is functioning properly, especially in terms of adhering to its data quality standards.

Let’s get real for a moment. In our world, where we’re swimming in data, it’s easy to feel like we’re drowning. Data observability isn’t just some fancy term – it’s your life raft. Without it, you’re flying blind, making decisions based on guesswork. Who wants to be in that hot seat when data disasters strike? Not you.

This book isn’t just another item on your reading list; it’s the missing piece in your data puzzle. It’s about giving you the superpower to spot the small issues in your data before they turn into full-blown catastrophes. Think about the cost, not just in dollars, but in sleepless nights and lost trust, when data incidents occur. Scary, right?

But here’s the kicker: data observability isn’t just about avoiding nightmares; it’s about building a foundation of trust. When your data’s in check, your team can make bold, confident decisions without that nagging doubt. That’s priceless.

Data observability is not just a buzzword – we are deeply convinced it is the backbone of any resilient, efficient, and reliable data pipeline. This book will take you on a comprehensive exploration of the core principles of data observability, the techniques you can use to develop an observability approach, the challenges faced when implementing it, and the best practices being employed by industry leaders. This book will be your compass in the vast universe of data observability by providing you with various examples that allow you to bridge the gap between theory and practice.

The knowledge in this book is organized into four essential parts. In part one, we will lay the foundation by introducing the fundamentals of data quality monitoring and how data observability takes it to the next level. This crucial groundwork will ensure you understand the core concepts and will set the stage for the next topics.

In part two, we will move on to the practical aspects of implementing data observability. You will dive into various techniques and elements of observability and learn how to define rules on indicators. This part will provide you with the skills to apply data observability in your projects.

The third part will focus on adopting data observability at scale in your organization. You will discover the main benefits of data observability by learning how to conduct root cause analysis, how to optimize pipelines, and how to foster a culture change within your team. This part is essential to ensure the successful implementation of a data observability program.

Finally, the fourth part will contain additional resources focused on data engineering, such as a data observability checklist and a technical roadmap to implement it, leaving you with strong takeaways so that you can stand on your own two feet.

Let’s start with a hypothetical scenario. You are a data engineer, coming back from your holidays and ready to start the quarter. You have a lot of new projects for the year. However, the second you reach your desktop, Lucy from the marketing team calls out to you: “The marketing report of last month is totally wrong – please fix it ASAP. I need to update my presentation!

This is annoying; all the work that’s been scheduled for the day is delayed, and you need to check the numbers. You open your Tableau dashboard and start a Zoom meeting with the marketing team. The first task of the day: understand what she meant by wrong. Indeed, the turnover seems odd. It’s time for you to have a look at the SQL database feeding the dashboard. Again, you see the same issue. This is strange and will require even more investigation.

After hours of manual and tedious checks, contacting three different teams and sending 12 emails, you finally found the culprit: an ingestion script, feeding the company’s master database, was modified to express the turnover in thousands of dollars instead of units. Because the data team didn’t know that the metric would be used by the marketing team, the information did not pass and the pipeline was fed with the wrong data.

It’s not the first time this has happened. Hours of productivity are ruined by firefighting data issues. It’s decided – you need to implement a new strategy to avoid this.

Observability is intimately correlated with the notions of data quality. The latter is often defined as a way of measuring data indicators. Data quality is one thing, but monitoring it is something else! Through this chapter, we will explore the principles of data quality and understand how those can guide you on the data observability journey and how the information bias between stakeholders is key to understanding the need for data quality and observability in the data pipeline.

Data quality comes from the need to ensure correct and sustainable data pipelines. We will look at the different stakeholders of a data pipeline and describe why they need data quality. We will also define data quality through several concepts, which will lead to you understanding how a common base can be created between stakeholders.

By the end of this chapter, you will understand how data quality can be monitored and turned into metrics, preparing the ground for data observability.

In this chapter, we’ll cover the following topics:

  • Learning about the maturity path of data in companies
  • Identifying information bias in data
  • Exploring the seven dimensions of data quality
  • Turning data quality into SLAs
  • Indicators of data quality
  • Alerting on data quality issues
You have been reading a chapter from
Data Observability for Data Engineering
Published in: Dec 2023
Publisher: Packt
ISBN-13: 9781804616024
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime