Simplifying Data Engineering and Analytics with Delta: Create analytics-ready data that fuels artificial intelligence and business intelligence

Author: Anindita Mahapatra
Publisher: Packt
Published: Jul 2022
ISBN-13: 9781801814867
Edition: 1st Edition
Length: 334 pages
Product type: Paperback

Understanding the role of data personas

Since data engineering is such a crucial field, you may be wondering who the main players are and what skill sets they possess. Building a data product involves several people, all of whom need to work together with seamless handoffs for the end product or service to succeed. Creating silos would be a mistake: silos increase both the number and the complexity of integration points, and each additional integration is a potential point of failure. Data engineering overlaps considerably with software engineering and data science tasks:

Figure 1.3 – Data engineering requires multidisciplinary skill sets

Figure 1.3 – Data engineering requires multidisciplinary skill sets

All these roles require an understanding of data engineering:

  • Data engineers focus on building and maintaining the data pipelines that ingest and transform data, and on keeping them running. The role has a lot in common with software engineering, but it involves working with large volumes of data.
  • BI analysts focus on SQL-based reporting and can be operational or domain-specific subject-matter experts (SMEs) such as financial or supply chain analysts.
  • Data scientists and ML practitioners are statisticians who explore and analyze the data through exploratory data analysis (EDA) and apply modeling techniques at various levels of sophistication.
  • DevOps and MLOps practitioners focus on the infrastructure aspects of monitoring and automation. MLOps is DevOps with the additional task of managing the life cycle of analytic models.
  • ML engineers span both the data engineer and the data scientist roles.
  • Data leaders, such as chief data officers, are the data stewards at the top of the chain of command and the ultimate governors of data.

The following diagram shows the typical placement of the four main data personas working collaboratively on a data platform to produce business insights to give the company a competitive advantage in the industry:

Figure 1.4 – Data personas working in collaboration

Figure 1.4 – Data personas working in collaboration

Let's take a look at a few of these points in more detail:

  1. DevOps is responsible for all operational aspects of the data platform and traditionally does a lot of scripting and automation.
  2. Data/ML engineers are responsible for building the data pipeline and taking care of the extract, transform, load (ETL) aspects of the pipeline.
  3. Data scientists of varying skill levels build models.
  4. Business analysts create reporting dashboards from aggregated curated data.