Fundamentals of Analytics Engineering

You're reading from Fundamentals of Analytics Engineering: An introduction to building end-to-end analytics solutions

Product type: Paperback
Published in: Mar 2024
Publisher: Packt
ISBN-13: 9781837636457
Length: 332 pages
Edition: 1st Edition
Authors (7): Dumky De Wilde, Ricardo Angel Granados Lopez, Lasse Benninga, Taís Laurindo Pereira, Jovan Gligorevic, Juan Manuel Perafan, Fanny Kassapian

Understanding a Modern Data Stack

As the name suggests, the modern data stack (MDS) represents a technological evolution over the systems that were widely used in previous decades. From the development of the business data warehouse in the 1980s until the rise of cloud technology with Amazon Web Services (AWS) in the mid-2000s, on-premises legacy data stacks dominated the landscape. These systems had a monolithic IT infrastructure, which made them complex to maintain. The MDS changed this by bringing modularity and cloud-native tools. However, before we dive into the details, let’s first define what a data stack is.

A data stack is a collection of tools and services, forming part of a broader technology infrastructure, designed to ingest, store, transform, and serve data. It makes data accessible across an organization and is fundamental to delivering business insights through reporting and dashboards, advanced analytics, and machine learning (ML) applications. Figure 2.1 illustrates an example of a high-level architecture of a data stack.

Figure 2.1 – An example of a high-level architecture of a data stack


Here, the data flows from left to right. The raw data is ingested, stored in a data warehouse, transformed, and finally, served to data analysts, data scientists, and business users.

The MDS, then, is a specific implementation of this architecture: a set of tools that democratizes access to the main functionalities of a data stack, reducing the complexity of implementation and improving the scalability of the data life cycle.

In the following table, we compare the main characteristics of the legacy data stack and the MDS.

Characteristic        | Legacy Data Stack                | Modern Data Stack
----------------------|----------------------------------|------------------------------
Architecture          | Monolithic architecture          | Modular tools
Servers               | On-premises servers              | Cloud-based
Maintenance           | Complex; many resources required | Simplified, managed solutions
Programming languages | Java/Scala/Python                | SQL-first
Data ingestion        | ETL-focused                      | ELT-focused

Table 2.1 – A legacy versus Modern Data Stack comparison
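
To make the "SQL-first" and "ELT-focused" rows more concrete, here is a minimal sketch of the ELT pattern. It assumes an in-memory SQLite database as a stand-in for a cloud data warehouse; the table, view, and function names (`raw_orders`, `customer_revenue`, `load_raw`, `transform`, `run_elt`) and the sample records are hypothetical and chosen only for illustration. The raw data is loaded untouched first, and all cleaning happens afterward in SQL, inside the warehouse.

```python
import sqlite3

# Hypothetical raw records, roughly as an ingestion tool might deliver them:
# untrimmed names, inconsistent casing, and amounts stored as text.
RAW_ORDERS = [
    ("1001", " alice ", "19.50"),
    ("1002", "BOB", "5.00"),
    ("1003", "alice", "10.25"),
]


def load_raw(conn, rows):
    """EL step: land the data as-is, with every column kept as text."""
    conn.execute(
        "CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)


def transform(conn):
    """T step: clean and aggregate inside the warehouse, using SQL only."""
    conn.execute(
        """
        CREATE VIEW customer_revenue AS
        SELECT LOWER(TRIM(customer)) AS customer,
               SUM(CAST(amount AS REAL)) AS revenue
        FROM raw_orders
        GROUP BY LOWER(TRIM(customer))
        """
    )


def run_elt():
    """Load first, transform second, then query the modeled result."""
    conn = sqlite3.connect(":memory:")
    load_raw(conn, RAW_ORDERS)
    transform(conn)
    rows = conn.execute(
        "SELECT customer, revenue FROM customer_revenue ORDER BY customer"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for customer, revenue in run_elt():
        print(customer, revenue)
```

In an ETL-focused stack, the trimming and casting would instead happen in application code before the data ever reaches the warehouse, so the raw records would not be available there for later remodeling.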

Now that we have seen the definition of the MDS and how it compares to legacy stacks, it is time to see how it looks in practice. In Figure 2.2, an example is provided, with an overview of some of the tools that are used for key functionalities within this design:

Figure 2.2 – An example of an MDS


As we can see in Figure 2.2, the main blocks of an MDS can be considered as follows:

  • Managed ingestion: This is responsible for the EL in ELT. It helps to streamline data extraction and ingestion through third-party-managed software applications.
  • Data warehouse/lakehouse: These are cloud-based systems used to store large volumes of data.
  • Transformation: This is what the T in ELT stands for. Transformations help in data cleaning and preparation to meet business intelligence needs.
  • Orchestration: Orchestration sets up specific tasks to run automatically, either on a schedule or when a particular event occurs. As an example, an orchestration tool can be paired with dbt Core to run transformations on a schedule.
  • Self-service layer: In this block, reports and analysis are provided to business users, enabling these stakeholders to answer their questions and make data-informed decisions.
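
The way these blocks depend on one another can be sketched as a small dependency graph. The following is a minimal illustration of the core idea behind orchestration, running each task only after the tasks it depends on, using Python's standard `graphlib` module (Python 3.9+). The task names and the `run_pipeline` helper are hypothetical, and the sketch is not the API of any particular orchestration tool; real orchestrators add scheduling, retries, and monitoring on top of this ordering logic.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run every task once, after all of the tasks it depends on.

    tasks: maps a task name to a zero-argument callable.
    dependencies: maps a task name to the set of task names it depends on.
    Returns the names of the tasks in the order they were executed.
    """
    executed = []
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name]()
        executed.append(name)
    return executed


if __name__ == "__main__":
    # Hypothetical tasks standing in for the ingestion, transformation,
    # and self-service blocks described above.
    tasks = {
        "ingest": lambda: print("loading raw data into the warehouse"),
        "transform": lambda: print("building cleaned models"),
        "serve": lambda: print("refreshing dashboards"),
    }
    dependencies = {
        "transform": {"ingest"},
        "serve": {"transform"},
    }
    print(run_pipeline(tasks, dependencies))
```

Declaring dependencies rather than a fixed sequence means a new task, such as a data quality check between ingestion and transformation, can be added without rewriting the rest of the pipeline.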

With managed ingestion, data teams are less dependent on the work of data engineers. Tools such as Stitch, Fivetran, and Airbyte empower analytics engineers to own end-to-end data pipelines, focusing on data modeling and transformation.

The list of tools and blocks in Figure 2.2 is not exhaustive. New tools and MDS companies emerge every year. Other common and important blocks are as follows:

  • Data catalog tools are used for search and discovery, helping users to document and democratize access to business logic and data assets. Examples are Atlan, data.world, and DataHub.
  • Data quality and observability tools are used to monitor pipelines and ensure the overall quality of data. Examples are Soda, Datafold, and Monte Carlo.
  • Reverse ETL is used to retrieve data from a data warehouse and publish it to the systems used by business users, such as CRM software. Key market leaders are Census, Hightouch, and RudderStack.
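
As a rough illustration of what a data quality tool checks, the sketch below runs two common tests, not-null and uniqueness, against a list of records. The function names (`find_nulls`, `find_duplicates`, `run_checks`) and the `order_id` column are hypothetical; this is plain Python for illustration and does not reflect the interface of any of the tools named above.

```python
def find_nulls(rows, column):
    """Return the positions of rows where `column` is missing or None."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]


def find_duplicates(rows, column):
    """Return the positions of rows whose `column` value was already seen."""
    seen = set()
    duplicates = []
    for i, row in enumerate(rows):
        value = row.get(column)
        if value is None:
            continue  # missing values are reported by find_nulls instead
        if value in seen:
            duplicates.append(i)
        seen.add(value)
    return duplicates


def run_checks(rows):
    """Run both checks on `order_id` and return only the failing ones."""
    results = {
        "order_id is not null": find_nulls(rows, "order_id"),
        "order_id is unique": find_duplicates(rows, "order_id"),
    }
    return {name: positions for name, positions in results.items() if positions}


if __name__ == "__main__":
    # Hypothetical sample data with one duplicate and one missing key.
    orders = [
        {"order_id": 1},
        {"order_id": 2},
        {"order_id": 2},
        {"order_id": None},
    ]
    print(run_checks(orders))
```

An empty result means every check passed; in a real pipeline, a non-empty result would typically raise an alert or stop downstream steps from running.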

We will now move our focus to how the modern data stack differs from the legacy stacks.
