Data Lakehouse in Action: Architecting a modern and scalable data analytics platform

Product type Paperback
Published in Mar 2022
Publisher Packt
ISBN-13 9781801815932
Length 206 pages
Edition 1st Edition
Author: Pradeep Menon
Table of Contents

Preface
PART 1: Architectural Patterns for Analytics
Chapter 1: Introducing the Evolution of Data Analytics Patterns
Chapter 2: The Data Lakehouse Architecture Overview
PART 2: Data Lakehouse Component Deep Dive
Chapter 3: Ingesting and Processing Data in a Data Lakehouse
Chapter 4: Storing and Serving Data in a Data Lakehouse
Chapter 5: Deriving Insights from a Data Lakehouse
Chapter 6: Applying Data Governance in the Data Lakehouse
Chapter 7: Applying Data Security in a Data Lakehouse
PART 3: Implementing and Governing a Data Lakehouse
Chapter 8: Implementing a Data Lakehouse on Microsoft Azure
Chapter 9: Scaling the Data Lakehouse Architecture
Other Books You May Enjoy

Introducing the data lakehouse paradigm

In 2006, Clive Humby, a British mathematician, coined the now-famous phrase, "Data is the new oil." It was akin to peering through a crystal ball and peeking into the future. Data is the lifeblood of organizations. An organization's competitive advantage is defined by how it uses data. Data management is paramount in this age of digital transformation. More and more organizations are embracing digital transformation programs, and data is at the core of these transformations.

As discussed earlier, the paradigms of the EDW and data lakes were opportune for their times. They had their benefits and their challenges. A new paradigm needed to emerge that was disciplined at its core and flexible at its edges.

Figure 1.9 – Data lakehouse paradigm


The new data architectural paradigm is called the data lakehouse. It strives to combine the advantages of both the data lake and the EDW paradigms while minimizing their challenges.

A well-architected data lakehouse delivers four key benefits:

Figure 1.10 – Benefits of the data lakehouse


  1. It derives insights from both structured and unstructured data: The data lakehouse architecture should be able to store, transform, and integrate structured and unstructured data. It should be able to fuse them together and enable the extraction of valuable insights from the data.
  2. It caters to different personas in the organization: Data is a dish with different tastes for different personas. The data lakehouse should cater to a range of organizational personas and fulfill their requirements for insights. A data scientist should get a playground for testing hypotheses. An analyst should be able to analyze data using their tools of choice, and business users should be able to get their reports accurately and on time. In this way, the data lakehouse democratizes data for analytics.
  3. It facilitates the adoption of a robust governance framework: The primary challenge with the data lake architecture pattern was the lack of a strong governance framework. It was easy for a data lake to become a data swamp. In contrast, an EDW architecture was stymied by too much governance for too little content. The data lakehouse architecture strives to strike a balance between the two. It seeks to apply the right level of governance to each type of data and to give the right stakeholders access to it.
  4. It leverages cloud computing: Data lakehouse architecture needs to be agile and innovative. The pattern needs to adapt to changing organizational requirements and reduce the data-to-insight turnaround time. To achieve this agility, it is imperative to adopt cloud computing technology. Cloud computing platforms offer the innovation required. They provide the appropriate technology stack with scalability and flexibility, and fulfill the demands of a modern data analytics platform.
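
The third benefit, applying the right level of governance to each type of data and giving the right stakeholders access, can be pictured as a small policy table. The following is only a minimal sketch in Python under stated assumptions: the data types ("raw", "curated", "reporting"), the persona names, and the can_read helper are all hypothetical and are not taken from the book or from any real product.

```python
from dataclasses import dataclass

# Every name below (data types, personas, rules) is a hypothetical
# illustration, not part of the book's reference architecture.


@dataclass(frozen=True)
class GovernanceRule:
    quality_checked: bool         # has the data been validated before use?
    allowed_personas: frozenset   # personas permitted to read this data type


# Lighter governance for raw data, stricter governance for data that
# feeds business reports.
POLICY = {
    "raw": GovernanceRule(False, frozenset({"data_engineer", "data_scientist"})),
    "curated": GovernanceRule(
        True, frozenset({"data_engineer", "data_scientist", "analyst"})
    ),
    "reporting": GovernanceRule(True, frozenset({"analyst", "business_user"})),
}


def can_read(persona: str, data_type: str) -> bool:
    """Return True if the persona may read the given data type."""
    rule = POLICY.get(data_type)
    # Unknown data types are denied by default.
    return rule is not None and persona in rule.allowed_personas
```

In this sketch, a data scientist can explore raw data, while a business user is limited to validated reporting data. Denying access to unknown data types by default is a design choice made for the illustration.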

The data lakehouse paradigm addresses the challenges faced by the EDW and data lake paradigms. Yet, it has its own set of challenges that need to be managed. A few of those challenges are as follows:

  • Architectural complexity: Given that the data lakehouse pattern amalgamates the EDW and the data lake patterns, it is inevitable that it will have its fair share of architectural complexity. The complexity manifests in the form of the multiple components required to bring the pattern to fruition. Architectural patterns always involve give and take; it is vital to carefully trade off architectural complexity against the potential business benefit. The data lakehouse architecture needs to tread that path carefully.
  • The need for holistic data governance: The challenges pertinent to the data lake paradigm do not magically go away with the data lakehouse paradigm. The biggest challenge of a data lake was that it was prone to becoming a data swamp. As the data lakehouse grows in scope and complexity, the lack of a holistic governance framework is a sure way of turning a data lakehouse into a swamp.
  • Balancing flexibility with discipline: The data lakehouse paradigm strives to be flexible and to adapt to changing business requirements with agility. The ethos under which it operates is to have discipline at the core and flexibility at the edges. Achieving this objective is a careful balancing act that requires clearly defining the limits of flexibility and the strictness of discipline. The data lakehouse stewards play an essential role in ensuring this balance.

Let's recap what we've discussed in this chapter.
