Exploring the five factors of change

The year 2007 changed the world as we know it; the day Steve Jobs took the stage and announced the iPhone was a turning point in the age of data. That day brewed the perfect "data" storm.

A perfect storm is a meteorological event that occurs as a result of a rare combination of factors. In the world of data, such a perfect storm occurred in the last decade, one that elevated data to a strategic enterprise asset. Five ingredients caused the perfect "data" storm.

Figure 1.2 – Ingredients of the perfect "data" storm

As depicted in Figure 1.2, five factors came together: the exponential growth of data, an increase in computing power, a decrease in storage cost, the rise of AI, and the advancement of cloud computing. These factors developed independently yet converged, changing and shaping entire industries. Let's look at each of them briefly.

The exponential growth of data

The exponential growth of data is the first ingredient of the perfect storm.

Figure 1.3 – Estimated data growth between 2010 and 2020

According to the International Data Corporation (IDC), the total volume of data generated will reach around 163 ZB (zettabytes, where one zettabyte is a trillion gigabytes) by 2025. In 2010, that number was approximately 0.5 ZB. This exponential growth is attributed to vast improvements in internet technologies, which fueled the growth of many industries. Telecommunications was the first industry to be transformed, and that transformation rippled into many others. Data became ubiquitous, and every business craved more bandwidth. Social media platforms such as Facebook, Twitter, and Instagram flooded the internet with data, while streaming services and e-commerce generated even more; this data was then used to shape and influence consumer behavior. Last but not least, technological leaps in the Internet of Things (IoT) space generated yet more data.
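To put these figures in perspective, the following back-of-the-envelope Python snippet (an illustrative sketch, not from the book) computes the compound annual growth rate implied by the numbers quoted above:

# Rough check of the IDC figures quoted above.
data_2010_zb = 0.5     # zettabytes generated in 2010 (approximate)
data_2025_zb = 163.0   # zettabytes forecast for 2025
years = 2025 - 2010

cagr = (data_2025_zb / data_2010_zb) ** (1 / years) - 1
print(f"Implied annual growth rate: {cagr:.0%}")   # roughly 47% per year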

The traditional EDW pattern could not cope with this growth, as it was designed for structured data. Big data changed the definition of usable data: data was now big (volume), some of it was continuously flowing (velocity), it arrived in different shapes and forms (variety), and it came from a plethora of noisy sources (veracity).

The increase in compute

The exponential increase in computing power is the second ingredient of the perfect storm.

Figure 1.4 – Estimated growth in transistors per microprocessor between 2010 and 2020

Moore's law is the prediction, made by American engineer Gordon Moore in 1965, that the number of transistors per silicon chip doubles at a regular cadence (originally every year, later revised to roughly every two years). The prediction has held remarkably well. In 2010, a microprocessor contained around 2 billion transistors; by 2020, that number stood at 54 billion. This exponential increase in computing power dovetails with the rise of cloud computing technologies that provide virtually limitless compute at an affordable price point.
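As a quick sanity check (an illustrative sketch added here, not part of the original text), the transistor counts quoted above imply a doubling period of roughly two years:

import math

# Transistor counts per microprocessor quoted above.
transistors_2010 = 2e9    # ~2 billion in 2010
transistors_2020 = 54e9   # ~54 billion in 2020
years = 2020 - 2010

doublings = math.log2(transistors_2020 / transistors_2010)
print(f"Doublings over the decade: {doublings:.1f}")              # ~4.8
print(f"Implied doubling period: {years / doublings:.1f} years")  # ~2.1 years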

The increase in computing power at a reasonable price provided a much-needed impetus for big data. Organizations can now procure ever more compute at a far lower cost and use it, via the cloud, to process and analyze data on demand.

The decrease in storage cost

The rapid decrease in storage cost is the third ingredient of the perfect storm.

Figure 1.5 – The estimated decrease in storage cost between 2010 and 2020

The cost of storage has also decreased exponentially. In 2010, the average cost of storing a GB of data on a Hard Disk Drive (HDD) was around $0.10; ten years later, it was approximately $0.01. In the traditional EDW pattern, holding data was an expensive proposition, so organizations had to be picky about which data to store for analysis and which to discard. The exponential decrease in storage cost means that all data, in whatever shape or form, can now be kept at a fraction of the previous cost, with no need to pick and choose. The mantra of "store first, analyze later" became practical.
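The following short calculation (an illustrative sketch, not taken from the book; the 1 PB workload is an arbitrary example) shows what the quoted price drop means in practice:

# Storage cost per GB quoted above (HDD, approximate averages).
cost_per_gb_2010 = 0.10     # USD
cost_per_gb_2020 = 0.01     # USD
petabyte_in_gb = 1_000_000  # 1 PB in decimal units

cost_2010 = petabyte_in_gb * cost_per_gb_2010
cost_2020 = petabyte_in_gb * cost_per_gb_2020
print(f"Storing 1 PB in 2010: ${cost_2010:,.0f}")       # ~$100,000
print(f"Storing 1 PB in 2020: ${cost_2020:,.0f}")       # ~$10,000
print(f"Cost reduction: {cost_2010 / cost_2020:.0f}x")  # ~10x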

The rise of artificial intelligence

Artificial Intelligence (AI) systems are not new. Their genesis goes back to the 1950s, when statistical models were used to estimate the values of data points from past data. The field then fell out of focus for an extended period because the computing power and the large corpus of data required to run these models were not available.

Figure 1.6 – Timeline of the evolution of AI

However, after a long hibernation, AI technologies saw a resurgence in the early 2010s. This resurgence was due in large part to the abundance of both powerful computing resources and data. AI models could now be trained faster, and the results were strikingly accurate.

Reduced storage costs and readily available computing resources were a boon for AI: increasingly complex models could now be trained.

Figure 1.7 – Accuracy of AI systems versus humans at image recognition

This was especially true for deep learning algorithms. For instance, a deep learning technique called Convolutional Neural Networks (CNNs) has become very popular for image recognition. Over time, deeper and deeper neural networks were created, and AI systems have now surpassed human beings at recognizing objects in images.
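For readers who have not seen one, the following minimal sketch shows the general shape of a CNN image classifier. It is not from the book and assumes PyTorch is available; the layer sizes, the 32x32 input, and the 10-class output are illustrative assumptions only:

import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """A deliberately small CNN: two convolution/pooling stages and a linear classifier."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # learn local image features
            nn.ReLU(),
            nn.MaxPool2d(2),                              # downsample 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 16x16 -> 8x8
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(x.flatten(start_dim=1))

# A single forward pass on a batch of four 32x32 RGB images.
images = torch.randn(4, 3, 32, 32)
logits = TinyCNN()(images)
print(logits.shape)  # torch.Size([4, 10])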

As AI systems became more accurate, they gained in popularity, fueling a virtuous cycle in which more and more businesses employed AI in their digital transformation agendas.

The advancement of cloud computing

The fifth ingredient of the perfect "data" storm is the rise of cloud computing. Cloud computing is the on-demand availability of computing and storage resources. The typical public cloud service providers are big technology companies such as Amazon (AWS), Microsoft (Azure), and Google (GCP). Cloud computing eliminates the need to host large servers for compute and storage in an organization's own data center, and depending on the services subscribed to, organizations can also reduce their software and hardware maintenance burden. The cloud provides a plethora of on-demand services at a very economical price point. Adoption has grown steadily since 2010: worldwide spending on public cloud services grew from around $77 billion in 2010 to around $441 billion in 2020. Cloud computing also enabled the rise of the Digitally Native Business (DNB), propelling organizations such as Uber, Deliveroo, TikTok, and Instagram, to name a few.

Cloud computing has been a boon for data: data can now be stored at a fraction of the cost, and the virtually limitless compute that the cloud provides translates into the ability to transform data rapidly. The cloud also offers innovative data platforms that can be provisioned at the click of a button.

These five ingredients converged at an opportune moment, challenging the existing data architecture patterns. The perfect "data" storm facilitated the rise of a new data architecture paradigm focused on big data: the data lake.
