You're reading from Modern Data Architecture on AWS A Practical Guide for Building Next-Gen Data Platforms on AWS

Product type Paperback

Published in Aug 2023

Publisher Packt

ISBN-13 9781801813396

Length 420 pages

Edition 1st Edition

Tools

AWS

Concepts

Data Science

Author (1):

Behram Irani



Challenges with on-premises data systems

As data grew exponentially, so did the on-premises systems. However, visible cracks started to appear in the legacy way of architecting data and analytics use cases.

The hardware used to process, store, and consume data had to be procured up front, then installed and configured before it was ready for use. This created operational overhead and risk in procuring the hardware, provisioning it, installing software, and maintaining the system on an ongoing basis. To accommodate future data growth, teams also had to estimate additional capacity well in advance. Because hardware elasticity didn’t exist, the systems in place carried scalability risks, which surfaced whenever data volume grew suddenly or the business expanded into new markets.

Buying all this extra hardware up front also meant a large capital expenditure, with the extra capacity often sitting unused. Software licenses were expensive as well, adding to overall IT costs. Even with all the hardware bought up front, it was difficult to keep the data platform performing well at all times. As data volumes grew, latency crept in, which degraded the performance of certain critical systems.
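To make the cost of up-front sizing concrete, here is a minimal sketch of the arithmetic. It assumes capacity is bought once and sized for the data volume expected at the end of the planning horizon; the function name, the 100 TB starting volume, the 50% annual growth rate, and the three-year horizon are all hypothetical values chosen for illustration, not figures from the book.

```python
def idle_capacity_by_year(initial_tb, annual_growth, years):
    """Return (provisioned_tb, [(year, used_tb, idle_fraction), ...]).

    Assumption: capacity is purchased once, up front, and sized for the
    data volume forecast for the final year of the planning horizon.
    """
    provisioned_tb = initial_tb * (1 + annual_growth) ** years
    rows = []
    for year in range(years + 1):
        # Data volume actually stored in this year under compound growth.
        used_tb = initial_tb * (1 + annual_growth) ** year
        # Share of the purchased capacity that sits unused this year.
        idle_fraction = 1 - used_tb / provisioned_tb
        rows.append((year, used_tb, idle_fraction))
    return provisioned_tb, rows


# Hypothetical inputs: 100 TB today, 50% growth per year, 3-year horizon.
provisioned, rows = idle_capacity_by_year(initial_tb=100, annual_growth=0.5, years=3)
for year, used, idle in rows:
    print(f"year {year}: used {used:.1f} TB of {provisioned:.1f} TB, idle {idle:.0%}")
```

Under these assumed numbers, most of the purchased capacity is idle in the first year, and the idle share only reaches zero if the growth forecast turns out to be exactly right. If growth is faster than forecast, the same fixed capacity becomes the scalability risk described earlier.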

As data grew into big data, the type of data produced was not just structured data; a lot of business use cases required semi-structured data, such as JSON files, and even unstructured data, such as images and PDF files. In subsequent chapters, we will go through some use cases that specify different types of data.
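The difference between structured and semi-structured data can be illustrated with a short sketch. The two JSON records below are made up for illustration (an `order_id`, a nested `customer` object, and optional fields), and `field_paths` is a hypothetical helper; the point is only that records of the same kind can carry different, nested fields, which does not map cleanly onto a fixed table schema.

```python
import json

# Hypothetical semi-structured records: same kind of event, different shapes.
raw_events = [
    '{"order_id": 1, "customer": {"id": 7, "tier": "gold"}, "items": ["a", "b"]}',
    '{"order_id": 2, "customer": {"id": 9}, "coupon": "SPRING"}',
]
records = [json.loads(line) for line in raw_events]


def field_paths(value, prefix=""):
    """Collect dotted paths to every non-dict field in a parsed JSON value."""
    paths = set()
    if isinstance(value, dict):
        for key, child in value.items():
            paths |= field_paths(child, f"{prefix}{key}.")
    else:
        paths.add(prefix.rstrip("."))
    return paths


schemas = [field_paths(record) for record in records]
common = set.intersection(*schemas)   # fields every record has
optional = set.union(*schemas) - common  # fields only some records have

print("common fields:  ", sorted(common))
print("optional fields:", sorted(optional))
```

A relational table would need every column declared in advance, whereas here the set of fields is only discovered by reading the data.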

As the sources of data grew, so did the number of ETL pipelines, and managing these pipelines became cumbersome. On top of that, with so much data movement, data was duplicated in multiple places, which made it difficult to create a single source of truth.

On the flip side, with so many data sources and data owners within an organization, data became siloed, which made it difficult to share across the organization's different lines of business (LOBs).

Most enterprise data was stored either in an OLTP system, such as an RDBMS, or in an OLAP system, such as a data warehouse. As a result, organizations tried to solve most of their new use cases with the systems they had already invested in so heavily. The challenge was that these systems were built and optimized for specific types of operations only. Soon, it became evident that other types of data and analytics use cases required purpose-specific systems to meet their performance requirements.

Lastly, as businesses expanded into other geographies, these systems had to be extended to other locations. A lot of time, effort, and money was spent scaling the data platform and making it resilient to failures.
