You're reading from Serverless ETL and Analytics with AWS Glue: Your comprehensive reference guide to learning about AWS Glue and its features

Product type Paperback

Published in Aug 2022

Publisher Packt

ISBN-13 9781800564985

Length 434 pages

Edition 1st Edition

Languages

Python

Tools

AWS

Concepts

Data Analysis

Authors (6):

Vishal Pathak

Ishan Gaur

Tomohiro Tanaka

Albert Quiroga

Subramanya Vajiraya

Noritaka Sekiyama

Table of Contents (20) Chapters

Preface

1. Section 1 – Introduction, Concepts, and the Basics of AWS Glue

2. Chapter 1: Data Management – Introduction and Concepts

3. Chapter 2: Introduction to Important AWS Glue Features

4. Chapter 3: Data Ingestion

5. Section 2 – Data Preparation, Management, and Security

6. Chapter 4: Data Preparation

7. Chapter 5: Data Layouts

8. Chapter 6: Data Management

9. Chapter 7: Metadata Management

10. Chapter 8: Data Security

11. Chapter 9: Data Sharing

12. Chapter 10: Data Pipeline Management

13. Section 3 – Tuning, Monitoring, Data Lake Common Scenarios, and Interesting Edge Cases

14. Chapter 11: Monitoring

15. Chapter 12: Tuning, Debugging, and Troubleshooting

16. Chapter 13: Data Analysis

17. Chapter 14: Machine Learning Integration

18. Chapter 15: Architecting Data Lakes for Real-World Scenarios and Edge Cases

19. Other Books You May Enjoy

Chapter 1: Data Management – Introduction and Concepts

A vast amount of data is being generated by people, organizations, devices, and software applications, and its volume is growing rapidly. Estimates vary significantly depending on the source, but approximately 60% to 80% of the data gathered by organizations is thought to be dark data: data that is collected, processed, and stored, often for long periods and for compliance reasons, but never used for any other purpose, such as analytics or direct monetization. In many cases, storing and securing this data can cost more than the value extracted from it.

In today’s digital economy, organizations strive to be data-driven, basing their strategic business decisions on intelligence derived from data gathered from various sources. Until recently, organizations thought of data purely in the context of transactions and locked it away in heavily siloed databases built for transaction processing, which were not suited to open-ended analysis. This changed with advances in data processing techniques and the falling cost of processing and analyzing data, and organizations are now adopting data-driven approaches for key business decisions.
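To make the difference between transaction processing and open-ended analysis more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `orders` table, its columns, and the helper functions `place_order` and `revenue_by_region` are hypothetical and exist only for illustration. A single in-memory SQLite database stands in for both workloads here; real systems typically serve them with separate, purpose-built stores.

```python
import sqlite3

# In-memory database standing in for a transactional system.
# Table and column names are hypothetical, chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer TEXT NOT NULL,"
    " region TEXT NOT NULL,"
    " amount REAL NOT NULL)"
)


def place_order(conn, customer, region, amount):
    """Transaction-style work: a small, row-level write wrapped in a transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT INTO orders (customer, region, amount) VALUES (?, ?, ?)",
            (customer, region, amount),
        )
    return cur.lastrowid  # primary key assigned to the new row


for customer, region, amount in [
    ("alice", "EU", 120.0),
    ("bob", "US", 80.0),
    ("carol", "EU", 45.5),
    ("dave", "APAC", 200.0),
]:
    place_order(conn, customer, region, amount)


def revenue_by_region(conn):
    """Analysis-style work: answer an open-ended question by scanning and aggregating many rows."""
    return conn.execute(
        "SELECT region, COUNT(*), SUM(amount) FROM orders "
        "GROUP BY region ORDER BY SUM(amount) DESC"
    ).fetchall()


print(revenue_by_region(conn))  # [('APAC', 1, 200.0), ('EU', 2, 165.5), ('US', 1, 80.0)]
```

The first function touches one row at a time and cares about atomicity; the second reads the whole table to summarize it. Databases tuned for the first pattern are not necessarily efficient for the second, especially as data volumes grow.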

In this chapter, we will cover the following topics:

Types of data processing – OLTP and OLAP
Data warehouses and data marts
Data lakes
Data lakehouse
Data mesh
Apache Spark on the AWS cloud
AWS Glue
Querying data using AWS

This chapter introduces various data management techniques, along with the tools and services that the AWS cloud offers for them. These concepts will help you understand the design approaches you can take to build effective data integration and management setups suited to your use cases when using AWS Glue.
