You're reading from Serverless ETL and Analytics with AWS Glue Your comprehensive reference guide to learning about AWS Glue and its features

Product type Paperback

Published in Aug 2022

Publisher Packt

ISBN-13 9781800564985

Length 434 pages

Edition 1st Edition

Languages

Python

Tools

AWS

Concepts

Data Analysis

Authors (6):

Vishal Pathak

Ishan Gaur

Tomohiro Tanaka

Albert Quiroga

Subramanya Vajiraya

Noritaka Sekiyama


Table of Contents (20) Chapters

Preface

1. Section 1 – Introduction, Concepts, and the Basics of AWS Glue

2. Chapter 1: Data Management – Introduction and Concepts

3. Chapter 2: Introduction to Important AWS Glue Features

4. Chapter 3: Data Ingestion

5. Section 2 – Data Preparation, Management, and Security

6. Chapter 4: Data Preparation

7. Chapter 5: Data Layouts

8. Chapter 6: Data Management

9. Chapter 7: Metadata Management

10. Chapter 8: Data Security

11. Chapter 9: Data Sharing

12. Chapter 10: Data Pipeline Management

13. Section 3 – Tuning, Monitoring, Data Lake Common Scenarios, and Interesting Edge Cases

14. Chapter 11: Monitoring

15. Chapter 12: Tuning, Debugging, and Troubleshooting

16. Chapter 13: Data Analysis

17. Chapter 14: Machine Learning Integration

18. Chapter 15: Architecting Data Lakes for Real-World Scenarios and Edge Cases

19. Other Books You May Enjoy

Summary

In this chapter, we discussed data collection practices that are used by organizations and the issue of dark data. We also discussed different storage and processing techniques, such as OLTP and OLAP, and how organizations are using a combination of these two techniques to extract value from the data gathered. We briefly discussed the evolution of data management strategies such as data warehousing, data lakes, the data lakehouse, and data meshes and the role played by ETL and ELT processes in ingesting data into OLAP systems for analysis.

Then, we introduced the Apache Spark framework and talked about how Spark executes a workload by dividing it into jobs, stages, and tasks. After this, we discussed the different services in the AWS cloud that can be used to execute Spark workloads. We introduced AWS Glue and the features that make it a full-fledged data integration platform rather than just a managed ETL service.
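As a reminder of how the job, stage, and task breakdown fits together, here is a minimal sketch in plain Python. It is a simplified toy model and not Spark's scheduler or API: the names `Stage`, `plan_job`, and `WIDE_OPERATIONS` are hypothetical, and the model assumes that each shuffle-producing (wide) transformation starts a new stage and that each stage runs one task per partition.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: a small, illustrative set of transformations that require a shuffle.
WIDE_OPERATIONS = {"groupByKey", "reduceByKey", "join", "repartition"}


@dataclass
class Stage:
    """Hypothetical stand-in for a Spark stage: a run of operations without a shuffle."""
    stage_id: int
    num_tasks: int
    operations: List[str] = field(default_factory=list)


def plan_job(operations: List[str], num_partitions: int, shuffle_partitions: int) -> List[Stage]:
    """Split one job's chain of transformations into stages at shuffle boundaries."""
    # The first stage reads the input, so it gets one task per input partition.
    stages = [Stage(stage_id=0, num_tasks=num_partitions)]
    for op in operations:
        if op in WIDE_OPERATIONS:
            # A wide transformation needs shuffled data, so it opens a new stage
            # with one task per shuffle partition.
            stages.append(Stage(stage_id=len(stages), num_tasks=shuffle_partitions))
        stages[-1].operations.append(op)
    return stages


# One action triggers one job; here the job is a chain of four transformations.
job = plan_job(["map", "filter", "reduceByKey", "map"], num_partitions=4, shuffle_partitions=2)
for stage in job:
    print(f"stage {stage.stage_id}: {stage.operations} -> {stage.num_tasks} tasks")
total_tasks = sum(stage.num_tasks for stage in job)
```

In this sketch, the narrow transformations (`map`, `filter`) are pipelined into the first stage, while `reduceByKey` forces a shuffle and therefore a second stage. Real Spark plans are more nuanced (for example, map-side combining happens before the shuffle), so treat this only as a mental model.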

In the next chapter, we will discuss the different microservices that are available in AWS Glue and how they work. We will also focus on some Glue-specific features and enhancements that make AWS Glue an ideal service for your data integration workloads.
