You're reading from Serverless Analytics with Amazon Athena: Query structured, unstructured, or semi-structured data in seconds without setting up any infrastructure

Product type Paperback

Published in Nov 2021

Publisher Packt

ISBN-13 9781800562349

Length 438 pages

Edition 1st Edition

Languages

Python

Tools

Amazon Athena

Concepts

Data Processing

Authors (3):

Aaron Wishnick

Mert Turkay Hocanin

Anthony Virtuoso

Table of Contents (20) Chapters

Preface

1. Section 1: Fundamentals Of Amazon Athena

2. Chapter 1: Your First Query

3. Chapter 2: Introduction to Amazon Athena

4. Chapter 3: Key Features, Query Types, and Functions

5. Section 2: Building and Connecting to Your Data Lake

6. Chapter 4: Metastores, Data Sources, and Data Lakes

7. Chapter 5: Securing Your Data

8. Chapter 6: AWS Glue and AWS Lake Formation

9. Section 3: Using Amazon Athena

10. Chapter 7: Ad Hoc Analytics

11. Chapter 8: Querying Unstructured and Semi-Structured Data

12. Chapter 9: Serverless ETL Pipelines

13. Chapter 10: Building Applications with Amazon Athena

14. Chapter 11: Operational Excellence – Monitoring, Optimization, and Troubleshooting

15. Section 4: Advanced Topics

16. Chapter 12: Athena Query Federation

17. Chapter 13: Athena UDFs and ML

18. Chapter 14: Lake Formation – Advanced Topics

19. Other Books You May Enjoy

Summary

In this chapter, you saw how easy it is to start running queries with Athena. We obtained sample data from the NYC Taxi and Limousine Commission (TLC), used it to create a table in our S3-based data lake, and ran some analytics queries to explore the insights contained in that data. Because Athena is serverless, we spent no time setting up infrastructure or software. All the operations we ran in this chapter cost less than $0.00135. Without the serverless aspect of Athena, we would have found ourselves purchasing many thousands of dollars' worth of hardware, or spending hundreds of dollars on cloud resources, to run these basic exercises.
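A cost this small follows from Athena billing by the amount of data each query scans. The sketch below shows the arithmetic under stated assumptions: a price of $5 per TB scanned, a 10 MB minimum charge per query, and 1 TB treated as 1024^4 bytes. None of these figures comes from this chapter, and the helper name `estimate_query_cost` is hypothetical, so check the current pricing for your region before relying on the numbers.

```python
# Rough Athena cost estimate. All constants below are assumptions,
# not values taken from the chapter.
BYTES_PER_MB = 1024 ** 2
BYTES_PER_TB = 1024 ** 4          # assumption: binary terabyte
PRICE_PER_TB_USD = 5.0            # assumption: price per TB scanned
MIN_BILLABLE_BYTES = 10 * BYTES_PER_MB  # assumption: per-query minimum


def estimate_query_cost(bytes_scanned: int) -> float:
    """Return the estimated cost in USD of one query scanning `bytes_scanned` bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    # Small queries are billed as if they scanned the minimum amount.
    billable = max(bytes_scanned, MIN_BILLABLE_BYTES)
    return billable / BYTES_PER_TB * PRICE_PER_TB_USD


if __name__ == "__main__":
    # Example: a handful of queries over a few hundred megabytes in total.
    scans = [50 * BYTES_PER_MB, 120 * BYTES_PER_MB, 80 * BYTES_PER_MB]
    total = sum(estimate_query_cost(b) for b in scans)
    print(f"Estimated total: ${total:.5f}")
```

Because cost scales with bytes scanned, anything that reduces the data read per query also reduces the bill under these assumptions.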

While the main goal of this chapter was to orient you to the uniquely serverless experience of using Amazon Athena, a few concepts are worth remembering as you continue reading. The first is the role of the metastore. We saw that uploading our data to S3 was not enough for Athena to query it. We also needed to register the location, schema, and file format as a table in the AWS Glue Data Catalog. Once the table was defined, it became queryable from Athena. Chapter 3, Key Features, Query Types, and Functions, will cover this topic in greater depth.
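To make the three pieces of table metadata concrete (location, schema, and file format), here is a minimal sketch that assembles a `CREATE EXTERNAL TABLE` statement of the kind Athena accepts. The helper name `build_create_table_ddl`, the database and table names, the column list, and the S3 path are all hypothetical placeholders rather than the ones used in the chapter, and the sketch assumes a Parquet-formatted dataset.

```python
from typing import Sequence, Tuple


def build_create_table_ddl(
    database: str,
    table: str,
    columns: Sequence[Tuple[str, str]],
    location: str,
    file_format: str = "PARQUET",
) -> str:
    """Build a CREATE EXTERNAL TABLE statement from the three pieces of
    metadata the metastore needs: schema, file format, and S3 location."""
    if not columns:
        raise ValueError("at least one column is required")
    # Schema: one "`name` type" entry per column.
    column_defs = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n"
        f"  {column_defs}\n"
        f")\n"
        f"STORED AS {file_format}\n"   # file format
        f"LOCATION '{location}'"       # where the data lives in S3
    )


# Hypothetical example values, for illustration only.
ddl = build_create_table_ddl(
    database="example_db",
    table="nyc_taxi",
    columns=[("pickup_datetime", "timestamp"), ("fare_amount", "double")],
    location="s3://example-bucket/nyc-taxi/",
)

if __name__ == "__main__":
    print(ddl)
```

Running the resulting statement in the Athena console, or submitting it through an SDK call such as boto3's `start_query_execution`, registers the table in the catalog. The data itself stays in S3 untouched, which is why the upload alone was not enough to make it queryable.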

The next important thing we saw was the feature-rich SQL dialect we used in our basic analytics queries. Because Athena uses a customized variant of Presto, you can refer to Presto's documentation (https://prestodb.io/docs/current/) as a supplement to Athena's documentation.

Chapter 2, Introduction to Amazon Athena, will go deeper into Athena's capabilities and open-source roots so that you can understand when to use Athena and how to gain deeper insight into specific behaviors of the service.
