Serverless Analytics with Amazon Athena

Product type Book

Published in Nov 2021

Publisher Packt

ISBN-13 9781800562349

Pages 438 pages

Edition 1st Edition

Authors (3):

Anthony Virtuoso

Mert Turkay Hocanin

Aaron Wishnick

Running ETL queries

While this book's goal is not to teach Structured Query Language (SQL), it is worth spending some time reviewing everyday SQL recipes and how they relate to Athena's strengths and quirks. Transforming data from one format to another, producing intermediate datasets, or simply running a query that returns many megabytes (MB) or gigabytes (GB) of output all require some understanding of Athena's best practices to achieve peak price/performance. As we did in Chapter 1, Your First Query, let's start by preparing a larger dataset for our exercises.

We will continue using the NYC Yellow Taxi dataset, but this time we will prepare 2.5 years of data. Preparing this expanded dataset entails downloading, compressing, and then uploading dozens of files to S3. To expedite that process, you can use the following script to automate the steps. To do so, add all the files from yellow_tripdata_2018-01.csv through yellow_tripdata_2020-06.csv...
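The book's own script is not reproduced in this excerpt. As a rough illustration of the steps just described, the following Python sketch enumerates the monthly file names in the stated range, compresses a local CSV with gzip, and uploads it to S3. The function names, the bucket, and the key prefix are hypothetical and chosen only for this example. The download step is omitted because the source location of the files is not given here, and the upload helper assumes that boto3 is installed and AWS credentials are configured.

```python
import gzip
import shutil
from pathlib import Path


def monthly_file_names(start=(2018, 1), end=(2020, 6)):
    """Return yellow_tripdata_YYYY-MM.csv names for each month from start to end, inclusive."""
    names = []
    year, month = start
    while (year, month) <= end:
        names.append(f"yellow_tripdata_{year}-{month:02d}.csv")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


def gzip_file(csv_path):
    """Compress csv_path into a sibling file with a .gz suffix and return the new path."""
    src = Path(csv_path)
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst


def upload_to_s3(local_path, bucket, prefix):
    """Upload one file to s3://<bucket>/<prefix>/<file name>.

    The bucket and prefix are placeholders supplied by the caller; they are
    assumptions for this sketch, not values taken from the book.
    """
    import boto3  # imported lazily so the helpers above run without AWS dependencies

    key = f"{prefix.rstrip('/')}/{Path(local_path).name}"
    boto3.client("s3").upload_file(str(local_path), bucket, key)
    return key
```

With the CSV files already downloaded into the working directory, a loop such as `for name in monthly_file_names(): upload_to_s3(gzip_file(name), "my-bucket", "tables/nyc_taxi")` would process all 30 months; the bucket and prefix shown there are placeholders.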

Authors (3)

Anthony Virtuoso

Anthony Virtuoso works as a Principal Engineer at Amazon and holds multiple patents in distributed systems, software defined networks, and security. In his eight years at Amazon, he has helped launch several Amazon Web Services, the most recent of which was Amazon Managed Blockchain. As one of the original authors of Athena Query Federation, you'll often find him lurking on the Athena Federation GitHub repository answering questions and shipping bug fixes. When not at work, Anthony obsesses over a different set of customers, namely his wife and two little boys, aged 2 and 5. His kids enjoy doing science experiments with dad, like 3D printing toys, building with Lego, or searching the local pond for tardigrades.

Mert Turkay Hocanin

Mert Turkay Hocanin is a Principal Big Data Architect at Amazon Web Services within the AWS Glue and AWS Lake Formation services and has previously worked for several other services, including Amazon Athena, Amazon EMR, and Amazon Managed Blockchain. During his time at AWS, he worked with several Fortune 500 companies on some of the largest data lakes in the world and was involved in the launch of three Amazon Web Services. Prior to being a Big Data Architect, he was a Senior Software Developer within Amazon's retail systems organization, building one of the earliest data lakes in the company in 2013. When he is not helping customers build data lakes, he enjoys spending time with his wife, Subrina, and son, Tristan, and exploring New York City.

Aaron Wishnick

Aaron Wishnick works as a Senior Software Engineer at Amazon, where he has been for 7 years. During that time, he has worked on Amazon's payment systems and financial intelligence systems, as well as on Athena and AWS Proton at AWS. When not at work, Aaron and his fiancée, Alyssa, are on a quest to determine just how much dog fur is too much, with their husky and malamute, Mina and Wally.
