Summary
In this chapter, we looked at how Amazon Athena, along with Presto, Trino, and Hive on Amazon EMR, helps organizations perform ad hoc, interactive analytics on the data stored in the S3 data lake. Athena is a serverless service that integrates with the Glue Data Catalog and lets data analysts write and execute SQL queries without having to manage the platform itself. Using Athena, organizations can focus on the business logic needed for reports rather than spending time creating and managing the infrastructure that the platform requires.
We also looked at cases where creating a Presto/Trino cluster on Amazon EMR may be more beneficial for interactive analytics. This is particularly helpful when very large volumes of data need to be scanned by thousands of queries every day and performance SLAs are strict. Using Presto/Trino on EMR, customers can control cost and at the same time improve query performance by custom tuning...