Ad Hoc Queries with Amazon Athena
In Chapter 8, Identifying and Enabling Varied Data Consumers, we explored a variety of data consumers. In this chapter, we begin examining the AWS services that some of these consumers may want to use, starting with consumers who need to run ad hoc SQL queries on data in the data lake.
SQL is widely used for querying data in many kinds of databases, and because a large number of people know it, SQL skills are fairly easy to find. As a result, there is significant demand from data consumers for the ability to query data in the data lake using SQL, without first having to move that data into a dedicated traditional database.
Amazon Athena is a serverless, fully managed service that lets you use SQL and Spark to directly query data in the data lake, as well as query various other database sources. It requires no setup, and there are options to either pay for the service based...