Validating output using Amazon Athena
The Parquet data is already available in Amazon S3, organized by partition columns. To make it easier for data analysts and data scientists to consume, we can expose it as a database table that can be queried with SQL.
To set up that integration, we will follow a two-step approach:
- First, we will run a Glue crawler to create a Glue Data Catalog table on top of the S3 data.
- Then, we will run a query in Athena to validate the output.
Let's see how you can set this up.
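For readers who prefer scripting over the console, the same two-step flow can be sketched with boto3. This is a minimal sketch under stated assumptions, not the procedure this chapter follows: the helper names (`crawler_request`, `athena_request`, `crawl_then_query`) are hypothetical, and the crawler name, IAM role ARN, database name, S3 paths, and SQL text are placeholders you would supply yourself. It also assumes boto3 is installed and AWS credentials are configured.

```python
import time


def crawler_request(name, role_arn, database, s3_path):
    """Build keyword arguments for Glue's create_crawler call (hypothetical helper)."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def athena_request(sql, database, output_location):
    """Build keyword arguments for Athena's start_query_execution call (hypothetical helper)."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def crawl_then_query(crawler_kwargs, athena_kwargs, region="us-east-1", poll_seconds=10):
    """Step 1: run the crawler. Step 2: run a validation query in Athena."""
    import boto3  # imported here so the request builders work without boto3 installed

    glue = boto3.client("glue", region_name=region)
    athena = boto3.client("athena", region_name=region)

    # Step 1: create the crawler, start it, and wait until it is READY again.
    glue.create_crawler(**crawler_kwargs)
    glue.start_crawler(Name=crawler_kwargs["Name"])
    while glue.get_crawler(Name=crawler_kwargs["Name"])["Crawler"]["State"] != "READY":
        time.sleep(poll_seconds)

    # Step 2: submit the query and poll until it reaches a terminal state.
    query_id = athena.start_query_execution(**athena_kwargs)["QueryExecutionId"]
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query ended in state {state}")

    # Return the raw result rows; the first row holds the column headers.
    return athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
```

To use it, you would build both requests with your own values and pass them to `crawl_then_query`. Keeping the request builders separate from the AWS calls makes the configuration easy to inspect before anything is created in your account.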
Defining a virtual Glue Catalog table on top of Amazon S3 data
You can follow these steps to create and run a Glue crawler, which will create a Glue Data Catalog table:
- Navigate to the AWS Glue Crawlers page at https://console.aws.amazon.com/glue/home?region=us-east-1#catalog:tab=crawlers.
- Then, click Add crawler, which will open a form to configure the crawler.
- Configure the crawler, where the data source should...