Analyzing data with Amazon Athena in Python
Amazon Athena is an AWS query service that lets us query data stored in Amazon S3 using standard SQL. In the previous recipe, we ran a couple of SQL queries using the Athena console (UI). For machine learning engineering work, however, we usually want to run these queries from a script so that we can automate steps of the process.
In this recipe, we will use the boto3 Python SDK to run Amazon Athena SQL queries programmatically. By the end of this recipe, we will have used Python and Amazon Athena to load, query, and transform the JSON data stored in our S3 bucket into a tabular format. We will perform two queries in this recipe: a simple SELECT query and a query that invokes a machine learning model deployed in Amazon SageMaker.
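Before going through the steps, it may help to see the general shape of running an Athena query with boto3. The following is a minimal sketch, not the recipe's actual code: the helper names run_athena_query and rows_as_dicts are hypothetical, and the database name, table name, and S3 output location in the usage comment are placeholders to replace with your own values. It assumes the query is a SELECT (so the first result row holds the column names) and reads only the first page of results.

```python
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Start an Athena query, wait until it finishes, and return its execution ID.

    `client` is expected to be a boto3 Athena client, i.e. boto3.client("athena").
    """
    response = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    execution_id = response["QueryExecutionId"]

    # Athena runs queries asynchronously, so we poll for the final state.
    while True:
        execution = client.get_query_execution(QueryExecutionId=execution_id)
        status = execution["QueryExecution"]["Status"]
        if status["State"] == "SUCCEEDED":
            return execution_id
        if status["State"] in ("FAILED", "CANCELLED"):
            raise RuntimeError(status.get("StateChangeReason", status["State"]))
        time.sleep(poll_seconds)


def rows_as_dicts(client, execution_id):
    """Convert the first page of query results into a list of dictionaries."""
    results = client.get_query_results(QueryExecutionId=execution_id)
    rows = results["ResultSet"]["Rows"]
    # For SELECT queries, the first row contains the column names.
    header = [cell.get("VarCharValue") for cell in rows[0]["Data"]]
    # Cells for NULL values have no "VarCharValue" key, so .get() yields None.
    return [
        dict(zip(header, [cell.get("VarCharValue") for cell in row["Data"]]))
        for row in rows[1:]
    ]


# Example usage (placeholder names; requires boto3 and AWS credentials):
#
#   import boto3
#   athena = boto3.client("athena")
#   qid = run_athena_query(
#       athena,
#       "SELECT * FROM my_table LIMIT 10",    # hypothetical table
#       database="my_database",               # hypothetical database
#       output_location="s3://my-bucket/athena-results/",  # hypothetical bucket
#   )
#   print(rows_as_dicts(athena, qid))
```

Passing the client in as an argument keeps the helpers easy to test with a stand-in object. For larger result sets, the first page alone will not be enough, and the results would need to be paginated.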
Getting ready
This recipe continues from Invoking machine learning models with Amazon Athena using SQL queries.