Packt+ | Advance your knowledge in tech

You're reading from Serverless Architectures with AWS

Product type Book

Published in Dec 2018

Publisher Packt

ISBN-13 9781789805024

Pages 226 pages

Edition 1st Edition

Languages

Concepts

Serverless

Author (1):

Mohit Gupta

Table of Contents (8) Chapters

Serverless Architectures with AWS

Preface

1. AWS, Lambda, and Serverless Applications

2. Working with the AWS Serverless Platform

3. Building and Deploying a Media Application

4. Serverless Amazon Athena and the AWS Glue Data Catalog

5. Real-Time Data Insights Using Amazon Kinesis

Appendix

Chapter 4: Serverless Amazon Athena and the AWS Glue Data Catalog

Solution for Activity 5: Building a AWS Glue catalog for a CSV-Formatted Dataset and Analyzing the Data Using AWS Athena

Log in to your AWS account.
Upload the data file total-business-inventories-to-sales-ratio.csv (provided with this book) into a S3 bucket. Make sure that the required permissions are in place:
Figure 4.24: Uploading the data file
Go to the AWS Glue service.
Select Crawlers and click on Add Crawler.
Provide the crawler name and click on Next.
Provide the path of the S3 bucket, where the file was uploaded in step 2. Click on Next.
Click on Next, as we don't want to add another data store.
Choose an existing IAM role that was created in Exercise 11: Using AWS Glue to Build a Metadata Repository. Alternatively, you can create a new one. Click on Next.
Let's keep it as Run on demand and click on Next.
Either you can create a new database here or click on the dropdown to select an existing one. Click on Next.
Review the settings and click on Finish. You have successfully created the crawler.
Now, go ahead and run the crawler.
Once the run of the crawler is completed, you will see a new table being created under the schema that you chose in step 10:
Figure 4.25: The new table after the crawler run was completed
Go to tables, and you should see the newly created table, inventory_sales_ratio. Note that the table name is derived from the bucket name.
Go to the AWS Athena service. You should see a new table name under the database that was selected in step 10.

Click on new query and write the following query to get the expected output:

select  month(try(date_parse(observed_date, '%m/%d/%Y'))) a, count(*) from inventory_sales_ratio
where observed_value < 1.25 group by month(try(date_parse(observed_date, '%m/%d/%Y')))
order by a ;

When the query gets executed, you should see the expected output:
Figure 4.26: The output after the query has run
Looking at the output, we have a total of 8 months since 1992 where the inventories to sales ratios was < 1.25. We also have the month level count as well.