Search icon CANCEL
Subscription
0
Cart icon
Cart
Close icon
You have no products in your basket yet
Save more on your purchases!
Savings automatically calculated. No voucher code required
Arrow left icon
All Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletters
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Serverless Architectures with AWS

You're reading from  Serverless Architectures with AWS

Product type Book
Published in Dec 2018
Publisher Packt
ISBN-13 9781789805024
Pages 226 pages
Edition 1st Edition
Languages
Concepts
Author (1):
Mohit Gupta Mohit Gupta
Profile icon Mohit Gupta
Toc

Chapter 4: Serverless Amazon Athena and the AWS Glue Data Catalog


Solution for Activity 5: Building a AWS Glue catalog for a CSV-Formatted Dataset and Analyzing the Data Using AWS Athena

  1. Log in to your AWS account.

  2. Upload the data file total-business-inventories-to-sales-ratio.csv (provided with this book) into a S3 bucket. Make sure that the required permissions are in place:

    Figure 4.24: Uploading the data file

  3. Go to the AWS Glue service.

  4. Select Crawlers and click on Add Crawler.

  5. Provide the crawler name and click on Next.

  6. Provide the path of the S3 bucket, where the file was uploaded in step 2. Click on Next.

  7. Click on Next, as we don't want to add another data store.

  8. Choose an existing IAM role that was created in Exercise 11: Using AWS Glue to Build a Metadata Repository. Alternatively, you can create a new one. Click on Next.

  9. Let's keep it as Run on demand and click on Next.

  10. Either you can create a new database here or click on the dropdown to select an existing one. Click on Next.

  11. Review the settings and click on Finish. You have successfully created the crawler.

  12. Now, go ahead and run the crawler.

  13. Once the run of the crawler is completed, you will see a new table being created under the schema that you chose in step 10:

    Figure 4.25: The new table after the crawler run was completed

  14. Go to tables, and you should see the newly created table, inventory_sales_ratio. Note that the table name is derived from the bucket name.

  15. Go to the AWS Athena service. You should see a new table name under the database that was selected in step 10.

  16. Click on new query and write the following query to get the expected output:

    select  month(try(date_parse(observed_date, '%m/%d/%Y'))) a, count(*) from inventory_sales_ratio
    where observed_value < 1.25 group by month(try(date_parse(observed_date, '%m/%d/%Y')))
    order by a ;
  17. When the query gets executed, you should see the expected output:

    Figure 4.26: The output after the query has run

  18. Looking at the output, we have a total of 8 months since 1992 where the inventories to sales ratios was < 1.25. We also have the month level count as well.

We have successfully completed the activity.

lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime