Chapter 1: Your First Query
This chapter introduces you to the serverless analytics experience offered by Amazon Athena. Data is one of the most valuable assets you and your company generate. In recent years, we have seen a revolution in data retention, with companies capturing all manner of data that was once ignored. Everything from logs to clickstream data to support tickets is now routinely kept for years. Strictly speaking, though, the raw data itself is not what is valuable. What we are after are the insights buried in that mountain of data. Increased awareness and retention have certainly made the information we need to power our businesses, applications, and decisions more available, but the explosion in data sizes has made the insights we seek less accessible. What could once fit nicely in a traditional RDBMS, such as Oracle, now requires a distributed filesystem such as HDFS and an accompanying Massively Parallel Processing (MPP) engine such as Spark to run even the most basic queries in a timely fashion.
Enter Amazon Athena. Unlike traditional analytics engines, Amazon Athena is a fully managed offering. You will never have to set up servers or tune cryptic settings to get your queries running. This lets you focus on what matters most: using data to generate insights that drive your business. This ease of use is precisely why this first chapter is all about getting hands-on and running your first query. Whether you are a seasoned analytics veteran or a newcomer to the space, this chapter will give you the knowledge you need to run your first Athena query in under 30 minutes. For now, we will keep things simple to demonstrate why so many people choose Amazon Athena for their workloads. This will help establish your mental model for the deeper discussions, features, and examples in later chapters.
In this chapter, we will cover the following topics:
- What is Amazon Athena?
- Obtaining and preparing sample data
- Running your first query