Querying S3 data using Athena
Athena is a serverless service for querying data stored in S3. It is serverless in the sense that you do not provision or manage the servers used for computation. Its main characteristics are as follows:
- Athena uses a schema to present the results of a query on the data stored in S3. You define the structure in which you want your data to appear as a schema, and Athena reads the raw data from S3 and returns the results according to that schema.
- The output can be used by other services for visualization, storage, or various analytics purposes. The source data in S3 can be structured, semi-structured, or unstructured, in formats such as JSON, CSV/TSV, Avro, Parquet, or ORC (as well as others). CloudTrail logs, ELB logs, and VPC flow logs can also be stored in S3 and analyzed by Athena.
- Athena follows the schema-on-read technique: the schema is applied when the data is queried, not when it is written. Unlike traditional techniques, tables are defined in advance in a data catalog, and the data’s structure...
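
To make the idea of defining a schema over raw S3 data more concrete, the following is a minimal sketch in Python. It builds a CREATE EXTERNAL TABLE statement for comma-separated files and, optionally, submits it through the boto3 Athena client. The database name, table name, column list, and bucket paths are hypothetical placeholders, and the sketch assumes CSV files with a single header row; other formats would need a different ROW FORMAT or SerDe clause.

```python
def build_create_table_ddl(database, table, columns, s3_location):
    """Return a CREATE EXTERNAL TABLE statement for CSV data in S3.

    `columns` is a list of (name, type) pairs. The statement only records
    a schema in the data catalog; the files in S3 are left untouched.
    """
    column_defs = ",\n  ".join(f"`{name}` {col_type}" for name, col_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n"
        f"  {column_defs}\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{s3_location}'\n"
        # Assumption: each CSV file starts with one header line.
        "TBLPROPERTIES ('skip.header.line.count'='1')"
    )


def run_query(sql, output_location, database=None):
    """Submit a statement to Athena and return its query execution ID.

    Requires boto3 and valid AWS credentials; boto3 is imported here so
    that the DDL builder above can be used without it.
    """
    import boto3

    athena = boto3.client("athena")
    kwargs = {
        "QueryString": sql,
        "ResultConfiguration": {"OutputLocation": output_location},
    }
    if database:
        kwargs["QueryExecutionContext"] = {"Database": database}
    response = athena.start_query_execution(**kwargs)
    return response["QueryExecutionId"]


if __name__ == "__main__":
    # Hypothetical names, used for illustration only.
    ddl = build_create_table_ddl(
        database="demo_db",
        table="orders",
        columns=[("order_id", "string"), ("amount", "double")],
        s3_location="s3://example-bucket/orders/",
    )
    print(ddl)
    # To submit it (assumes the database and the results bucket exist):
    # run_query(ddl, "s3://example-bucket/athena-results/", database="demo_db")
```

Because the table definition is only metadata, dropping or redefining the table changes how the data is interpreted at query time without rewriting the files in S3.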