Advanced data discovery and data structuring with Athena
In this section, we will explore how Amazon Athena supports the advanced data discovery and data structuring phases of the data wrangling pipeline.
SQL-based data discovery with Athena
Amazon Athena provides SQL-based data exploration and also supports advanced data types. The advantage is that even people with no coding expertise can use familiar SQL syntax to explore data in multiple formats and across different storage types.
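To make SQL-based exploration concrete, here is a minimal Python sketch of submitting a query to Athena and reading back the first page of results. It assumes a table is already registered in the AWS Glue catalog and that the caller supplies an Athena client such as the one returned by `boto3.client("athena")`. The helper name `run_athena_query`, the table `orders`, the database `sales_db`, and the S3 output path are all hypothetical and used only for illustration.

```python
import time


def run_athena_query(client, sql, database, output_location,
                     poll_seconds=1.0, max_polls=60):
    """Run `sql` through an Athena client and return result rows as dicts.

    `client` is expected to behave like boto3's Athena client.
    """
    # Submit the query; Athena runs it asynchronously and returns an ID.
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    for _ in range(max_polls):
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Query {query_id} ended in state {state}")
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Query {query_id} did not finish in time")

    # Read only the first page of results. For a SELECT statement the
    # first row carries the column names; NULL cells have no value key.
    result = client.get_query_results(QueryExecutionId=query_id)
    rows = result["ResultSet"]["Rows"]
    if not rows:
        return []
    header = [cell.get("VarCharValue") for cell in rows[0]["Data"]]
    return [
        dict(zip(header, (cell.get("VarCharValue") for cell in row["Data"])))
        for row in rows[1:]
    ]


# Example usage (hypothetical names; requires boto3 and AWS credentials):
#   import boto3
#   athena = boto3.client("athena")
#   rows = run_athena_query(
#       athena,
#       "SELECT status, COUNT(*) AS n FROM orders GROUP BY status",
#       database="sales_db",
#       output_location="s3://my-athena-results/discovery/",
#   )
```

The sketch keeps the client as a parameter so the SQL stays the focus. Every value comes back as a string, so any type conversion is left to the caller, and larger result sets would need pagination, which is omitted here.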
Before we can query data with Amazon Athena, we need to create metadata for the table in the AWS Glue catalog. Amazon Athena can query data from different data sources, as follows:
- AWS Glue catalog—Access tables that are available in the AWS Glue catalog. We can create tables in the AWS Glue catalog in multiple ways, which we will explore later in this chapter.
- Federated data sources—Supports querying data from Apache Hive, AWS, and third-party databases...