Summary
In this chapter, we learned about Athena's data sources and their components: the metastore, the data, and the connector. The metastore holds the metadata that Athena uses to map databases and tables to their physical locations and to process the data stored there. We delved into the information stored for a table and its key components: schema, partition columns, location, serializer/deserializer (SerDe) and its associated properties, and table statistics.
We compared the AWS Glue Data Catalog with the Apache Hive metastore for data stored on S3 and looked at non-S3 data sources. We went through the different ways of registering datasets in a metastore and saw how AWS Glue Crawlers can make it quick and easy to discover data on S3. Lastly, we looked at the data lake architecture, the stages that data typically passes through in one, and how to transform data using Athena.
Now that we have looked at our metastores and how they relate to our data in S3, we'll look at how we...