What is Query Federation?
Simply put, Query Federation means that a query engine such as Athena can enlist the help of multiple datastores, working together, to execute your query. These datastores are usually capable of more than file-level CRUD operations. Most support row-level scan, filter, and project operations, and some handle full SQL. We've mentioned this concept earlier in this book, typically when comparing ETL with querying in place. Let's take a closer look at the practical difference between a federated query and what we'll call a classic query.
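To make the idea concrete, here is a minimal sketch in plain Python of an engine that enlists several datastores to answer one query. It is a toy model, not Athena's implementation: the DataStore and FederatedEngine classes, the source names, and the sample rows are all hypothetical and exist only for illustration. The point it shows is that each source performs its own scan, filter, and project, and the engine merely combines what comes back.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


@dataclass
class DataStore:
    """Hypothetical source that supports row-level scan, filter, and project."""
    name: str
    rows: List[Row]

    def scan(self, predicate: Callable[[Row], bool], columns: List[str]) -> Iterable[Row]:
        # Filter and project at the source, so only matching rows and the
        # requested columns travel back to the engine.
        for row in self.rows:
            if predicate(row):
                yield {col: row[col] for col in columns}


class FederatedEngine:
    """Toy engine (an assumption, not Athena) that fans one query out to many sources."""

    def __init__(self) -> None:
        self.sources: Dict[str, DataStore] = {}

    def register(self, store: DataStore) -> None:
        self.sources[store.name] = store

    def query(self, source_names: List[str], predicate: Callable[[Row], bool],
              columns: List[str]) -> List[Row]:
        results: List[Row] = []
        for name in source_names:
            # Each enlisted source does its share of the work.
            for row in self.sources[name].scan(predicate, columns):
                results.append({"source": name, **row})
        return results


engine = FederatedEngine()
engine.register(DataStore("dynamodb_orders", [
    {"order_id": 1, "total": 250, "region": "us"},
    {"order_id": 2, "total": 40, "region": "eu"},
]))
engine.register(DataStore("aurora_orders", [
    {"order_id": 3, "total": 900, "region": "us"},
]))

# One logical query, answered by two different datastores.
large_orders = engine.query(
    ["dynamodb_orders", "aurora_orders"],
    predicate=lambda row: row["total"] > 100,
    columns=["order_id", "total"],
)
print(large_orders)
```

In this sketch the filter and the column list are handed to each source rather than applied after all rows have been fetched, which mirrors why it matters that the datastores support more than file-level operations.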
The following diagram shows an example of a tried-and-true S3 data lake. There are multiple datastores, namely DynamoDB, RDS Aurora, and a generic database, all feeding into S3. Then, Athena, or another query engine, with the aid of the Glue Data Catalog, can access all our data. This is a classic query. You submit the query to Athena, and Athena answers it directly by reading the...