Summary
In this chapter, we learned the ins and outs of Athena Query Federation, including the differences between a federated query and a "classic data lake query." Then, our journey took us deeper into the performance, availability, and consistency tradeoffs of querying live data via a federated query versus querying a snapshot that's been loaded into S3. We looked at the structure of the Athena Query Federation SDK and how it relies on Apache Arrow, an in-memory columnar format, to exchange data between analytics systems without multiple performance-robbing serialization steps.
Next, we stepped out of the academic realm and into the thick of things with a hands-on exercise in deploying and querying one of Athena's pre-built Connectors. Our efforts concluded with our most ambitious coding exercise yet, where we built a custom Athena Connector from the ground up using the Athena Query Federation SDK directly. In the next chapter, Chapter 13, Athena UDFs...