Data Federation
In the previous chapter, we explored different use cases for sharing data, both within and outside the organization. Data sharing is a critical aspect of any data platform: data stored in an Amazon S3-based data lake or in an Amazon Redshift data warehouse is shared seamlessly, without the need to create duplicate copies. Every data platform has distinct components for data storage and for data computation. In the data sharing model, we focused on sharing data between similar systems, for example, using Amazon Athena to share data stored in an S3 data lake and using Amazon Redshift to share data with other Redshift clusters.
Data isn't always stored, processed, and shared within homogeneous systems. Often, data is captured in heterogeneous systems, and those systems may not even reside inside the AWS ecosystem. This brings us to the question: how do we seamlessly and transparently query datasets from a...