Data mesh on an Amazon S3-based data lake
As you may recall from the previous chapter on data governance, we used AWS Lake Formation (LF) to provide fine-grained access control to data that resides in the S3 data lake, through the Glue Data Catalog. The same LF permissions mechanism can also be used to share data across AWS accounts. This opens the door to a true data mesh architecture, in which the data lake no longer has to be a central repository for the whole enterprise. Instead, each LOB can establish its own data lake on S3 inside its own AWS account. Some LOB accounts will be data owners: they produce, store, and consume their data for analytics from their own S3 data lake. When one LOB needs access to datasets owned by another LOB, the producer and consumer LOBs can use LF's cross-account sharing mechanism instead of copying the data between accounts.
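To make the cross-account mechanism more concrete, the following is a minimal sketch, assuming Python and the AWS SDK (boto3), of the grant a producer LOB account might issue to share a single Glue Data Catalog table with a consumer LOB account. It uses the Lake Formation `grant_permissions` call, where the principal is the consumer's 12-digit AWS account ID rather than an IAM user or role. The account IDs, the `sales_db` database, the `orders` table, and the helper function names are hypothetical placeholders, not values from this chapter.

```python
from typing import Any, Dict, List, Optional


def build_cross_account_table_grant(
    producer_account_id: str,
    consumer_account_id: str,
    database_name: str,
    table_name: str,
    permissions: Optional[List[str]] = None,
    grantable: bool = False,
) -> Dict[str, Any]:
    """Build keyword arguments for Lake Formation's GrantPermissions API.

    Hypothetical helper: the producer (data owner) account grants another
    AWS account access to one table in its Data Catalog. No data is copied.
    """
    perms = list(permissions) if permissions else ["SELECT", "DESCRIBE"]
    request: Dict[str, Any] = {
        # Data Catalog of the producer account that owns the table.
        "CatalogId": producer_account_id,
        # Cross-account sharing: the principal is the consumer's account ID.
        "Principal": {"DataLakePrincipalIdentifier": consumer_account_id},
        "Resource": {
            "Table": {
                "CatalogId": producer_account_id,
                "DatabaseName": database_name,
                "Name": table_name,
            }
        },
        "Permissions": perms,
    }
    if grantable:
        # Grant option lets the consumer account pass these permissions on
        # to its own IAM principals.
        request["PermissionsWithGrantOption"] = list(perms)
    return request


def share_table(request: Dict[str, Any]) -> Dict[str, Any]:
    """Submit the grant; run with the producer account's credentials."""
    import boto3  # imported lazily so building the request needs no AWS SDK

    client = boto3.client("lakeformation")
    return client.grant_permissions(**request)


if __name__ == "__main__":
    # Hypothetical producer and consumer accounts; only builds the request.
    req = build_cross_account_table_grant(
        producer_account_id="111111111111",
        consumer_account_id="222222222222",
        database_name="sales_db",
        table_name="orders",
        grantable=True,
    )
    print(req["Principal"], req["Permissions"])
```

Building the request separately from submitting it keeps the sketch testable without AWS credentials. In practice, the consumer account typically also has to accept the share (Lake Formation uses AWS Resource Access Manager behind the scenes for this kind of grant) and create a resource link to the shared table in its own catalog before its users can query it.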
Let’s introduce the use case for implementing...