Putting it all together
So far, we have discussed the storage layers of a typical data lake in S3 and defined the purpose of each layer. We also introduced the concept of creating metadata using a Glue crawler and storing it in the Glue Data Catalog. Finally, we looked at use cases for building transactional data lakes. This is a good time to return to the GreatFin business requirements we introduced earlier and apply these foundational data lake concepts to our use case.
Marketing use case
Suppose the marketing department at GreatFin wants to identify top leads for a new type of certificate of deposit (CD) that offers a higher interest rate to a select few high-net-worth customers only. In this case, the relevant customer data is stored in multiple systems across different LOBs.
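To make the goal concrete before looking at the layers, here is a minimal sketch of the question marketing is asking: combine each customer's balances across LOB systems and keep only those above a high-net-worth cutoff. Everything in it is an assumption for illustration only: the two LOB datasets, the field names, the customer IDs, the balances, and the threshold are hypothetical and do not come from GreatFin's actual systems.

```python
from collections import defaultdict

# Hypothetical records from two LOB systems (illustrative values only).
deposits = [
    {"customer_id": "C001", "balance": 850_000},
    {"customer_id": "C002", "balance": 40_000},
]
investments = [
    {"customer_id": "C001", "balance": 400_000},
    {"customer_id": "C003", "balance": 1_500_000},
]

# Assumed cutoff for "high net worth"; the real definition is a business decision.
HIGH_NET_WORTH_THRESHOLD = 1_000_000


def top_leads(*lob_records, threshold=HIGH_NET_WORTH_THRESHOLD):
    """Sum balances per customer across LOBs and return those at or above
    the threshold, sorted by total balance in descending order."""
    totals = defaultdict(int)
    for records in lob_records:
        for record in records:
            totals[record["customer_id"]] += record["balance"]
    leads = [(cid, total) for cid, total in totals.items() if total >= threshold]
    return sorted(leads, key=lambda lead: lead[1], reverse=True)


leads = top_leads(deposits, investments)
for customer_id, total in leads:
    print(customer_id, total)
```

In a data lake, this kind of cross-system aggregation would typically run as a query or job over curated data rather than over in-memory lists; the sketch only shows the shape of the result marketing needs.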
Let’s walk through what each layer in the data lake might look like.
Raw layer example
The following diagram is a depiction of data stored in a raw layer bucket in...