Summary
In this chapter, we discussed why so many organizations prefer to build their data lakes on Amazon S3. We then walked through the different layers of a data lake in S3 and the purpose of each. Alongside these layers, we looked at how Glue Data Catalog captures metadata about the data in the form of tables. We also touched on a newer trend of building a transactional data lake, which involves selecting a table format that closely fits the specific use case being solved. Finally, we brought these pieces together to solve a specific use case, at least from the data storage and catalog side of things.
We now have the data in S3 and a catalog of this data in Glue Data Catalog in the form of tables. The real value of this setup is that businesses can easily consume this data to derive insights from it. This leads us to the next section of this book, which covers different purpose-built services and how each of them...