Summary
In this chapter, we discussed the main aspects of metadata management, starting with the AWS Glue Data Catalog and how it stores metadata. We went over different methods of populating it, both manually (for example, with the AWS CLI or by running DDL statements) and automatically (through crawlers and their schema discovery features). We also discussed metadata maintenance and why it can become an issue for large organizations. We went over different options that not only keep metadata up to date but also automate the process and decouple it from the logic of your ETL jobs.
We talked about metadata versioning and how to roll back versions that cause issues. We also discussed how Lake Formation can help with rollbacks of both metadata and data, along with the wide variety of other features it offers. Finally, we talked about lineage and how Glue DataBrew can help you discover, analyze, and transform your datasets visually.
With these concepts, you should be able to fully manage the metadata...