Summary
In this chapter, we discussed the fundamental concepts and importance of data preparation within a data integration workflow. We looked at how to prepare data in AWS Glue using both visual interfaces and source code.
We then explored the features of AWS Glue DataBrew: how to run profile jobs to profile a dataset and gather insights about it, how to enrich the data profile with a DQ ruleset, how to detect and redact PII, and how to encrypt columns using deterministic and probabilistic encryption. We also covered how to apply transformations, build a recipe from those transformations, create a job from that recipe, and run it.
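As a rough recap of that DataBrew workflow, the following is a minimal boto3 sketch of creating a profile job and a recipe job and then starting a run. It is not code from the chapter: the dataset, recipe, and job names, the IAM role ARN, and the S3 bucket and keys are all hypothetical placeholders.

import boto3

databrew = boto3.client("databrew")

# Profile job: gathers statistics and insights about the dataset.
databrew.create_profile_job(
    Name="orders-profile-job",                      # hypothetical job name
    DatasetName="orders-dataset",                   # hypothetical dataset
    RoleArn="arn:aws:iam::123456789012:role/DataBrewRole",
    OutputLocation={"Bucket": "example-bucket", "Key": "profiles/orders/"},
)

# Recipe job: applies a saved recipe of transformations and writes the output.
databrew.create_recipe_job(
    Name="orders-recipe-job",
    DatasetName="orders-dataset",
    RecipeReference={"Name": "orders-recipe", "RecipeVersion": "1.0"},
    RoleArn="arn:aws:iam::123456789012:role/DataBrewRole",
    Outputs=[{
        "Location": {"Bucket": "example-bucket", "Key": "curated/orders/"},
        "Format": "PARQUET",
    }],
)

# Start the recipe job and capture the run ID.
run = databrew.start_job_run(Name="orders-recipe-job")
print(run["RunId"])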
Then, we discussed code-based ETL development using AWS Glue ETL jobs and the different features of AWS Glue Studio, before exploring some of the popular transformations and extensions available in AWS Glue ETL. We saw how these transformations can be used in specific...
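For the code-based side, the following is a minimal PySpark sketch of an AWS Glue ETL job that chains two of the built-in transforms mentioned above (ApplyMapping and DropNullFields). The catalog database, table name, and S3 output path are hypothetical placeholders, not values from the chapter.

import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping, DropNullFields
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read a table from the Glue Data Catalog as a DynamicFrame.
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db",        # hypothetical catalog database
    table_name="raw_orders",    # hypothetical catalog table
)

# ApplyMapping: rename and cast columns.
mapped = ApplyMapping.apply(
    frame=dyf,
    mappings=[
        ("order_id", "string", "order_id", "string"),
        ("order_total", "string", "order_total", "double"),
    ],
)

# DropNullFields: remove fields whose values are all null.
cleaned = DropNullFields.apply(frame=mapped)

# Write the result to S3 as Parquet.
glue_context.write_dynamic_frame.from_options(
    frame=cleaned,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/curated/orders/"},
    format="parquet",
)
job.commit()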