Automating data preparation and analysis with AWS Glue DataBrew
AWS Glue DataBrew is a no-code data preparation service built to help data scientists and ML engineers clean, prepare, and transform data. Like the services we used in Chapter 4, Serverless Data Management on AWS, Glue DataBrew is serverless, so we won’t need to worry about infrastructure management when using it to perform data preparation, transformation, and analysis.
Figure 5.2 – The core concepts in AWS Glue DataBrew
Figure 5.2 shows the concepts and resources involved when using AWS Glue DataBrew. We need a good idea of what these are before using the service. Here is a quick overview of the concepts and terms used:
- Dataset – Data stored in an existing data source (for example, Amazon S3, Amazon Redshift, or Amazon RDS) or uploaded from the local machine to an S3 bucket.
- Recipe –...
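Although DataBrew is typically used through its no-code console, the Dataset concept can also be illustrated programmatically. The following is a minimal sketch, assuming a CSV file already stored in S3 and using the boto3 `databrew` client's `create_dataset` call. The dataset name, bucket, and object key are hypothetical placeholders, and the helper functions `build_dataset_request` and `register_dataset` are illustrative names, not part of any AWS library.

```python
from typing import Any, Dict


def build_dataset_request(name: str, bucket: str, key: str) -> Dict[str, Any]:
    """Build the keyword arguments for DataBrew's CreateDataset call.

    Assumes the source file is a CSV object stored in Amazon S3.
    """
    return {
        "Name": name,
        "Format": "CSV",
        "Input": {"S3InputDefinition": {"Bucket": bucket, "Key": key}},
    }


def register_dataset(request: Dict[str, Any]) -> str:
    """Register the dataset with DataBrew and return its name."""
    import boto3  # imported lazily so the builder works without AWS access

    client = boto3.client("databrew")
    response = client.create_dataset(**request)
    return response["Name"]


if __name__ == "__main__":
    # Hypothetical names: replace with your own bucket and object key.
    request = build_dataset_request(
        "bookings", "my-example-bucket", "data/bookings.csv"
    )
    print(request)
    # register_dataset(request)  # requires AWS credentials and permissions
```

Separating the request builder from the API call keeps the dataset definition easy to inspect before anything is created in the AWS account.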