Cataloging and ingesting data using AWS Glue
Data that is staged in Amazon S3 can be cataloged using the AWS Glue service. Cataloging the data attaches metadata to it and populates the AWS Glue Data Catalog. This enriches the raw data so that it can be queried as tables by many of the AWS analytical services, such as Amazon Redshift (through Amazon Redshift Spectrum) and Amazon Elastic MapReduce (Amazon EMR), for analytical processing. AWS Glue crawlers make this data discovery easy: they scan the data in Amazon S3, infer its schema, and create or update the table metadata in the Data Catalog automatically.
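To make the crawler step more concrete, here is a minimal sketch that creates and runs a Glue crawler with the boto3 SDK. It assumes AWS credentials are configured, that the target Glue database already exists, and that the IAM role can be assumed by AWS Glue and can read the S3 path. The crawler name, database name, bucket path, and role ARN are hypothetical placeholders, not values from this recipe. The request is built separately from the API calls so it can be inspected without touching AWS.

```python
import time

# Hypothetical placeholder values for illustration only; replace with your own.
CRAWLER_NAME = "cookbook-s3-crawler"
GLUE_DATABASE = "cookbook_db"
S3_PATH = "s3://my-staging-bucket/orders/"
GLUE_ROLE_ARN = "arn:aws:iam::123456789012:role/MyGlueCrawlerRole"


def build_crawler_request(name, role_arn, database, s3_path):
    """Return keyword arguments for the Glue CreateCrawler API call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # Update existing table definitions when the inferred schema changes,
        # and only log (rather than delete) tables whose S3 data disappears.
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "LOG",
        },
    }


def run_crawler(request, region="eu-west-1", poll_seconds=15):
    """Create and start the crawler, then poll until it is READY again."""
    import boto3  # imported lazily so build_crawler_request works without boto3

    glue = boto3.client("glue", region_name=region)
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])
    # Simplified polling: wait first so the crawler has left READY, then
    # check until the run finishes and the crawler returns to READY.
    while True:
        time.sleep(poll_seconds)
        state = glue.get_crawler(Name=request["Name"])["Crawler"]["State"]
        if state == "READY":
            break


if __name__ == "__main__":
    req = build_crawler_request(CRAWLER_NAME, GLUE_ROLE_ARN, GLUE_DATABASE, S3_PATH)
    print(req["Targets"])
    # run_crawler(req)  # needs AWS credentials and an existing Glue database
```

Once the crawler finishes, the tables it created in the Data Catalog database can be queried from Amazon Redshift through an external schema. The `UPDATE_IN_DATABASE` policy shown here is one reasonable choice for data whose schema evolves; `LOG` would leave existing table definitions untouched.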
In this recipe, we will catalog the data to enrich it with metadata and enable its ingestion into Amazon Redshift.
Getting ready
To complete this recipe, you will need the following:
- An Amazon Redshift cluster deployed in the eu-west-1 AWS Region
- Amazon Redshift cluster master user credentials
- An IAM user with access to Amazon Redshift, Amazon S3, and AWS Glue
- An IAM role attached to an Amazon Redshift cluster that can...