Technical requirements
In this chapter, we will dive deep into how an Amazon EMR cluster integrates with AWS Glue Data Catalog and AWS Lake Formation. To test the integration, you will need the following resources before you get started:
- An AWS account
- An Identity and Access Management (IAM) user that has permission to create and manage an EMR cluster with related resources, such as Amazon Elastic Compute Cloud (EC2) instances, required IAM roles, and security groups
- IAM permissions to integrate EMR with AWS Glue Data Catalog, AWS Lake Formation, Amazon Simple Storage Service (S3), Amazon CloudWatch, and AWS CloudTrail
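To make the last requirement more concrete, the following is a minimal sketch of what an IAM policy document covering these services might look like. The action list, the statement IDs, and the `build_policy` helper are illustrative assumptions, not the exact permissions this chapter relies on, and `"Resource": "*"` is used only to keep the sketch short. In a real account, you would narrow both the actions and the resources.

```python
import json

# Hypothetical, illustrative action list: not taken from this chapter and
# not a least-privilege policy. Narrow actions and resources before use.
SERVICE_ACTIONS = {
    "GlueDataCatalog": ["glue:GetDatabase", "glue:GetTable", "glue:GetPartitions"],
    "LakeFormation": ["lakeformation:GetDataAccess"],
    "S3": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
    "CloudWatch": ["cloudwatch:PutMetricData"],
    "CloudTrail": ["cloudtrail:LookupEvents"],
}


def build_policy(service_actions):
    """Return an IAM policy document with one Allow statement per service."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": sid,
                "Effect": "Allow",
                "Action": actions,
                # Assumption: wildcard resource, used here only for brevity.
                "Resource": "*",
            }
            for sid, actions in service_actions.items()
        ],
    }


policy = build_policy(SERVICE_ACTIONS)
print(json.dumps(policy, indent=2))
```

The printed JSON can be reviewed and adapted before you attach it to an IAM user or role. Keeping one statement per service makes it easier to see which permission belongs to which integration.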
Now let's understand how you can build a centralized data catalog in EMR and what options you have for this integration.