Loading data from Amazon EMR
Amazon Elastic MapReduce (EMR) lets you run big data frameworks such as Apache Hadoop and Apache Spark on AWS-managed infrastructure. Amazon EMR is used for both batch and near-real-time processing as part of an analytical data pipeline.
In this recipe, we will see how to leverage Amazon EMR to load data into the customer table on Amazon Redshift using the COPY command.
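As a rough preview of where the recipe is heading, the following Python sketch assembles a COPY statement that reads from an EMR cluster using the emr:// source prefix. The helper name build_emr_copy_statement, the cluster ID, the HDFS path, the IAM role ARN, and the delimiter are all placeholder assumptions for illustration; they are not values taken from this recipe.

```python
def build_emr_copy_statement(table, emr_cluster_id, hdfs_path, iam_role_arn, delimiter="|"):
    """Build a Redshift COPY statement that reads files from an EMR cluster's HDFS.

    Hypothetical helper: it only formats SQL text and does not connect to AWS.
    """
    # COPY addresses EMR sources as emr://<cluster-id>/<hdfs-path>
    source = f"emr://{emr_cluster_id}/{hdfs_path.lstrip('/')}"
    return (
        f"COPY {table}\n"
        f"FROM '{source}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"DELIMITER '{delimiter}';"
    )


# Placeholder values (assumptions); substitute your own cluster ID, path, and role.
statement = build_emr_copy_statement(
    table="customer",
    emr_cluster_id="j-EXAMPLECLUSTER",
    hdfs_path="/customer/part-*",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(statement)
```

The generated text can be pasted into any SQL interface connected to the Amazon Redshift cluster once the prerequisites in the next section are in place.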
Getting ready
To complete this recipe, you will need to do the following:
- Ensure you have access to the AWS Console.
- Deploy an Amazon Redshift cluster in AWS region eu-west-1.
- Create Amazon Redshift cluster master user credentials.
- Gain access to any SQL interface, such as a SQL client or the Amazon Redshift Query Editor.
- Deploy an Amazon EMR cluster in AWS region eu-west-1. Refer to https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-gs.html to set up an EMR cluster.
- Ensure you have open connectivity between the Amazon EMR cluster and...