Spark ETL and Lambda function code walk-through
You can download the complete code from our GitHub repository specified in the Technical requirements section of the chapter. In this section, we highlight a few parts of the code and explain their purpose and usage.
Understanding the AWS Lambda function code
The Lambda function's primary objective is to launch an EMR cluster and then submit a Spark step to it.
The following part of the code creates a boto3 client for the EMR service and calls its run_job_flow method, passing all the inputs required to launch the cluster:
conn = boto3.client("emr", region_name=AWS_REGION)
cluster_id = conn.run_job_flow(…)
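Note that run_job_flow returns a response dictionary rather than a bare ID; the new cluster's ID is stored under the JobFlowId key. The following is a minimal sketch of extracting it. The launch_cluster helper is a hypothetical name introduced here for illustration and is not taken from the chapter's repository:

```python
def launch_cluster(emr_client, **job_flow_args):
    """Call run_job_flow and return the new cluster's ID.

    `emr_client` is assumed to be a boto3 EMR client, for example
    boto3.client("emr", region_name=AWS_REGION). This helper is
    illustrative only.
    """
    response = emr_client.run_job_flow(**job_flow_args)
    # boto3 returns a dict; the cluster (job flow) ID is under "JobFlowId".
    return response["JobFlowId"]
```

Returning only the ID keeps the rest of the Lambda handler simple if it needs to log the cluster ID or pass it to later EMR calls.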
The following parameters, passed to the run_job_flow method, specify the EMR cluster configuration:
Instances={ "Ec2KeyName": "<key-name>", "...
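The snippet above is abbreviated. As a rough illustration of the kind of request run_job_flow accepts, the following sketch assembles the keyword arguments for a small cluster with a single Spark step. The function name build_job_flow_request and every concrete value (cluster name, release label, instance types and counts, role names) are assumptions chosen for the example, not the chapter's actual settings:

```python
def build_job_flow_request(key_name, script_s3_uri, log_s3_uri):
    """Assemble keyword arguments for the EMR client's run_job_flow call.

    All concrete values below are illustrative assumptions.
    """
    return {
        "Name": "spark-etl-cluster",
        "ReleaseLabel": "emr-6.10.0",
        "LogUri": log_s3_uri,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "Ec2KeyName": key_name,
            "InstanceGroups": [
                {
                    "Name": "Primary",
                    "InstanceRole": "MASTER",
                    "InstanceType": "m5.xlarge",
                    "InstanceCount": 1,
                },
                {
                    "Name": "Core",
                    "InstanceRole": "CORE",
                    "InstanceType": "m5.xlarge",
                    "InstanceCount": 2,
                },
            ],
            # Let the cluster terminate once the submitted steps finish.
            "KeepJobFlowAliveWhenNoSteps": False,
            "TerminationProtected": False,
        },
        "Steps": [
            {
                "Name": "spark-etl-step",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    # command-runner.jar runs spark-submit on the cluster.
                    "Jar": "command-runner.jar",
                    "Args": [
                        "spark-submit",
                        "--deploy-mode",
                        "cluster",
                        script_s3_uri,
                    ],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```

With a boto3 EMR client such as conn from the earlier snippet, the request could then be submitted as conn.run_job_flow(**build_job_flow_request("<key-name>", "<script-s3-uri>", "<log-s3-uri>")), where the placeholders stand for your own values.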