Integrating Airflow to trigger EMR jobs
Airflow provides the following operators and sensors to interact with an Amazon EMR cluster:
EmrCreateJobFlowOperator: This operator enables you to create an EMR cluster.
EmrJobFlowSensor: This sensor checks the status of the EMR cluster.
EmrAddStepsOperator: With this operator, you can add a step to the EMR cluster.
EmrStepSensor: This sensor checks the status of an existing step in your EMR cluster.
EmrModifyClusterOperator: This operator is used to modify an existing cluster.
EmrTerminateJobFlowOperator: This operator enables you to terminate an existing cluster.
As explained, you can design a workflow in Airflow using the Python programming language, where you define each action and then define the sequence of execution. The following is sample Python code that uses Airflow's EmrCreateJobFlowOperator to trigger an EMR create cluster action:
cluster_create_action = EmrCreateJobFlowOperator( ...
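To illustrate how the actions and their sequence of execution might fit together, the following is a minimal sketch rather than the book's own listing. It assumes the apache-airflow-providers-amazon package is installed and a default AWS connection is configured. The cluster name, release label, instance types, step definition, DAG ID, and task IDs are all hypothetical placeholders, and the exact DAG arguments may vary between Airflow versions.

```python
from datetime import datetime

# Hypothetical cluster definition, passed through to the EMR RunJobFlow API.
# All names, sizes, and the release label are illustrative assumptions.
JOB_FLOW_OVERRIDES = {
    "Name": "airflow-demo-cluster",
    "ReleaseLabel": "emr-6.15.0",
    "Applications": [{"Name": "Spark"}],
    "Instances": {
        "InstanceGroups": [
            {
                "Name": "Primary node",
                "Market": "ON_DEMAND",
                "InstanceRole": "MASTER",
                "InstanceType": "m5.xlarge",
                "InstanceCount": 1,
            }
        ],
        # Keep the cluster alive so a later task can add steps to it.
        "KeepJobFlowAliveWhenNoSteps": True,
        "TerminationProtected": False,
    },
    "JobFlowRole": "EMR_EC2_DefaultRole",
    "ServiceRole": "EMR_DefaultRole",
}

# Hypothetical step that runs the SparkPi example bundled with Spark.
SPARK_STEPS = [
    {
        "Name": "calculate_pi",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["/usr/lib/spark/bin/run-example", "SparkPi", "10"],
        },
    }
]


def build_dag():
    """Build a DAG that creates a cluster, runs one step, then terminates it."""
    # Imported lazily so the configuration above can be inspected without Airflow.
    from airflow import DAG
    from airflow.providers.amazon.aws.operators.emr import (
        EmrAddStepsOperator,
        EmrCreateJobFlowOperator,
        EmrTerminateJobFlowOperator,
    )
    from airflow.providers.amazon.aws.sensors.emr import EmrStepSensor

    # The create task returns the cluster (job flow) ID, read back through XCom.
    cluster_id = (
        "{{ task_instance.xcom_pull(task_ids='create_cluster', key='return_value') }}"
    )
    # The add-steps task returns a list of step IDs; this sketch watches the first.
    step_id = (
        "{{ task_instance.xcom_pull(task_ids='add_step', key='return_value')[0] }}"
    )

    with DAG(
        dag_id="emr_job_flow_sketch",
        start_date=datetime(2024, 1, 1),
        catchup=False,
    ) as dag:
        create_cluster = EmrCreateJobFlowOperator(
            task_id="create_cluster",
            job_flow_overrides=JOB_FLOW_OVERRIDES,
        )
        add_step = EmrAddStepsOperator(
            task_id="add_step",
            job_flow_id=cluster_id,
            steps=SPARK_STEPS,
        )
        watch_step = EmrStepSensor(
            task_id="watch_step",
            job_flow_id=cluster_id,
            step_id=step_id,
        )
        terminate_cluster = EmrTerminateJobFlowOperator(
            task_id="terminate_cluster",
            job_flow_id=cluster_id,
        )

        # Sequence of execution: create, add the step, wait for it, terminate.
        create_cluster >> add_step >> watch_step >> terminate_cluster

    return dag
```

In a real DAG file, you would assign the result at module level (for example, dag = build_dag()) so that the Airflow scheduler can discover it. The sensor sits between the add step and terminate actions so that the cluster is not shut down while the step is still running.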