Integrating AWS Step Functions to orchestrate EMR jobs
AWS Step Functions supports the createCluster, createCluster.sync, terminateCluster, terminateCluster.sync, addStep, cancelStep, setClusterTerminationProtection, modifyInstanceFleetByName, and modifyInstanceGroupByName EMR actions, which provide considerable flexibility for building workflows on top of EMR.
Let's assume that you want to build a workflow that is triggered as soon as a file arrives in S3 and that runs a Spark + Hudi job to process the input file. The workflow creates a transient EMR cluster, submits a Spark job that performs the ETL transforms, and terminates the cluster once the job completes. You can build this workflow using the createCluster, addStep, and terminateCluster actions of AWS Step Functions.
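As a rough illustration of how the three actions chain together, the following Python sketch assembles a minimal state machine definition and prints it as JSON. It is a sketch under stated assumptions, not a production definition: the state names, the emr-6.9.0 release label, the instance types and counts, the default EMR role names, and the S3 URIs are all placeholders chosen for illustration. It uses the .sync variants of createCluster and addStep so that each state waits for its operation to finish before the next one starts; addStep.sync follows the same naming pattern as the other .sync actions.

```python
import json

# Prefix shared by the Step Functions EMR service integrations.
EMR = "arn:aws:states:::elasticmapreduce:"


def build_definition(script_uri: str, input_uri: str) -> dict:
    """Return a three-state workflow: create cluster, run Spark job, terminate."""
    return {
        "Comment": "Transient EMR cluster running one Spark job (illustrative)",
        "StartAt": "CreateCluster",
        "States": {
            "CreateCluster": {
                "Type": "Task",
                # .sync makes the state wait until the cluster is ready.
                "Resource": EMR + "createCluster.sync",
                "Parameters": {
                    "Name": "transient-etl-cluster",      # assumed name
                    "ReleaseLabel": "emr-6.9.0",          # assumed release
                    "Applications": [{"Name": "Spark"}],
                    "ServiceRole": "EMR_DefaultRole",     # assumed role
                    "JobFlowRole": "EMR_EC2_DefaultRole", # assumed role
                    "Instances": {
                        # Keep the cluster up so a later state can add a step.
                        "KeepJobFlowAliveWhenNoSteps": True,
                        "InstanceGroups": [
                            {"InstanceRole": "MASTER",
                             "InstanceType": "m5.xlarge",
                             "InstanceCount": 1},
                            {"InstanceRole": "CORE",
                             "InstanceType": "m5.xlarge",
                             "InstanceCount": 2},
                        ],
                    },
                },
                # Store the result (including ClusterId) under $.cluster.
                "ResultPath": "$.cluster",
                "Next": "RunSparkJob",
            },
            "RunSparkJob": {
                "Type": "Task",
                # .sync makes the state wait until the step completes.
                "Resource": EMR + "addStep.sync",
                "Parameters": {
                    "ClusterId.$": "$.cluster.ClusterId",
                    "Step": {
                        "Name": "spark-etl",
                        "ActionOnFailure": "CONTINUE",
                        "HadoopJarStep": {
                            "Jar": "command-runner.jar",
                            "Args": ["spark-submit", "--deploy-mode",
                                     "cluster", script_uri, input_uri],
                        },
                    },
                },
                "ResultPath": "$.step",
                "Next": "TerminateCluster",
            },
            "TerminateCluster": {
                "Type": "Task",
                "Resource": EMR + "terminateCluster",
                "Parameters": {"ClusterId.$": "$.cluster.ClusterId"},
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical S3 locations for the job script and the input file.
    definition = build_definition("s3://my-bucket/jobs/etl.py",
                                  "s3://my-bucket/input/file.csv")
    print(json.dumps(definition, indent=2))
```

In a real workflow you would likely also add a Catch on the RunSparkJob state that routes to TerminateCluster, so that a failed job does not leave the cluster running.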
The following JSON definition is an example of a Step Functions state of the Task type that invokes the EMR createCluster action with parameters...