Building an ephemeral cluster using Dataproc and Cloud Composer
Another option for managing ephemeral clusters is Cloud Composer. We used Airflow in the previous chapter to orchestrate BigQuery data loading, and as we've already learned, Airflow has many operators, one of which is, of course, Dataproc.
You should use this approach instead of a workflow template when your jobs are complex, for example, a pipeline that contains many branches, backfilling logic, and dependencies on other services, since workflow templates can't handle these complexities.
For this exercise, if your Cloud Composer environment is no longer available, you don't need to run the code. Simply read through the following example code. As long as you've completed Chapter 4, Building Workflows for Batch Data Loading Using Cloud Composer, you'll understand the complete concept.
In the following example exercise, we will use Airflow to create a Dataproc cluster...