Orchestrating our streaming process
In Chapter 12, we used Argo to orchestrate our batch process, so this time let's use Databricks Workflows to show another way to schedule a process in a cloud environment. Refer to Chapter 2 for setting up the Community Edition of Databricks.
Use the knowledge gained from previous chapters to build your Spark jar and deploy it to a location that Databricks can access. This can be a cloud storage account or your Databricks File System (DBFS).
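As a rough sketch, assuming your project uses sbt with the assembly plugin and you have the Databricks CLI configured against your workspace, building the jar and copying it to DBFS might look like the following (the project and jar names are placeholders for your own build):

```bash
# Build a fat jar with the sbt-assembly plugin (assumes the plugin is configured in your project)
sbt assembly

# Copy the jar to DBFS so the job can reference it; both paths below are placeholders
databricks fs cp target/scala-2.12/streaming-app-assembly-0.1.0.jar \
  dbfs:/FileStore/jars/streaming-app-assembly-0.1.0.jar
```

If you prefer cloud storage instead of DBFS, upload the same jar to a bucket or container that your Databricks workspace can reach and note its path for the next step.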
Go to your Databricks workspace and look for Workflows in the left navigation:
Figure 13.3 – Databricks navigation
Here are the steps to create a new workflow to orchestrate your pipeline:
- In the left navigation of your Databricks workspace, click on Workflows. The first step in our process is to create a job that runs our streaming application. Click the Create job button to create a new workflow, then edit the task, select Spark Submit as the task type, and input your jar location (a sample of the parameters entry is sketched after this list)...
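As a rough guide, the Parameters field of a Spark Submit task takes a JSON array of the arguments you would normally pass to spark-submit; the class name and jar path below are placeholders for your own build:

```json
["--class", "com.example.StreamingApp",
 "dbfs:/FileStore/jars/streaming-app-assembly-0.1.0.jar"]
```

Any application-specific arguments your streaming job expects (for example, an input path or checkpoint location) can be appended to the end of the array after the jar path, just as they would follow the application jar on a spark-submit command line.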