Orchestrating our batch process
Now that we have our transformation written, it is time to build our pipeline. For this example, we will use the minikube instance that was set up in the Installing Argo Workflows section in Chapter 10.
Our workflow will start by printing a message, then run the Bronze, Silver, and Gold layer transformations, and finish with a pipeline completion message. One important thing to note here is that each of these steps runs as a separate container, which means data written by the Bronze layer will not be automatically available to the Silver layer. To share data among containers, we need to use persistent volumes. There are several ways to do this, but for our example we will use hostPath, a type of PersistentVolume supported by minikube. Please note that hostPath does not refer to a directory or file on your local machine, but rather to one inside the minikube container. So we need to make the required datasets available so that Spark can find...
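The step sequence and shared hostPath volume described above could be sketched as an Argo Workflow manifest along the following lines. This is a minimal illustration, not the book's actual manifest: the step names, the `/data` path, and the container images (`alpine:3.19`, `my-spark-job:latest`) are placeholder assumptions.

```yaml
# Illustrative Argo Workflow: sequential Bronze/Silver/Gold steps that
# share one hostPath volume. All names, paths, and images are placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: batch-pipeline-
spec:
  entrypoint: main
  # hostPath points at a directory inside the minikube node (container),
  # so every step that mounts it sees the same files.
  volumes:
    - name: shared-data
      hostPath:
        path: /data                 # path inside the minikube container
        type: DirectoryOrCreate
  templates:
    - name: main
      steps:
        - - name: start
            template: echo
            arguments: {parameters: [{name: msg, value: "Pipeline starting"}]}
        - - name: bronze
            template: spark-step
            arguments: {parameters: [{name: layer, value: bronze}]}
        - - name: silver
            template: spark-step
            arguments: {parameters: [{name: layer, value: silver}]}
        - - name: gold
            template: spark-step
            arguments: {parameters: [{name: layer, value: gold}]}
        - - name: done
            template: echo
            arguments: {parameters: [{name: msg, value: "Pipeline complete"}]}
    - name: echo
      inputs:
        parameters: [{name: msg}]
      container:
        image: alpine:3.19
        command: [echo, "{{inputs.parameters.msg}}"]
    - name: spark-step
      inputs:
        parameters: [{name: layer}]
      container:
        image: my-spark-job:latest    # placeholder Spark job image
        args: ["--layer", "{{inputs.parameters.layer}}"]
        volumeMounts:
          - name: shared-data
            mountPath: /data          # same mount point in every step
```

Because each `spark-step` mounts the same `shared-data` volume at `/data`, files written by the bronze step are visible to the silver and gold steps that run after it.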