Using Databricks Workflows
Databricks Workflows is a fully managed cloud orchestration service available to all Databricks customers. It simplifies orchestrating pipelines built from the following types of tasks:
- Databricks notebooks
- Python Script/Wheel
- JAR
- Spark Submit
- Databricks SQL – dashboards, queries, alerts, or files
- Delta Live Table pipelines
- dbt
We will focus on using a Spark Submit task to run a Scala JAR. The first thing we have to do is create an assembly, or fat JAR, which will include all the dependencies of our project in our JAR. To do this, we will add the following code to our build.sbt file:
```scala
assemblyJarName in assembly := "de-with-scala-assembly-1.0.jar"

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", _*) => MergeStrategy.discard
  case _                        => MergeStrategy.first
}
```
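The JAR that spark-submit runs also needs an entry point. A minimal sketch of such a main class is shown below; the object name, package, and application logic are illustrative placeholders, not part of the book's project:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical entry point; spark-submit invokes this main method
// via the --class argument.
object DeWithScalaApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("de-with-scala")
      .getOrCreate()

    // Placeholder logic: confirm the session is up by showing a range.
    spark.range(10).show()

    spark.stop()
  }
}
```

Note that when targeting Databricks, the Spark dependency in build.sbt is typically marked `% "provided"` so that Spark itself is not bundled into the assembly JAR.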
The first line specifies the name of the .jar file to be created. The next block will provide a...
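Once the assembly JAR is uploaded to a location the workspace can read (for example, a DBFS path), the Spark Submit task can be defined through the Jobs UI or the Jobs API. A hedged sketch of a Jobs API 2.1 JSON payload follows; the job name, class name, JAR path, node type, and runtime version are illustrative assumptions:

```json
{
  "name": "de-with-scala-job",
  "tasks": [
    {
      "task_key": "run_scala_jar",
      "spark_submit_task": {
        "parameters": [
          "--class",
          "com.example.DeWithScalaApp",
          "dbfs:/FileStore/jars/de-with-scala-assembly-1.0.jar"
        ]
      },
      "new_cluster": {
        "spark_version": "13.3.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 2
      }
    }
  ]
}
```

One design constraint worth noting: Spark Submit tasks run on new job clusters rather than existing all-purpose clusters, which is why the sketch includes a `new_cluster` block.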