Deploying the app programmatically
Unlike with Jupyter notebooks, when you use the spark-submit command you need to prepare the SparkSession yourself and configure it so that your application runs properly.
In this section, we will learn how to create and configure the SparkSession, as well as how to use modules external to Spark.
Note
If you have not created your free account with Databricks or Microsoft (or any other Spark provider), do not worry - we will still be using your local machine, as this is the easier way to get started. However, if you later decide to take your application to the cloud, it will only require changing the --master parameter when you submit the job.
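For example, a job tested locally could be submitted as shown below, and moving it to a cluster would only mean swapping the --master value; the script name and cluster address here are placeholders, not values from this book:

    # run the job on your local machine using 4 cores
    spark-submit --master local[4] my_script.py

    # the same job against a standalone Spark cluster - only --master changes
    spark-submit --master spark://<cluster-address>:7077 my_script.py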
Configuring your SparkSession
The main difference between using Jupyter and submitting jobs programmatically is that you have to create the Spark context (and the Hive context, if you plan to use HiveQL) yourself, whereas when you run Spark from Jupyter the contexts are started for you automatically.
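As a rough sketch of what this looks like in code - the application name and configuration value below are only placeholders - a SparkSession (with Hive support, if you need HiveQL) can be created with the builder pattern:

    from pyspark.sql import SparkSession

    # create (or reuse) a SparkSession; drop .enableHiveSupport() if you do not need HiveQL
    spark = SparkSession.builder \
        .appName('MyApplication') \
        .config('spark.executor.memory', '2g') \
        .enableHiveSupport() \
        .getOrCreate()

    # the underlying SparkContext is exposed as an attribute of the session
    sc = spark.sparkContext

Because getOrCreate() returns an existing session if one is already running, this setup code is safe to call more than once.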
In this section, we will develop...