The spark-submit command
The entry point for submitting jobs to Spark (whether locally or on a cluster) is the spark-submit script. The script, however, allows you not only to submit jobs (although that is its main purpose), but also to kill jobs or check their status.
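For example, a driver that was submitted in cluster mode to a standalone master can be stopped or queried by its submission ID; the master URL and submission ID below are placeholders, not values from this chapter:

spark-submit --master spark://<master-host>:7077 --kill <submission-id>
spark-submit --master spark://<master-host>:7077 --status <submission-id>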
Note
Under the hood, the spark-submit command passes the call to the spark-class script that, in turn, starts a launcher Java application. For those interested, you can check Spark's GitHub repository: https://github.com/apache/spark/blob/master/bin/spark-submit.
The spark-submit command provides a unified API for deploying apps on a variety of cluster managers supported by Spark (such as Mesos or YARN), thus relieving you from configuring your application for each of them separately.
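As a sketch of what this unification means in practice, the same application can be pointed at different cluster managers simply by changing the --master option; the script name here is illustrative:

spark-submit --master local[4] my_script.py
spark-submit --master yarn --deploy-mode cluster my_script.py
spark-submit --master mesos://<mesos-host>:5050 my_script.py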
At the most general level, the syntax looks as follows:
spark-submit [options] <python file> [app arguments]
We will go through the list of all the options soon. The app arguments are the parameters you want to pass to your application.
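To make the pieces of the syntax concrete, here is a hedged example of a full invocation; the application name, script, and its two positional arguments are hypothetical:

spark-submit --master local[2] --name wordCount word_count.py input.txt output

Here, --master and --name are options, word_count.py is the Python file, and input.txt and output are the app arguments that Spark passes straight through to the script.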
Note
You...