Lifecycle of a Spark program
The following steps explain the lifecycle of a Spark application with the standalone resource manager, and Figure 3.8 shows the scheduling process of a Spark program:
- The user submits a Spark application using the spark-submit command.
- spark-submit launches the driver program, either on the same node (client mode) or on the cluster (cluster mode), and invokes the main method specified by the user.
- The driver program contacts the cluster manager to ask for resources to launch executor JVMs based on the configuration parameters supplied.
- The cluster manager launches executor JVMs on worker nodes.
- The driver process scans through the user application. Based on the RDD actions and transformations in the program, Spark creates an operator graph.
- When an action (such as collect) is called, the graph is submitted to a DAG scheduler. The DAG scheduler divides the operator graph into stages.
- A stage comprises tasks based on partitions of the input data. The DAG scheduler pipelines operators...
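The stage-splitting described in the last two steps can be sketched as a simple walk over the operator graph: narrow operators are pipelined into one stage, and each wide (shuffle) dependency, such as reduceByKey, closes the current stage and starts a new one. This is a toy illustration only; the names `Op` and `split_into_stages` are hypothetical and not Spark's internal API.

```python
from dataclasses import dataclass

@dataclass
class Op:
    """One node in a linear operator graph (toy model, not Spark's DAG)."""
    name: str
    wide: bool = False           # True if the operator needs a shuffle
    parent: "Op | None" = None   # upstream operator, None for the source

def lineage(op):
    """Walk from the final operator back to the source, then reverse."""
    chain = []
    while op is not None:
        chain.append(op)
        op = op.parent
    return list(reversed(chain))

def split_into_stages(final_op):
    """Pipeline narrow ops into one stage; cut a new stage at each shuffle."""
    stages, current = [], []
    for op in lineage(final_op):
        if op.wide and current:
            stages.append(current)   # shuffle boundary closes the stage
            current = []
        current.append(op.name)
    stages.append(current)
    return stages

# textFile -> flatMap -> map are narrow and pipeline into one stage;
# the wide reduceByKey forces a second stage after the shuffle.
graph = Op("reduceByKey", wide=True,
           parent=Op("map", parent=Op("flatMap", parent=Op("textFile"))))
print(split_into_stages(graph))
# → [['textFile', 'flatMap', 'map'], ['reduceByKey']]
```

Calling an action such as collect on the final operator is what would trigger this split in Spark; until then the graph is only a lineage of deferred transformations.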