Execution hierarchy
Let’s look at the execution flow of a Spark application with the help of the architecture depicted in Figure 3.1:
Figure 3.1: Spark architecture
These steps outline the flow from submitting a Spark job to freeing up resources when the job is completed:
- Spark execution starts with a user submitting a `spark-submit` request to the Spark engine. This creates a Spark application. Each time an action is performed within the application, a job is created.
- This request initiates communication with the cluster manager. In turn, the cluster manager initializes the Spark driver to execute the `main()` method of the Spark application. To execute this method, a `SparkSession` is created.
- The driver starts communicating with the cluster manager and asks for resources to start planning for execution.
- The cluster manager then starts the executors, which can communicate with the driver directly.
- The driver creates a logical...
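The submission steps can be traced with a toy model. This is a minimal, hypothetical sketch in plain Python, not Spark's API. The classes `ClusterManager`, `Driver`, and `Executor` and the function `spark_submit` are illustrative names that only record the order of events. In a real PySpark application, `main()` would create its session with `SparkSession.builder.getOrCreate()`, and the script would be launched with `spark-submit`.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Toy model of the submission flow. NOT Spark's API: every class and
# function here is hypothetical and only records events in order.


@dataclass
class Executor:
    executor_id: int


@dataclass
class ClusterManager:
    log: List[str]

    def launch_driver(self) -> "Driver":
        # The cluster manager initializes the driver.
        self.log.append("cluster manager: launch driver")
        return Driver(cluster_manager=self, log=self.log)

    def start_executors(self, count: int) -> List[Executor]:
        # Executors are started and can then talk to the driver directly.
        self.log.append(f"cluster manager: start {count} executors")
        return [Executor(i) for i in range(count)]


@dataclass
class Driver:
    cluster_manager: ClusterManager
    log: List[str]
    executors: List[Executor] = field(default_factory=list)
    jobs: int = 0

    def run_main(self, user_main: Callable[["Driver"], None], num_executors: int) -> None:
        # main() runs and a SparkSession is created.
        self.log.append("driver: run main(), create SparkSession")
        # The driver asks the cluster manager for resources.
        self.log.append("driver: request resources")
        self.executors = self.cluster_manager.start_executors(num_executors)
        user_main(self)

    def action(self, name: str) -> None:
        # Each action performed by the application creates one job.
        self.jobs += 1
        self.log.append(f"driver: action {name}() -> job {self.jobs}")


def spark_submit(user_main: Callable[[Driver], None], num_executors: int = 2) -> Tuple[List[str], Driver]:
    # Submitting the request creates the application.
    log: List[str] = ["user: spark-submit -> application created"]
    driver = ClusterManager(log).launch_driver()
    driver.run_main(user_main, num_executors)
    return log, driver


# Usage: a hypothetical application that performs two actions.
def my_app(driver: Driver) -> None:
    driver.action("count")
    driver.action("collect")


log, driver = spark_submit(my_app)
for line in log:
    print(line)
```

Running the sketch prints the events in the order the steps describe. The application performs two actions, so the model records two jobs.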