Sizing up your executors
When you run a Spark application, executors are launched on the worker nodes of the cluster. Put simply, executors are the processes that:
- Run your computations
- Store your data
Each application gets its own executor processes, and they stay alive for the lifetime of the application. They are therefore central to performance, and hence the three key settings when submitting a Spark application are the following (a sample invocation appears after this list):
- --num-executors: How many executors do you need?
- --executor-cores: How many CPU cores do you want to allocate to each executor?
- --executor-memory: How much memory do you want to assign to each executor process?
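As a concrete illustration, here is a minimal sketch of a spark-submit invocation that sets all three flags explicitly. The master URL, deploy mode, application class, JAR name, and the specific values (4 executors, 4 cores, 8 GB) are hypothetical placeholders, not recommendations:

```
# Hypothetical example: submit an application with explicit executor sizing.
# The master URL, class name, JAR path, and numbers below are placeholders.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-cores 4 \
  --executor-memory 8g \
  --class com.example.MyApp \
  my-app.jar
```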
So how do you allocate physical resources to Spark? While the answer ultimately depends on the nature of the workload, executor granularity can range between two extremes, as illustrated in Figure 11.2.
Figure 11.2: Executor granularity
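To make the two extremes concrete, here is a hedged sketch assuming a hypothetical worker node with 16 cores and 64 GB of RAM, and (for simplicity) a single-worker cluster; the class name, JAR, and numbers are illustrative only:

```
# Hypothetical worker node: 16 cores, 64 GB RAM (illustrative numbers only).

# Extreme 1: many tiny executors, each with a single core and a small slice of memory.
spark-submit --num-executors 16 --executor-cores 1 --executor-memory 4g \
  --class com.example.MyApp my-app.jar

# Extreme 2: one fat executor holding all cores and (almost) all memory in a single JVM.
spark-submit --num-executors 1 --executor-cores 16 --executor-memory 60g \
  --class com.example.MyApp my-app.jar
```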
Either extreme is generally a bad choice except for very specific workloads. For example, if you define very small executors...