Performance tuning and best practices
In this section, we will discuss various strategies for optimizing the performance of our Spark jobs. We will also discuss a few best practices with respect to Spark and Spark SQL.
Performance tuning is a subjective and open-ended topic. The very first step is to answer the question, "Do we really need to tune the performance of our jobs?" Before we can answer it, we need to consider the following aspects:
Are our jobs meeting SLAs specified by the business?
If yes, there is no need for performance tuning.
What do we want to achieve and is it realistic?
For example, expecting all Spark jobs (irrespective of data size or computations performed) to be completed in milliseconds is unrealistic.
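The first question above can be answered with a simple measurement before any tuning is attempted. The following is a minimal sketch, not a definitive implementation: `meets_sla`, `sample_job`, and the 60-second threshold are hypothetical names and values chosen for illustration, and `sample_job` merely stands in for a real Spark action.

```python
import time
from typing import Callable, Tuple


def meets_sla(job: Callable[[], object], sla_seconds: float) -> Tuple[bool, float]:
    """Run the job once and report whether it finished within sla_seconds."""
    start = time.perf_counter()
    job()  # for a real Spark job this would trigger an action
    elapsed = time.perf_counter() - start
    return elapsed <= sla_seconds, elapsed


def sample_job() -> int:
    # Hypothetical stand-in for a Spark action; replace with the real job.
    return sum(range(100_000))


# The 60-second SLA is an assumed value, not one taken from the text.
ok, elapsed = meets_sla(sample_job, sla_seconds=60.0)
print(f"within SLA: {ok}, elapsed: {elapsed:.3f}s")
```

A single run is only a rough indicator. In practice, we would compare several runs on production-sized data against the SLA before deciding whether tuning is needed.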
Only once we have answered these questions and established the need for performance tuning should we move ahead, decide on a strategy, and start identifying the areas where our Spark jobs can be tuned.
Though there is no standard guide for performance tuning...