Performance evaluation
Numerous configuration parameters can be set to optimize the execution of Spark jobs. Tuning Spark clusters and resolving their performance bottlenecks deserves, at a minimum, a dedicated chapter.
This section does not address Mesos- and YARN-specific configurations, as they are not related to machine learning and are beyond the scope of this book [7:11].
Tuning parameters
The performance of a Spark application depends greatly on its configuration parameters. Selecting appropriate values for them can be overwhelming: there were more than 60 configuration parameters at last count. Fortunately, the majority of those parameters have sensible default values.
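As an illustration, configuration parameters are commonly set on a SparkConf object before the Spark context is created. The following is a minimal sketch rather than a recommended setup: the application name TuningSketch and the value 4 are arbitrary assumptions, and spark.cores.max and spark.default.parallelism are used only as examples of a property that is overridden and one that is left alone.

```scala
import org.apache.spark.SparkConf

object TuningSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical application name, for illustration only.
    val conf = new SparkConf()
      .setAppName("TuningSketch")
      // Explicit override; the value 4 is arbitrary. This property applies to
      // standalone and coarse-grained Mesos deployments.
      .set("spark.cores.max", "4")

    // Explicitly set properties can be read back as strings.
    println(conf.get("spark.cores.max"))

    // A property that was never set is absent from the SparkConf;
    // Spark falls back to its built-in default when the job runs.
    println(conf.getOption("spark.default.parallelism"))

    // Lists every property held by this SparkConf, one key=value pair per line.
    println(conf.toDebugString)
  }
}
```

Only the properties that need tuning have to be set this way; everything else keeps the default value supplied by Spark.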
However, there are a few parameters that deserve your attention, including:
- Number of cores available to execute transformations and actions on RDDs: spark.cores.max
- Memory available for the execution of transformations and actions: spark...