Debugging Spark applications
In this section, we will see how to debug Spark applications that are running locally (on Eclipse or IntelliJ), standalone or cluster mode in YARN or Mesos. However, before diving deeper, it is necessary to know about logging in the Spark application.
Logging with log4j with Spark recap
As stated earlier, Spark uses log4j for its own logging. If you configured Spark properly, Spark gets logged all the operation to the shell console. A sample snapshot of the file can be seen from the following figure:
Figure 16: A snap of the log4j.properties file
Set the default spark-shell log level to WARN. When running the spark-shell, the log level for this class is used to overwrite the root logger's log level so that the user can have different defaults for the shell and regular Spark apps. We also need to append JVM arguments when launching a job executed by an executor and managed by the driver. For this, you should edit the conf/spark-defaults.conf
. In short, the following...