In Chapter 1, Introduction to Spark, we learnt that one of the advantages of Apache Spark over the MapReduce framework is interactive processing. Apache Spark provides this through the Spark REPL.
The Spark REPL, also known as the Spark shell or Spark CLI, is a very useful tool for exploring Spark programming. REPL is an acronym for Read-Evaluate-Print Loop: an interactive shell that reads an expression typed by the programmer, evaluates it, prints the result, and then waits for the next input. Apache Spark ships with a REPL that beginners can use to understand the Spark programming model.
To launch the Spark REPL, run the same command that we used in the previous section:
$SPARK_HOME/bin/spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/11/01 16:38:43 WARN...
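Once the shell has started, each line typed at the prompt is read, evaluated, and printed in turn. The following is a minimal sketch of such a session, not output captured from a real run. It assumes the shell exposes its SparkContext as sc (the name used in the logging hint above); the names nums, doubled, and total are purely illustrative.

```scala
// Typed line by line at the scala> prompt of spark-shell.
// Assumption: `sc` is the SparkContext that the shell creates at startup.

// Reduce console noise, using the call named in the startup log.
sc.setLogLevel("ERROR")

// Distribute a small local range as an RDD (illustrative data).
val nums = sc.parallelize(1 to 5)

// Double every element and bring the results back to the driver.
val doubled = nums.map(_ * 2).collect()   // Array(2, 4, 6, 8, 10)

// Sum the elements of the RDD.
val total = nums.reduce(_ + _)            // 15
```

Because the shell prints the value of every expression as soon as it is evaluated, small experiments like this can be run without compiling or packaging an application.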