Spark REPL also known as CLI
In Chapter 1, Introduction to Spark, we learnt that one of the advantages of Apache Spark over the MapReduce framework is interactive processing. Apache Spark achieves the same using Spark REPL.
Spark REPL or Spark shell, also known as Spark CLI, is a very useful tool for exploring the Spark programming. REPL is an acronym for Read-Evaluate-Print Loop. It is an interactive shell used by programmers to interact with a framework. Apache Spark also comes with REPL that beginners can use to understand the Spark programming model.
To launch the Spark REPL, we will execute the command that we executed in the previous section:
$SPARK_HOME/bin/spark-shell Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). 16/11/01 16:38:43 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 16/11...