Building a SparkSession object
In Scala and Python programs, you build a SparkSession object using the builder pattern:
val sparkSession = SparkSession.builder().master(master_path).appName("application name").config("optional.config.key", "value").getOrCreate()
Tip
While you can hardcode all these values, it's better to read them from the environment with reasonable defaults. This approach lets you run the same code in different environments without recompiling. Using local as the default value for the master makes it easy to launch your application locally in a test environment. With carefully chosen defaults, you only need to set the values that differ from them in a given environment.
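As a minimal sketch of the tip above, the following Scala program reads the master and application name from the environment and falls back to defaults. The environment variable names SPARK_MASTER and APP_NAME, the default application name, and the helper setting are assumptions made for illustration; they are not names defined by Spark.

```scala
import org.apache.spark.sql.SparkSession

object SessionFromEnv {
  // Hypothetical environment variable names; choose whatever suits your deployment.
  val MasterVar = "SPARK_MASTER"
  val AppNameVar = "APP_NAME"

  // Look up a setting in the given environment map, falling back to a default.
  def setting(env: Map[String, String], key: String, default: String): String =
    env.getOrElse(key, default)

  def main(args: Array[String]): Unit = {
    val env = sys.env // the process environment as an immutable Map

    val spark = SparkSession.builder()
      .master(setting(env, MasterVar, "local"))          // "local" keeps test runs simple
      .appName(setting(env, AppNameVar, "example-app"))  // illustrative default name
      .getOrCreate()

    println(s"Running against master: ${spark.sparkContext.master}")
    spark.stop()
  }
}
```

With neither variable set, this runs against a local master; setting SPARK_MASTER before launching would point the same compiled code at a different cluster.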
The spark-shell and pyspark shells create the SparkSession object automatically and assign it to the spark variable.
The SparkSession object has a SparkContext object, which you can access with spark.sparkContext.
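To illustrate accessing the context through the session, here is a small sketch in Scala. The object name ContextFromSession, the helper sumWithContext, and the sample numbers are made up for this example; only SparkSession, sparkContext, parallelize, and reduce come from Spark itself.

```scala
import org.apache.spark.sql.SparkSession

object ContextFromSession {
  // Hypothetical helper: uses the session's SparkContext for RDD-level work.
  def sumWithContext(spark: SparkSession): Int = {
    // The session already holds a SparkContext; there is no need to create one.
    val sc = spark.sparkContext
    sc.parallelize(1 to 5).reduce(_ + _)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local")
      .appName("context-from-session")
      .getOrCreate()

    println(s"Sum computed through the SparkContext: ${sumWithContext(spark)}")
    spark.stop()
  }
}
```

In spark-shell the session already exists as spark, so the helper body can be typed directly without building a session first.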
As we will see later, the SparkSession object unifies more than...