The SparkSession
In computer science, a session is a semi-permanent interactive information interchange between two communicating devices or between a computer and a user. A SparkSession is similar: it provides a time-bounded interaction between a user and the Spark framework and lets you program with DataFrames and Datasets. We used SparkContext in the previous chapters while working with RDDs, but SparkSession should be your go-to starting point when working with DataFrames or Datasets.
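The relationship between the two entry points can be made concrete with a minimal Scala sketch. It assumes Spark 2.x or later on the classpath and a local master (`local[*]`). The object name `SessionVsContext`, its helper methods, and the app name are hypothetical and chosen only for illustration. The sketch shows that a single SparkSession exposes the underlying SparkContext for RDD work, while DataFrames are created through the session itself.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.{DataFrame, SparkSession}

// Sketch: one SparkSession gives access to both the RDD API (through its
// SparkContext) and the DataFrame/Dataset API (through the session itself).
object SessionVsContext {

  // Assumption: local mode with all cores; the app name is hypothetical.
  def localSession(): SparkSession =
    SparkSession.builder()
      .appName("SessionVsContextSketch")
      .master("local[*]")
      .getOrCreate()

  // RDD-style work goes through the SparkContext owned by the session.
  def rddSum(spark: SparkSession): Double = {
    val sc: SparkContext = spark.sparkContext
    sc.parallelize(Seq(1, 2, 3, 4)).sum()
  }

  // DataFrame-style work goes through the session and its implicits.
  def smallDataFrame(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(("a", 1), ("b", 2)).toDF("key", "value")
  }

  def main(args: Array[String]): Unit = {
    val spark = localSession()
    println(rddSum(spark))
    smallDataFrame(spark).show()
    spark.stop()
  }
}
```

Because the session owns the context, code that still needs RDDs can call `spark.sparkContext` rather than constructing a separate SparkContext.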
Creating a SparkSession
In Scala, Java, and Python, you use the Builder pattern to create a SparkSession. Note that when you are using spark-shell or pyspark, a SparkSession is already available as an object named spark:
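The builder's `getOrCreate()` method returns an already active session instead of creating a new one, which fits the shell behavior described above. The following sketch assumes a local master and uses the hypothetical object name `ShellSessionSketch` and app name. It builds a session the way a standalone program would, then shows that a second `getOrCreate()` call hands back the same instance.

```scala
import org.apache.spark.sql.SparkSession

// Sketch: the builder reuses an active session rather than creating a second one.
object ShellSessionSketch {

  // Build (or reuse) a session, then ask the builder for one again.
  def buildTwice(): (SparkSession, SparkSession) = {
    // Roughly what a standalone program does; spark-shell and pyspark
    // perform this step at startup and bind the result to `spark`.
    val first = SparkSession.builder()
      .appName("ShellSessionSketch") // hypothetical app name
      .master("local[*]")            // assumption: local mode
      .getOrCreate()

    // No options are set here, so getOrCreate() finds the active session and returns it.
    val second = SparkSession.builder().getOrCreate()
    (first, second)
  }

  def main(args: Array[String]): Unit = {
    val (first, second) = buildTwice()
    println(first eq second) // prints true
    println(s"Spark version: ${first.version}")
    first.stop()
  }
}
```

This reuse is why code written for the shell can call the builder safely: it receives the existing `spark` session rather than a competing one.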
Figure 4.6: Spark session in Scala shell
The following figure shows the SparkSession in the Python shell:
Figure 4.7: SparkSession in Python shell
Example 4.1: Scala - Programmatically creating a SparkSession:
import org.apache.spark.sql...