Chapter 4. Creating a SparkSession Object
This chapter covers how to create a SparkSession object in your cluster. A SparkSession object represents the connection to a Spark cluster (local or remote) and provides the entry point for interacting with Spark. We need to create a SparkSession so that we can interact with Spark and distribute our jobs. In Chapter 2, Using the Spark Shell, we interacted with Spark through the Spark shell, which created a SparkSession object and a SparkContext object for us. With these objects, you can create RDDs, broadcast variables, and accumulators (counters), and actually do fun things with your data. The Spark shell serves as an example of how to interact with the Spark cluster through the SparkSession and SparkContext objects.
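As a rough illustration of what the shell does on your behalf, the following minimal sketch creates a session from a standalone Python program and then uses its SparkContext. It assumes the pyspark package and a Java runtime are installed. The names local_master, create_session, demo, and the application name "session-demo" are hypothetical and chosen only for this example.

```python
def local_master(n_threads):
    """Build a local-mode master URL such as 'local[2]'."""
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    return f"local[{n_threads}]"


def create_session(master, app_name):
    """Create (or reuse) a SparkSession connected to the given master."""
    # Imported here so the helper above stays usable without pyspark installed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.master(master).appName(app_name).getOrCreate()


def demo():
    """Hypothetical walk-through; requires pyspark and a local Java runtime."""
    spark = create_session(local_master(2), "session-demo")
    sc = spark.sparkContext  # the SparkContext owned by the session
    try:
        numbers = sc.parallelize(range(1, 6))   # an RDD
        factor = sc.broadcast(10)               # a broadcast variable
        seen = sc.accumulator(0)                # an accumulator (counter)

        numbers.foreach(lambda _: seen.add(1))  # count elements on the executors
        scaled = numbers.map(lambda x: x * factor.value).collect()
        return scaled, seen.value
    finally:
        spark.stop()  # release the connection to the cluster
```

Calling demo() runs the whole sequence in local mode. Passing a different master string to create_session would point the same code at a remote cluster.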
For a client to establish a connection to the Spark cluster, the SparkSession object needs some basic information, which is given here:
- Master URL: This URL can be local[n] for local mode, spark://[sparkip] for the Spark server, or mesos://path for a...