SQLContext and HiveContext
Prior to Spark 2.0, SparkContext used to be the entry point for Spark applications, and SQLContext and HiveContext used to be the entry points to run Spark SQL. HiveContext is a superset of SQLContext. An SQLContext needs to be created to run Spark SQL on an RDD.

The SQLContext provides connectivity to various data sources. Data can be read from those data sources, and Spark SQL can be executed to transform the data as required. An SQLContext can be created from a SparkContext as follows:
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SQLContext;

JavaSparkContext javaSparkContext = new JavaSparkContext(conf);
SQLContext sqlContext = new SQLContext(javaSparkContext);
The SQLContext creates a wrapper over the SparkContext and provides SQL functionality and functions for working with structured data. It comes with only a basic set of SQL functions.
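As a minimal sketch of this workflow, the snippet below reads a JSON data source into a DataFrame through the SQLContext, registers it as a temporary table, and transforms it with a SQL query. It assumes Spark 1.x on the classpath and a hypothetical input file `people.json` with `name` and `age` fields; the file name, table name, and query are illustrative, not from the original text.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class SQLContextExample {
    public static void main(String[] args) {
        // Create the SparkContext and wrap it in an SQLContext.
        SparkConf conf = new SparkConf().setAppName("SQLContextExample").setMaster("local[*]");
        JavaSparkContext javaSparkContext = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(javaSparkContext);

        // Read a data source into a DataFrame (hypothetical file "people.json").
        DataFrame people = sqlContext.read().json("people.json");

        // Register the DataFrame as a temporary table so SQL can query it.
        people.registerTempTable("people");

        // Run Spark SQL to transform the data.
        DataFrame adults = sqlContext.sql("SELECT name FROM people WHERE age >= 18");
        adults.show();

        javaSparkContext.stop();
    }
}
```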
The HiveContext, being a superset of SQLContext, provides many more functions. The HiveContext lets you write queries using the HiveQL parser, which means all of the Hive functions can be...
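As a brief sketch of creating a HiveContext, assuming Spark 1.x: it is constructed over the underlying SparkContext in much the same way as an SQLContext, after which queries go through the HiveQL parser, so Hive built-ins (for example the `percentile` UDAF, which plain SQLContext does not provide) become available. The table name below is illustrative.

```java
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.hive.HiveContext;

// Wrap the underlying SparkContext in a HiveContext; in Spark 1.x the
// constructor takes a SparkContext, obtained here via javaSparkContext.sc().
HiveContext hiveContext = new HiveContext(javaSparkContext.sc());

// HiveQL queries can use Hive functions, e.g. the percentile UDAF
// (hypothetical table "people"):
hiveContext.sql("SELECT percentile(age, 0.5) FROM people").show();
```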