Hive integration
Spark integrates well with Hive, but it does not bundle most of Hive's dependencies and expects them to be available on its classpath. The following steps explain how to integrate Spark with Hive:
- Place the hive-site.xml, core-site.xml, and hdfs-site.xml files in the SPARK_HOME/conf folder (see the hive-site.xml sketch below).
- Instantiate SparkSession with Hive support. If hive-site.xml is not configured, the context automatically creates metastore_db in the current directory and a warehouse directory configured by spark.sql.warehouse.dir, which defaults to spark-warehouse.
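As a minimal sketch of the first step, a hive-site.xml that points Spark at an existing remote Hive metastore might look like the following; the thrift URI is a hypothetical placeholder for your own metastore host:

<configuration>
  <property>
    <!-- hypothetical metastore endpoint; replace with your own host and port -->
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
  </property>
</configuration>

For the second step, the session itself can be created as follows: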
SparkSession sparkSession = SparkSession.builder()
    .master("local")
    .config("spark.sql.warehouse.dir", "Path of Warehouse")
    .appName("DatasetOperations")
    .enableHiveSupport()
    .getOrCreate();
- Once we have created a SparkSession with Hive support enabled, we can proceed to use it with the added benefit of Hive query support. One way to identify the difference between Hive query function support...
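As a brief sketch of what Hive support enables, the session created above can run HiveQL statements directly through the sql() method; the people table and its contents here are hypothetical:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Create a Hive-managed table in the configured warehouse directory
sparkSession.sql("CREATE TABLE IF NOT EXISTS people (name STRING, age INT)");

// Insert a row, then read it back with an ordinary SQL query
sparkSession.sql("INSERT INTO people VALUES ('Alice', 30)");
Dataset<Row> adults = sparkSession.sql("SELECT name, age FROM people WHERE age >= 18");
adults.show();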