Working with Hive tables
In this section, we will discuss the integration of Spark SQL with Hive tables. We will walk through the process of executing Hive queries from Spark SQL, which will help us create and analyze Hive tables stored in HDFS.
Spark SQL provides the flexibility to execute Hive queries directly from our Spark SQL codebase. The best part is that the Hive queries are executed on the Spark cluster, and we only require HDFS for reading and storing the Hive tables. In other words, there is no need to set up a complete Hadoop cluster with services such as ResourceManager or NodeManager; we just need HDFS, which is available as soon as we start the NameNode and DataNode.
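Before moving on to the steps, the following is a minimal sketch of what executing Hive queries from Spark SQL looks like. It assumes the Spark 2.x SparkSession API with enableHiveSupport() (in Spark 1.x, HiveContext plays the same role); the warehouse path, table name, and columns are illustrative placeholders and are not taken from this book's example code.

import org.apache.spark.sql.SparkSession

object HiveOnSparkSketch {
  def main(args: Array[String]): Unit = {
    // Enable Hive support so that spark.sql() can run HiveQL statements.
    // Only HDFS (NameNode/DataNode) needs to be running; the warehouse
    // directory below is an assumed HDFS location.
    val spark = SparkSession.builder()
      .appName("HiveOnSparkSketch")
      .config("spark.sql.warehouse.dir", "hdfs://localhost:9000/user/hive/warehouse")
      .enableHiveSupport()
      .getOrCreate()

    // A Hive DDL statement executed on the Spark cluster; the table name
    // and columns are placeholders for illustration only.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS crime_sample (id STRING, primary_type STRING)
        |STORED AS PARQUET""".stripMargin)

    // A simple analytical HiveQL query against the table.
    spark.sql("SELECT primary_type, count(*) FROM crime_sample GROUP BY primary_type")
      .show()

    spark.stop()
  }
}

The same pattern (create the session with Hive support, then pass HiveQL strings to spark.sql) underlies the full example we build in the following steps.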
Perform the following steps to create Hive tables for our Chicago crime data and, at the same time, execute some analytical Hive queries:
1. Open and edit the Spark-Examples project and create a Scala object named chapter.eight.ScalaSparkSQLToHive.scala. Next, edit chapter.eight.ScalaSparkSQLToHive...