Working with Apache Spark
In this recipe, you will learn how to integrate Hive with Apache Spark. Apache Spark is an open source cluster computing framework that is commonly used as a replacement for the MapReduce execution framework.
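Once Spark is running (see the steps under Getting ready), Hive can be directed to use it as its execution engine. The following is a minimal sketch using the standard Hive on Spark properties; the master URL (spark://node1:7077) and the table name (sample_table) are placeholders for illustration only:
hive -e "SET hive.execution.engine=spark; SET spark.master=spark://node1:7077; SELECT count(*) FROM sample_table;"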
Getting ready
In this recipe, we will cover how to use Hive with Apache Spark. You must have Apache Spark installed on your system before proceeding.
- Once Spark is installed, start the Spark master server by executing the following command:
./sbin/start-master.sh
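If you need to control where the master binds, the usual spark-env.sh variables can be exported before running the script; the values shown here (node1, and the default ports 7077 and 8080) are example values only:
export SPARK_MASTER_IP=node1
export SPARK_MASTER_PORT=7077
export SPARK_MASTER_WEBUI_PORT=8080
./sbin/start-master.sh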
- Check whether the Spark master server has started by opening the following URL in a web browser:
http://<ip_address>:<port_number>
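By default, the master web UI listens on port 8080, so on the machine where the master was started the address is typically similar to the following (node1 is a placeholder hostname):
http://node1:8080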
- The exact URL is recorded in the master's log file, located at the following path:
/spark-1.6.0-bin-hadoop2.6/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-node1.out
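For example, you can pull the master URL and the web UI address out of that log file with a quick search; the exact wording of the log messages may vary between Spark versions:
grep -E "Starting Spark master at|MasterWebUI" /spark-1.6.0-bin-hadoop2.6/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-node1.out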
- The following screenshot shows the Spark master web UI opened at this URL:
- Once the master server is started, start the slave service by executing the following command:
./sbin/start-slave.sh <master-spark-URL>
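Here, <master-spark-URL> is the spark:// address reported by the master (it is also shown at the top of the master web UI). For example, with the master running on node1 on the default port, the command would be similar to the following:
./sbin/start-slave.sh spark://node1:7077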
- Refresh the...