Introducing Hivemall for Spark
Apache Hive supports three execution engines: MapReduce, Tez, and Spark. Though Hivemall does not support Spark natively, the Hivemall for Spark project (https://github.com/maropu/hivemall-spark) implements a wrapper that enables you to use Hivemall UDFs in SparkContext, DataFrames, or Spark Streaming. Getting started with Hivemall for Spark is easy. Follow this procedure to start a Scala shell, load the UDFs, and run SQL queries:
1. Download the define-udfs script:

[cloudera@quickstart ~]$ wget https://raw.githubusercontent.com/maropu/hivemall-spark/master/scripts/ddl/define-udfs.sh --no-check-certificate
2. Start a Scala shell with the --packages option:

[cloudera@quickstart ~]$ spark-1.6.0-bin-hadoop2.6/bin/spark-shell --master local[*] --packages maropu:hivemall-spark:0.0.6
3. Create the Hivemall functions as follows. Note that Hivemall for Spark does not support Python yet:
scala> :load define-udfs.sh
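As a quick sanity check that the UDFs are now registered, you can call one of them directly from the shell. hivemall_version() is a standard Hivemall function; assuming the script registers it, the following prints the library version:

scala> sqlContext.sql("SELECT hivemall_version()").show()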
Now you can run the examples from the hivemall-spark project page (https://github.com/maropu/hivemall-spark). As a starting point, the short sketch below applies a Hivemall UDF to a DataFrame.
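This is a minimal sketch rather than an official example: the table name, column names, and sample data are made up for illustration, and it assumes the sigmoid UDF is among the functions registered by define-udfs.sh. Paste it into the Scala shell started in step 2:

// Build a toy DataFrame with an id and a raw score column.
import sqlContext.implicits._
val scores = Seq((1, 2.0), (2, -0.5), (3, 0.0)).toDF("id", "score")

// Expose the DataFrame to SQL (Spark 1.6 API).
scores.registerTempTable("scores")

// Apply Hivemall's sigmoid UDF in a plain SQL query; sigmoid is assumed
// to have been registered as a temporary function by define-udfs.sh.
sqlContext.sql("SELECT id, sigmoid(score) AS prob FROM scores").show()

Because the functions are registered in the session's SQLContext, the same pattern works for any DataFrame you can register as a temporary table, whether it comes from a batch source or a Spark Streaming micro-batch.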