Indexing data with meta via Apache Spark
Using a simple map for ingesting data is adequate only for simple jobs. The best practice in Spark is to use a case class, which gives you fast serialization and the ability to manage complex type checking. During indexing, providing custom IDs can also be very handy. In this recipe, we will see how to cover both of these issues.
Getting ready
You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.
You also need a working installation of Apache Spark.
How to do it...
To store data in Elasticsearch via Apache Spark, we will perform the following steps:
1. We need to start the Spark shell:
./bin/spark-shell
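The EsSpark classes used below are not part of the standard Spark distribution. One common way to pull in the elasticsearch-hadoop Spark connector (an assumption, not part of the original recipe) is the --packages flag; the artifact and version shown here are illustrative and must match your Spark, Scala, and Elasticsearch versions:
./bin/spark-shell --packages org.elasticsearch:elasticsearch-spark-20_2.11:7.0.0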
2. We will import the required classes:
import org.apache.spark.SparkContext
import org.elasticsearch.spark.rdd.EsSpark
3. We will create a case class Person:
case class Person(username: String, name: String, age: Int)
4. We create...
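As a minimal sketch of how this step might continue, the following builds an RDD of Person instances and indexes it with EsSpark, using the connector's es.mapping.id setting to supply the custom document IDs mentioned in the introduction. The sample values and the index name spark_persons are hypothetical:
// Two sample persons; sc is the SparkContext that the Spark shell provides.
val persons = Seq(Person("bob", "Bob", 19), Person("susan", "Susan", 21))
val rdd = sc.makeRDD(persons)
// es.mapping.id tells elasticsearch-hadoop to use the given field of each
// document as its Elasticsearch _id instead of an auto-generated one.
EsSpark.saveToEs(rdd, "spark_persons", Map("es.mapping.id" -> "username"))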