Analyzing JSON data using Spark
JSON is one of the most widely used data storage and exchange formats today. In this recipe, we are going to take a look at how to access JSON data from Spark and process it.
Getting ready
To perform this recipe, you should have Hadoop and Spark installed. You also need to install Scala. I am using Scala 2.11.0 here.
How to do it...
Spark supports accessing JSON files through the SQL context. You can read and write JSON files using the SQL context. In this recipe, we are going to take a look at how to read a JSON file from HDFS and process it.
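Before we walk through the project, here is a minimal standalone sketch of that read-and-query pattern. This is only an illustration written against the Spark 1.x SQLContext API; it is not the recipe's SparkJSON.scala, and the HDFS path simply anticipates the /json location used in the next step:

// A minimal sketch: read a JSON file from HDFS through the SQL context and query it.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object JsonQuickLook {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("JsonQuickLook")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Spark infers the schema from the JSON records while loading.
    val df = sqlContext.read.json("hdfs:///json/people.json")
    df.printSchema()

    // Register the DataFrame as a temporary table so it can be queried with SQL.
    df.registerTempTable("people")
    sqlContext.sql("SELECT * FROM people").show()

    sc.stop()
  }
}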
First of all, download the people.json sample file from https://github.com/deshpandetanmay/hadoop-real-world-cookbook/blob/master/data/people.json and store it in the /json path in HDFS.
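Most readers will copy the file into HDFS with the usual hdfs dfs -put command. Purely as an alternative illustration in Scala, the same step can be done with the Hadoop FileSystem API; the local /tmp/people.json path below is an assumption:

// Sketch: copy the downloaded people.json from the local filesystem into /json in HDFS.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object CopyJsonToHdfs {
  def main(args: Array[String]): Unit = {
    // Picks up fs.defaultFS from the Hadoop configuration on the classpath.
    val fs = FileSystem.get(new Configuration())

    // Create the /json directory and copy the downloaded file into it.
    fs.mkdirs(new Path("/json"))
    fs.copyFromLocalFile(new Path("/tmp/people.json"), new Path("/json/people.json"))

    fs.close()
  }
}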
We will create a Scala project using the following files:
SparkJSON\build.sbt
SparkJSON\project\assembly.sbt
SparkJSON\src\main\scala\com\demo\SparkJSON.scala
Here is the content of build.sbt:
name :=...
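For reference, a build.sbt for a Spark SQL project on Scala 2.11 generally looks something like the following minimal sketch; the project name, version numbers, and dependency scope here are assumptions rather than the book's exact values:

name := "SparkJSON"

version := "1.0"

scalaVersion := "2.11.0"

// spark-sql pulls in spark-core; marked "provided" because the cluster supplies Spark at runtime.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.6.0" % "provided"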