JSON is a simple, flexible, and compact format used extensively for data interchange in web services. Spark has built-in support for JSON: there is no need to define a schema for the JSON data, because Spark infers it automatically when the data is read. In addition, Spark greatly simplifies the query syntax required to access fields in complex JSON data structures. We will present detailed examples of JSON data in Chapter 12, Spark SQL in Large-Scale Application Architectures.
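As a rough illustration of schema inference and nested-field access, the following minimal sketch builds a tiny in-memory JSON dataset and queries it. The records, the field names (id, user, tags), and the application name are made up for this illustration and are not part of the Amazon reviews dataset; the sketch also assumes Spark 2.2 or later, where spark.read.json accepts a Dataset[String]:

```scala
import org.apache.spark.sql.{Encoders, SparkSession}
import org.apache.spark.sql.functions.col

// Local session for illustration only; inside spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("JsonSchemaInferenceSketch") // hypothetical name
  .getOrCreate()

// Two hypothetical JSON records with a nested object and an array.
val sampleJson = spark.createDataset(Seq(
  """{"id": 1, "user": {"name": "Ann", "city": "Oslo"}, "tags": ["a", "b"]}""",
  """{"id": 2, "user": {"name": "Bob", "city": "Lima"}, "tags": ["c"]}"""
))(Encoders.STRING)

// No schema is supplied; Spark infers one by scanning the records.
val sampleDF = spark.read.json(sampleJson)
sampleDF.printSchema()

// Nested fields are reached with dot notation, array elements by position.
val flattened = sampleDF.select(
  col("id"),
  col("user.name"),
  col("tags").getItem(0).as("firstTag")
)
flattened.show()
```

Because inference requires a pass over the data, supplying an explicit schema can be preferable for very large inputs; the sketch relies on inference only to mirror the behavior described above.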
The dataset for this example contains approximately 1.69 million Amazon reviews for the electronics category, and can be downloaded from: http://jmcauley.ucsd.edu/data/amazon/.
We can directly read a JSON dataset to create a Spark SQL DataFrame. We will read in a sample set of review records from a JSON file:
scala> val reviewsDF = spark.read.json("file:///Users/aurobindosarkar...