We wanted to add this section about importing and saving data here, even though it is not purely about Spark SQL, so that concepts such as Parquet and JSON file formats could be introduced. This section also allows us to cover how to access saved data in loose text as well as the CSV, Parquet, and JSON formats conveniently in one place.
Importing and saving data
Processing the text files
Using SparkContext, it is possible to load a text file in RDD using the textFile method. Additionally, the wholeTextFile method can read the contents of a directory to RDD. The following examples show you how a file, based on the local filesystem (file://) or HDFS (hdfs://), can be read to a Spark RDD. These examples show you that the data will...