Parquet files
Apache Parquet is a common columnar format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model, or programming language. Parquet's design is based on Google's Dremel paper, and it is considered one of the best-performing data formats in a number of scenarios. We won't go into too much detail on Parquet here; if you are interested, you may want to read more at https://parquet.apache.org/. To show how Spark works with Parquet files, we will write the CDR JSON file out as a Parquet file, then load it back and do some basic data manipulation.
Example: Scala - Reading/Writing Parquet Files
// Reading a JSON file as a DataFrame
val callDetailsDF = spark.read.json("/home/spark/sampledata/json/cdrs.json")

// Write the DataFrame out as a Parquet file
callDetailsDF.write.parquet("/home/spark/sampledata/cdrs.parquet")

// Loading the Parquet file as a DataFrame
val callDetailsParquetDF = spark.read.parquet("/home/spark/sampledata/cdrs.parquet")
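Once loaded, the Parquet-backed DataFrame behaves like any other DataFrame. What follows is a minimal sketch of the kind of basic manipulation referred to above, run in the same spark-shell session; the callingNumber column name is hypothetical, since the CDR schema isn't shown here.

import org.apache.spark.sql.functions.desc

// Inspect the schema that Spark reconstructed from the Parquet metadata
callDetailsParquetDF.printSchema()

// Hypothetical example: count calls per calling number and show the top 10
callDetailsParquetDF
  .groupBy("callingNumber")
  .count()
  .orderBy(desc("count"))
  .show(10)

Because Parquet is columnar, a query like this only needs to read the columns it touches, which is one of the main reasons the format performs well for analytical workloads.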