Working with Parquet
In this section, we will discuss the various operations provided by Spark SQL for working with the Parquet data format, along with appropriate examples.
Parquet is one of the most popular columnar storage formats for structured data. Parquet leverages the record shredding and assembly algorithm (http://tinyurl.com/p8kaawg) described in the Dremel paper (http://research.google.com/pubs/pub36632.html). Because data is laid out column by column, Parquet supports efficient compression and encoding schemes on a per-column basis, which works better than simply flattening structured tables into rows. Refer to https://parquet.apache.org/ for more information on the Parquet data format.
The DataFrame API of Spark SQL provides convenient operations for writing and reading data in the Parquet format. We can register Parquet data as temporary tables within Spark SQL and then perform all the other operations provided by the DataFrame API for data manipulation or analysis, as sketched below.
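The following is a minimal sketch of reading an existing Parquet file and querying it through a temporary table. It assumes a Spark 2.x SparkSession; the file path and column names are hypothetical, chosen only for illustration (on Spark 1.x, sqlContext.read.parquet and registerTempTable serve the same purpose):

import org.apache.spark.sql.SparkSession

// Obtain (or create) the SparkSession entry point
val spark = SparkSession.builder()
  .appName("ParquetExample")
  .getOrCreate()

// Read an existing Parquet file into a DataFrame (hypothetical path)
val usersDF = spark.read.parquet("/data/users.parquet")

// Register the DataFrame as a temporary table so it is queryable with SQL
usersDF.createOrReplaceTempView("users")

// Run a SQL query against the temporary table (hypothetical columns)
val adultsDF = spark.sql("SELECT name, age FROM users WHERE age >= 18")
adultsDF.show()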
Let's look at an example of writing and reading data in the Parquet format, and then we will...
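As a minimal sketch of such a write/read round trip (the output path and sample data below are made up for illustration):

import spark.implicits._

// Create a small DataFrame from in-memory sample data
val peopleDF = Seq(("Alice", 30), ("Bob", 25)).toDF("name", "age")

// Write the DataFrame out in Parquet format, overwriting any existing output
peopleDF.write.mode("overwrite").parquet("/tmp/people.parquet")

// Read the Parquet data back; the schema is preserved automatically
val restoredDF = spark.read.parquet("/tmp/people.parquet")
restoredDF.printSchema()
restoredDF.show()

Note that import spark.implicits._ relies on the spark SparkSession created earlier and enables the toDF conversion on local Scala collections.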