Working with Parquet files
Apache Parquet is an open-source file format designed for efficient storage and retrieval of data. Its column-oriented layout, combined with compression that reduces storage space and the I/O cost of reading and writing, makes Parquet files well suited to storing and retrieving large amounts of structured and semi-structured data for analytical applications.
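To make the idea of a column-oriented layout concrete, here is a minimal sketch using only the Python standard library. It does not read or write real Parquet files and does not reproduce Parquet's actual encodings or compression codecs. The sample data and the function names `to_columns`, `compress_columns`, and `read_column` are hypothetical. The sketch pivots row records into one list per column and compresses each column separately, so a query that needs a single column only has to decompress that column.

```python
import json
import zlib

# Hypothetical sample data: one dict per row, the way a row-oriented
# format such as CSV lays records out.
ROWS = [
    {"city": "Oslo", "year": 2023, "temp_c": 5.1},
    {"city": "Oslo", "year": 2024, "temp_c": 6.3},
    {"city": "Lima", "year": 2023, "temp_c": 19.4},
    {"city": "Lima", "year": 2024, "temp_c": 19.9},
]


def to_columns(rows):
    """Pivot row dicts into one list of values per column (a columnar layout)."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def compress_columns(columns):
    """Serialize and compress each column independently, keyed by column name."""
    return {
        name: zlib.compress(json.dumps(values).encode("utf-8"))
        for name, values in columns.items()
    }


def read_column(store, name):
    """Decompress only the requested column; the other columns are never read."""
    return json.loads(zlib.decompress(store[name]).decode("utf-8"))


if __name__ == "__main__":
    store = compress_columns(to_columns(ROWS))
    # An analytical query such as "average temperature" touches one column only.
    temps = read_column(store, "temp_c")
    print(sum(temps) / len(temps))
```

Keeping a column's values together also tends to help compression, because neighboring values share a type and often repeat, as the `city` values do here.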
Parquet files are encoded in a binary format, so you cannot open and read them in a text editor the way you can a CSV file. Parquet files are self-describing: each file contains both the data and metadata describing the schema of that data. This means that column names, their data types, and summary information such as the number of rows and columns are encoded within the file itself. This contrasts with CSV and JSON files, which contain plain text without an embedded schema. In addition to performance gains, this is one of the notable benefits of Parquet files, as their built-in schema...