Reading data from Parquet files
Parquet has emerged as a popular format for storing and processing large datasets efficiently in data engineering and big data analytics. Initially developed by Twitter and Cloudera, it was later contributed to the Apache Software Foundation as an open-source columnar file format. Parquet prioritizes fast data retrieval and efficient compression. Its design caters to analytical workloads and makes it an excellent option for partitioning data, which you will explore in this recipe and again in Chapter 4, Persisting Time Series Data to Files. As a result, Parquet has become a de facto standard in modern data architectures and cloud storage solutions.
In this recipe, you will learn how to read Parquet files using pandas and how to query a specific partition for efficient data retrieval.
Getting ready
You will be reading Parquet files that contain weather data from the National Oceanic and Atmospheric Administration...