Reading and writing Parquet files
Parquet is an open source, columnar file format that's efficient for data storage and processing. Its column-oriented layout compresses well and suits analytics workloads, which is why it's so common in big data analytics.
In this recipe, you will learn how to read and write Parquet files using both DataFrames and LazyFrames.
Getting ready
Toward the end of the recipe, you'll need the pyarrow library. If you haven't yet installed it, run the following command:
pip install pyarrow
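If you'd like to confirm the installation, a quick check like the following (a minimal sketch) should work:
# Verify that pyarrow can be imported and see which version is installed
import pyarrow
print(pyarrow.__version__)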
How to do it...
We’ll first cover reading a Parquet file:
- Read a Parquet file:
import polars as pl

# Read selected columns from the Parquet file, adding a row index column
parquet_input_file_path = '../data/venture_funding_deals.parquet'
df = pl.read_parquet(
    parquet_input_file_path,
    columns=['Company', 'Amount', 'Valuation', 'Industry'],
    row_index_name='row_cnt'
)
df.head()
The preceding code reads only the four specified columns from the Parquet file, adds a row index column named row_cnt, and displays the first few rows with head().
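As a preview of the lazy and writing counterparts covered in this recipe, here is a minimal sketch assuming the same file path as above (the output filename venture_funding_output.parquet is just a placeholder):
# Scan the Parquet file lazily; nothing is read until collect() is called
lf = pl.scan_parquet(parquet_input_file_path)
lf.collect().head()

# Write the DataFrame back out to a Parquet file
df.write_parquet('../data/venture_funding_output.parquet')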