Working with Key Arrow Specifications
Utilities for analytics and computation are only useful if you have data to run them on. That data can live in many different places and formats, both local and remote to the machine being used to analyze it. The Arrow libraries provide functionality for reading and interacting with data in multiple formats and in multiple locations, which we’ll cover in this chapter. Now that you have a solid understanding of what Arrow is and how to manipulate arrays, you will learn how to get data into the Arrow format and communicate it between different processes.
In this chapter, we’re going to cover the following topics:
- Importing data from multiple formats, including CSV, Apache Parquet, and pandas and Polars DataFrames
- Interactions between Arrow, pandas, and Polars data
- Utilizing shared memory for near-zero-cost data sharing