Interoperating with PyArrow
Apache Arrow is a language-independent columnar memory format. The project encompasses a range of technologies that enable big data systems to store, process, and transfer data efficiently. The PyArrow library is the Python API of Apache Arrow. Polars uses Apache Arrow’s columnar format as its memory model: just as pandas traditionally relies on NumPy for its in-memory representation of data, Polars relies on the Arrow format, which it implements natively in Rust, so PyArrow itself is an optional dependency. (Since version 2.0, pandas can also use PyArrow as its in-memory backend.)
The interoperability between PyArrow and Polars is valuable: not only can you convert back and forth between Polars DataFrames and PyArrow Tables, but you can also use PyArrow for other tasks, such as reading and writing files, as you saw in Chapter 2, Reading and Writing Files.
In this recipe, we’ll look at how to convert back and forth between Polars DataFrames and PyArrow Tables, as well as directly reading from and writing to PyArrow...