Using the Arrow C data interface
Back in Chapter 2, Working with Key Arrow Specifications, I mentioned the Arrow C data interface in the context of passing data between Python and Spark processes. At that point, we didn’t go into much detail about the interface or what it looks like; now, we will.
Since the Arrow project is fast-moving and constantly evolving, it can sometimes be difficult for other projects to incorporate the Arrow libraries into their work. There may also be a lot of existing code that needs to be adapted to work with Arrow piecemeal, which forces you to create or even re-implement adapters for interchanging data. To avoid redundant effort in these situations, the Arrow project defines a small, stable set of C definitions that can be copied into a project to make it easy to pass data across the boundaries of different languages and libraries. For languages and runtimes that aren’t C or C++, it should...
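To give a rough idea of how small these definitions are, the following sketch reproduces the two core structures, ArrowSchema and ArrowArray, as the Arrow C data interface specification defines them. The helper export_int32_array and the two release callbacks are hypothetical names I made up for illustration and are not part of Arrow. They assume a non-nullable int32 array whose values are copied into memory owned by the exported structure:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Flag values defined by the Arrow C data interface specification. */
#define ARROW_FLAG_DICTIONARY_ORDERED 1
#define ARROW_FLAG_NULLABLE 2
#define ARROW_FLAG_MAP_KEYS_SORTED 4

/* Describes the type of an array (copied from the specification). */
struct ArrowSchema {
  const char* format;    /* type as a format string, e.g. "i" for int32 */
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);  /* set by the producer */
  void* private_data;                    /* opaque producer data */
};

/* Describes the data of an array (copied from the specification). */
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);   /* set by the producer */
  void* private_data;                    /* opaque producer data */
};

/* Hypothetical callback: this schema owns no heap memory, so releasing
   it only marks the structure as released. */
static void release_schema(struct ArrowSchema* schema) {
  schema->release = NULL;
}

/* Hypothetical callback: frees the data buffer and the buffer list,
   then marks the structure as released. */
static void release_int32_array(struct ArrowArray* array) {
  free((void*)array->buffers[1]);
  free((void*)array->buffers);
  array->release = NULL;
}

/* Hypothetical helper (not an Arrow API): copies n int32 values into a
   newly allocated buffer and fills in both structures.
   Returns 0 on success, -1 on allocation failure. */
int export_int32_array(const int32_t* values, int64_t n,
                       struct ArrowSchema* schema,
                       struct ArrowArray* array) {
  assert(n >= 0);
  const void** buffers = malloc(2 * sizeof(void*));
  int32_t* data = malloc((size_t)(n > 0 ? n : 1) * sizeof(int32_t));
  if (buffers == NULL || data == NULL) {
    free((void*)buffers);
    free(data);
    return -1;
  }
  if (n > 0) {
    memcpy(data, values, (size_t)n * sizeof(int32_t));
  }
  buffers[0] = NULL;  /* validity bitmap may be NULL when null_count is 0 */
  buffers[1] = data;  /* the int32 values themselves */

  *schema = (struct ArrowSchema){
      .format = "i", .name = "", .metadata = NULL, .flags = 0,
      .n_children = 0, .children = NULL, .dictionary = NULL,
      .release = release_schema, .private_data = NULL};
  *array = (struct ArrowArray){
      .length = n, .null_count = 0, .offset = 0, .n_buffers = 2,
      .n_children = 0, .buffers = buffers, .children = NULL,
      .dictionary = NULL, .release = release_int32_array,
      .private_data = NULL};
  return 0;
}
```

A consumer reads the fields it needs and, once finished, calls each structure's release callback exactly once. A NULL release pointer marks a structure as already released.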