Acero: A Streaming Arrow Execution Engine
We’re almost halfway through this book, and only now are we covering how to perform analytical computations directly with Arrow. Kinda strange, right? If you’ve been following along, you should by now have a solid understanding of all the concepts you’ll need to benefit from the compute library.
The Arrow community has built Acero, an open source reference implementation of a computation and query engine based on the Arrow format. The Acero library exists to facilitate high-performance implementations of functions that operate on Arrow-formatted data, and to build execution plans for streams of data. The work involved might be a logical cast from one data type to another, a large computation and filter operation, or anything in between. Rather than consumers having to implement operations over and over, high-performance implementations can...