Summary
By composing these various pieces together (the C Data API, the Compute API, the Datasets API, and Acero) and gluing infrastructure on top, anyone should be able to create a rudimentary query and analysis engine that is fairly performant right away. The functionality provided abstracts away much of the tedious work of interacting with different file formats and handling data from different source locations, exposing a single interface that lets you get right to work building the specific logic you need. Once again, it’s the fact that all of these pieces are built on top of Arrow as an underlying format, one that is particularly efficient for these operations, that allows them to interoperate so easily. Not only that, but because Arrow is standardized, individual pieces of that stack can be swapped out and composed with other, pre-existing projects, such as DuckDB or Apache DataFusion, both of which can use the Arrow C Data API to communicate...