Why does Arrow use a columnar in-memory format?
There is often a lot of debate surrounding whether a database should be row-oriented or column-oriented, but this primarily refers to the on-disk format of the underlying storage files. Arrow’s data format is different from most cases discussed so far since it uses a columnar organization of data structures in memory directly. If you’re not familiar with columnar as a term, let’s take a look at what it means. First, imagine the following table of data:
Figure 1.3 – Sample data table
Traditionally, if you were to read this table into memory, you’d likely have some structure to represent a row and then read the data in one row at a time – maybe something like struct { string archer; string location; int year }
. The result is that you have the memory grouped closely together for each row, which is great if you always want to read all the columns for every row or are...