Learning the terminology and physical memory layout
As mentioned previously, the Arrow columnar format specification includes definitions of the in-memory data structures, metadata serialization, and protocols for data transportation. The format itself has a few key promises:
- Data adjacency for sequential access
- O(1) (constant time) random access
- SIMD and vectorization-friendly
- Relocatable, allowing for zero-copy access in shared memory
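To make the first two promises concrete, here is a minimal sketch that uses only the Python standard library rather than an Arrow library. It assumes fixed-width 32-bit integers packed back to back in one contiguous buffer; the names `INT32`, `pack_int32`, and `get_slot` are hypothetical helpers invented for this illustration and are not part of any Arrow API.

```python
import struct

# Hypothetical helpers for illustration only; this is not Arrow's API.
INT32 = struct.Struct("<i")  # little-endian 32-bit signed integer


def pack_int32(values):
    """Pack integers back to back into a single contiguous bytes buffer."""
    return b"".join(INT32.pack(v) for v in values)


def get_slot(buffer, index):
    """Read the value at `index` by computing its byte offset directly.

    No scan over earlier values is needed, so the lookup is O(1).
    """
    return INT32.unpack_from(buffer, index * INT32.size)[0]


buf = pack_int32([1, 2, 3, 4])

# A memoryview is a view over the same memory; slicing it does not copy bytes.
view = memoryview(buf)
tail = view[2 * INT32.size:]

print(len(buf))           # 16 bytes: four values of 4 bytes each
print(get_slot(buf, 2))   # 3
print(get_slot(tail, 0))  # 3, read through the sliced view
```

Because every value is located by an offset relative to the start of the buffer rather than by an absolute pointer, the same bytes can be read from wherever the buffer happens to live, which is the intuition behind the relocatable, zero-copy promise.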
To ensure we’re all on the same page, here’s a quick glossary of terms that are used throughout the format specification and the rest of this book:
- Array: A sequence of values of the same type with a known length.
- Slot: The value in an array identified by a specific index.
- Buffer/contiguous memory region: A single contiguous block of memory with a given length.
- Physical layout: The underlying memory layout for an array without accounting for the interpretation of the logical value. For example...