Would you download a library? Of course!
As mentioned before, the Arrow project provides libraries for multiple programming languages. These official libraries let anyone work with Arrow data without implementing the Arrow format themselves, regardless of the platform or programming language they use. There are two primary types of libraries so far: distinct implementations of the Arrow specification, and libraries built on top of another implementation. At the time of writing, there are distinct implementations of Arrow in C++ [3], C# [4], Go [5], Java [6], JavaScript [7], Julia [8], and Rust [9].
On top of those, there are libraries for C (GLib) [10], MATLAB [11], Python [12], R [13], and Ruby [14], all of which are built on top of the C++ library, which happens to have the most active development. As you might expect, the implementations differ in which features and aspects of the specification they support, and the documentation helpfully provides an implementation matrix [15] showing which features are implemented in which libraries. The matrix is updated as each library implements more of the specification.
With so many different implementations, you might be concerned about interoperability between them. To address this, the library versions are integration tested against one another by automated continuous integration (CI) jobs. Depending on the language and its stage of development, these libraries are tested on a wide variety of platforms, including but not limited to the following:
- x86/x86-64
- arm64
- s390x (IBM Mainframes)
- macOS
- Windows (32-bit and 64-bit)
- Debian/Ubuntu/Red Hat/CentOS
These libraries are distributed through their respective package managers to make them as easy as possible to acquire. As a result, there's been significant adoption of Arrow, whether you're a data scientist using pandas, NumPy, or Dask, or you're performing calculations and analytics using Apache Spark or Airflow. And, if you want to try the libraries out for yourself, the Apache Software Foundation provides several ways to download them.
Some of the channels where the libraries are made available are as follows:
- Conda (https://conda-forge.github.io/) for Linux, Windows, and macOS
- Homebrew (https://brew.sh/) for macOS
- MSYS2 for cross-platform Windows development
- vcpkg (https://github.com/Microsoft/vcpkg) for MSVC++
- R packages on CRAN (https://cran.r-project.org/)
- Julia packages in the general registry (https://github.com/JuliaRegistries/General)
- Ruby packages with RubyGems (https://rubygems.org/)
- C# packages with NuGet (https://www.nuget.org/packages/Apache.Arrow/)
- APT and Yum repositories for various Debian, Ubuntu, Red Hat, and CentOS distributions
- Java artifacts on Maven Central
- Pip wheels for Python
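As a quick illustration of the last channel, here is a minimal sketch that checks whether the Python library is available in the current environment. It assumes the library was installed under its PyPI package name, pyarrow (for example, with pip install pyarrow). The helper name installed_version is hypothetical and not part of any Arrow API; the sketch only uses the Python standard library, so it runs whether or not Arrow is present.

```python
import importlib.metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent.

    Hypothetical helper for illustration; it only queries package metadata
    and does not import the package itself.
    """
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None


# Assumption: the Python Arrow library is distributed on PyPI as "pyarrow".
version = installed_version("pyarrow")
if version is None:
    print("pyarrow is not installed; try: pip install pyarrow")
else:
    print(f"pyarrow {version} is installed")
```

Querying the package metadata rather than importing the library keeps the check cheap and avoids an ImportError when the library is missing.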
When developing something that uses the Arrow libraries, keep in mind the terms introduced a few pages ago, as most of the libraries use similar terminology and naming in their application programming interfaces (APIs).