In-Memory Analytics with Apache Arrow
Perform fast and efficient data analytics on both flat and hierarchical structured data

Product type Paperback
Published in Jun 2022
Publisher Packt
ISBN-13 9781801071031
Length 392 pages
Edition 1st Edition
Author: Matthew Topol
Would you download a library? Of course!

As mentioned before, the Arrow project contains a variety of libraries for multiple programming languages. These official libraries let anyone work with Arrow data without having to implement the Arrow format themselves, regardless of the platform and programming language they are using. So far, there are two primary types of libraries: distinct implementations of the Arrow specification, and libraries built on top of one of those implementations. At the time of writing, there are distinct implementations of Arrow in C++ [3], C# [4], Go [5], Java [6], JavaScript [7], Julia [8], and Rust [9].

On top of those, there are libraries for C (GLib) [10], MATLAB [11], Python [12], R [13], and Ruby [14], all of which are built on top of the C++ library, which also happens to have the most active development. As you might expect, the implementations are at different stages in terms of which features and parts of the specification they support. To help you keep track, the documentation provides an implementation matrix [15] showing which features are implemented in which libraries. The matrix is updated as each library adds support for more of the specification.

With so many different implementations, you might be concerned about interoperability between them. To address this, the libraries are integration tested against each other by automated continuous integration (CI) jobs. Depending on the language and the state of its development, these libraries are tested on a wide variety of platforms, including but not limited to the following:

  • x86/x86-64
  • arm64
  • s390x (IBM mainframes)
  • macOS
  • Windows (32-bit and 64-bit)
  • Debian/Ubuntu/Red Hat/CentOS

These libraries are distributed through each language's usual package managers to make acquiring and installing them as easy as possible. This has contributed to significant adoption of Arrow, whether you're a data scientist using pandas, NumPy, or Dask, or you're performing calculations and analytics with Apache Spark or Apache Airflow. And if you want to get the libraries so you can try them out for yourself, the Apache Software Foundation provides several ways to download them.

Some of the channels where the libraries are made available are as follows:

When developing something that uses the Arrow libraries, keep in mind the terms mentioned a few pages ago, as most of the libraries use similar terminology and naming to describe their application programming interfaces (APIs).
