Chapter 1: Getting Started with Apache Arrow
Whether you are a data scientist or engineer, a machine learning (ML) specialist, or a software engineer building something to perform data analytics, you've probably heard or read about Apache Arrow and either looked for more information or wondered what it was. This book aims to serve both as a springboard for understanding what Apache Arrow is and isn't, and as a reference you can return to whenever you want to supercharge your analytical capabilities.
Let's start by explaining what Apache Arrow is and what you will use it for. After that, we will walk through the Arrow specifications, set up a development environment where you can experiment with the Apache Arrow libraries, and work through a few simple exercises to get a feel for how to use them.
In this chapter, we're going to cover the following topics:
- Understanding the Arrow format and specifications
- Why does Arrow use a columnar in-memory format?
- Learning the terminology and the physical memory layout
- Arrow format versioning and stability
- Setting up your shooting range