Summary
The goal of this chapter was to explain what Apache Arrow is, get you acquainted with the format, and put it to use in a few simple use cases. This knowledge forms the foundation for everything else we will talk about in the rest of the book!
Just as a reminder, you can check the GitHub repository (https://github.com/PacktPublishing/In-Memory-Analytics-with-Apache-Arrow-) for the solutions to the exercises presented here and for the full code samples to make sure you understand the concepts!
The previous examples and exercises are all fairly simple by design; they are meant to reinforce the concepts introduced about the format and the specification while getting you comfortable using Arrow in code.
In Chapter 2, Working with Key Arrow Specifications, we will introduce how to read your data into the Arrow format, whether it lives on your local disk, in the Hadoop Distributed File System (HDFS), in S3, or elsewhere, and how to integrate Arrow into the various processes and utilities you might already use with your data, such as the pandas integration. We will also discover how to pass your data around between services and processes while keeping it in the Arrow format for performance.
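To give you a small taste of what's ahead, here is a minimal, illustrative sketch of the pandas integration, assuming you have both pyarrow and pandas installed; the DataFrame contents are purely hypothetical, and Chapter 2 covers this topic in far more detail:

# Illustrative sketch: converting a pandas DataFrame into an Arrow table
import pandas as pd
import pyarrow as pa

# A small DataFrame created just for demonstration purposes
df = pd.DataFrame({"id": [1, 2, 3], "name": ["foo", "bar", "baz"]})

# Create an Arrow table from the DataFrame; the Arrow schema is
# inferred from the pandas column dtypes
table = pa.Table.from_pandas(df)
print(table.schema)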
Ready? Onward and upward!