You're reading from In-Memory Analytics with Apache Arrow: Accelerate data analytics for efficient processing of flat and hierarchical data structures

Product type Paperback

Published in Sep 2024

Publisher Packt

ISBN-13 9781835461228

Length 406 pages

Edition 2nd Edition

Languages

Python

Tools

Apache Arrow

Concepts

Data Engineering

Author (1):

Matthew Topol

Final words

This brings us to the end of this journey. I’ve tried to pack lots of useful information, tips, tricks, and diagrams into this book, but there’s also plenty of room for much more research and experimentation on your end! If you haven’t done so already, go back and try the various exercises I’ve proposed throughout. Explore new things with the Arrow datasets and compute APIs, and try using Arrow Flight and ADBC in your work.

Across the chapters in this book, we’ve covered a lot of ground:

The Arrow format specification
Using the various Arrow libraries to improve many aspects of analytical computation and data science
Inter-process communication and sharing memory
Using Apache Spark, pandas, and Jupyter in conjunction with Arrow
The differences between data storage formats and in-memory runtime formats
Passing data across the boundaries of programming languages without having to copy it
Using gRPC...