Technical requirements
As before, this chapter has a lot of code examples and exercises to help you understand how to use these libraries. You’ll need an internet-connected computer with the following to try out the examples and follow along:
- Python 3.8+: With the pyarrow module and its dataset submodule installed
- A C++ compiler supporting C++17 or higher: With the Arrow libraries installed and able to be included and linked against
- Your preferred coding IDE, such as Emacs, Vim, Sublime, or VS Code
- As before, you can find the full sample code in the accompanying GitHub repository at https://github.com/PacktPublishing/In-Memory-Analytics-with-Apache-Arrow-Second-Edition
- We’re also going to utilize the NYC taxi dataset located in the public AWS S3 bucket at s3://ursa-labs-taxi-data/
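Before starting, it can be handy to confirm that the Python side of these requirements is in place. The following is a minimal sketch, not part of the book's sample code: the function name check_environment and the keys of the report it returns are hypothetical names chosen for illustration. It only checks the interpreter version and whether pyarrow and its dataset submodule can be imported; it does not verify the C++ toolchain or access to the S3 bucket.

```python
import sys


def check_environment(min_python=(3, 8)):
    """Report whether the Python prerequisites listed above appear to be met.

    The function name and report keys are illustrative, not an Arrow API.
    """
    report = {
        "python_ok": sys.version_info >= min_python,
        "pyarrow_version": None,
        "dataset_available": False,
    }
    try:
        import pyarrow

        report["pyarrow_version"] = pyarrow.__version__

        # The dataset submodule is imported separately because some
        # builds of pyarrow may ship without it.
        import pyarrow.dataset  # noqa: F401

        report["dataset_available"] = True
    except ImportError:
        # Leave the defaults in place so the caller can see what is missing.
        pass
    return report


if __name__ == "__main__":
    for name, value in check_environment().items():
        print(f"{name}: {value}")
```

If pyarrow_version comes back as None or dataset_available is False, installing or upgrading pyarrow (for example, with pip or conda) should be the first thing to try.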