pandas firing Arrow
If you've done any data analysis in Python, you've likely at least heard of the pandas library. It is an open source, BSD-licensed library for performing data analysis in Python and one of the most popular tools used by data scientists and engineers to do their jobs. Given the ubiquity of its use, it only makes sense that Arrow's Python library has integration for converting to and from pandas DataFrames quickly and efficiently. This section is going to dive into the specifics and the gotchas for using Arrow with pandas, and how you can speed up your workflows by using them together.
Before we start, though, make sure you've installed pandas locally so that you can follow along. Of course, you also need to have pyarrow installed, but you already did that in the previous chapter, right? Let's take a look:
- If you're using conda, pandas is part of the Anaconda (https://docs.continuum.io/anaconda/) distribution and can easily...