Working with database-style data with pandas
pandas is a library originally developed by Wes McKinney and designed to analyze datasets in a seamless and performant way. In recent years, it has seen rapid growth and wide adoption in the Python community. In this section, we will introduce the main concepts and tools the library provides, and we will use them to speed up various use cases that can't otherwise be addressed with NumPy's vectorized operations and broadcasting.
pandas fundamentals
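While NumPy deals mostly with arrays, the main data structures in pandas are pandas.Series, pandas.DataFrame, and pandas.Panel. (Panel was deprecated in pandas 0.20 and removed in 0.25, so only Series and DataFrame are available in current releases.) In the rest of this chapter, we will abbreviate pandas to pd.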
The main difference between a pd.Series object and an np.array is that a pd.Series object associates a specific key with each element of an array. Let's see how this works in practice with an example.
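As a minimal sketch of that key association, the snippet below builds the same values as a NumPy array and as a pd.Series. The values and the keys "a", "b", and "c" are made up for illustration and are not taken from the chapter's own example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: three values labeled with made-up keys.
values = [120.0, 135.5, 110.2]
keys = ["a", "b", "c"]

# A NumPy array addresses its elements by integer position only.
arr = np.array(values)
first_by_position = arr[0]

# A pd.Series stores the same values and attaches a key to each one.
ser = pd.Series(values, index=keys)
value_for_b = ser["b"]       # look up an element by its key
value_at_1 = ser.iloc[1]     # positional access is still available

print(ser)
```

Here ser["b"] and ser.iloc[1] refer to the same element: the first looks it up by key, the second by position.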
Let's assume that...